Citation
Web server performance measurement methodology

Material Information

Title:
Web server performance measurement methodology
Creator:
Tsai, Dominica Cheung
Place of Publication:
Denver, Colo.
Publisher:
University of Colorado Denver
Publication Date:
Language:
English
Physical Description:
vii, 94 leaves : illustrations ; 28 cm

Thesis/Dissertation Information

Degree:
Master's (Master of Science)
Degree Grantor:
University of Colorado Denver
Degree Divisions:
Computer Science and Engineering Department, CU Denver
Degree Disciplines:
Computer Science
Committee Chair:
Altman, Tom
Committee Members:
Hagan, Randy
Radenkovic, Mike
Stilman, Boris

Subjects

Subjects / Keywords:
Web servers -- Evaluation ( lcsh )
Browsers (Computer programs) -- Evaluation ( lcsh )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Bibliography:
Includes bibliographical references (leaves 92-94).
Thesis:
Computer science
General Note:
Department of Computer Science and Engineering
Statement of Responsibility:
by Dominica Cheung Tsai.

Record Information

Source Institution:
University of Colorado Denver
Holding Location:
Auraria Library
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
42612563 ( OCLC )
ocm42612563
Classification:
LD1190.E52 1999m .T73 ( lcc )

Full Text
WEB SERVER PERFORMANCE MEASUREMENT METHODOLOGY
by
Dominica Cheung Tsai
B.S., Northeastern University, 1984
A thesis submitted to the
University of Colorado at Denver
in partial fulfillment
of the requirements for the degree of
Master of Science
Computer Science
1999


This thesis for the Master of Science
degree by
Dominica Cheung Tsai
has been approved
by
Tom Altman
Date


Tsai, Dominica Cheung (M.S., Computer Science)
Web Server Performance Measurement Methodology
Thesis directed by Professor Tom Altman
ABSTRACT
High-performance Web servers, network infrastructure, and effective searching
through cyberspace are some of the components needed to meet the growing
demands of the World Wide Web. Satisfying these demands requires a
thorough understanding of the key factors that affect Web performance. A Web
connection consists of a browser, a network, and an http server. Many
activities take place when a user clicks on a Web page link. The Web is governed
by the underlying protocols (such as HTTP and TCP/IP) and the Internet
infrastructure. This thesis discusses the events that take place and breaks
them down into different stages.
This thesis presents some measuring techniques for evaluating Web performance.
It describes the custom instrumentation that was developed and deployed to collect
performance data from the CU Denver Web server (carbon). The measurement focuses
on the server response time and the client response time. Server response time is
the time that a single HTTP request spends at the server pool process. It does not
include queuing delays in the network or at the server prior to the request reaching
the server pool process (client queuing delays). Client response time is the time
between the moment a user sends a request from the browser and the moment the
browser finishes receiving the data. It does not include the additional time the
client browser needs to display the information to the user. Network latency was
calculated from the measured data (both server response time and client response
time); it is the queuing delay in the network plus the client queuing delay at the
server.
To evaluate the performance of a Web server, an experiment was set up to measure
response time at different stages. The experiment was conducted to find out
how Web performance is affected by the nature of the requests and by how the Web
server was configured to handle light/heavy traffic and static/dynamic Web pages.
The key factors depend on the nature of the files (HTML, image files, or CGI
programs). Data were collected at different times of the day and were then
analyzed and discussed. Time stamp functions were added at
different stages in the browser and the http server. By taking advantage of the
open source code policies of the NCSA httpd server and Netscape's Mozilla (a
browser), the source code was changed to include the time stamp function. The
experiment was able to measure response times at the stages
where: 1) an http request was sent from the browser, 2) the http request was received
at the httpd server, 3) the httpd server sent back the result of the request, and 4) the
browser received the result of the request. Frequently accessed Web pages and
Web pages generated by CGI were selected as the research objects.
This abstract accurately represents the content of the candidate's thesis. I
recommend its publication.
Signed
Tom Altman


CONTENTS
Chapter
1. Introduction
2. World Wide Web
2.1 Web Overview
2.2 HTML Document Delay in a browser
2.3 Network Infrastructure
2.4 Web Server Model
2.4.1 Web Server architecture
2.4.2 Metrics
2.5 HTTP Protocol
3. Related Research
3.1 Performance degradation by dynamic pages
3.2 Measurement using custom instrumentation
3.3 Performance analysis of workload characteristics
3.4 Optimal Web server performance
4. Measurement Methodology
4.1 Measurement objectives
4.2 Code Changes in Browser
4.3 Code Changes in HTTP Server
4.4 Network Latency
4.5 Measurement Procedures
4.5.1 Synchronization
4.5.2 Data Collection
4.6 Scope of the measurement
4.6.1 CU Denver main page
4.6.2 Aboutcamp GIF file
4.6.3 CGI program
4.7 Time of the day to collect data
5. Performance Analysis
5.1 Main page analysis
5.1.1 Time of the day measurement
5.1.2 HTML Document Vs Image Document
5.1.3 Embedded Image Document Measurement
5.2 Most frequently requested GIF file
5.2.1 Time of the day measurement
5.3 CGI program
5.3.1 Time of the day measurement
6. Future Research
7. Conclusion
8. Acknowledgement
Appendix
A. Source code for getting the Unix time
B. Source code for the CGI program
C. Source code for Telnet program
D. CU Denver Network Configuration
E. Traffic Summary of CU Denver Web Server
F. Sample log in the http server
G. HTTP Server source code-log_transaction.c
H. HTTP Server source code-http_request.c
I. Browser Mozilla source code
J. Response Time Measurement
References


FIGURES
Figure
2.1 A simplified network connectivity for Web access
2.2 A simplified dial-up connection from PC to CU Denver Web server
2.3 Web server model for HTTP request processing
2.4 A timeline of an HTTP request by a client to a Web server
4.1 Mozilla browser layout and the CU Denver Main Page
4.2 Hourly traffic by archives at CU Denver
4.3 Hourly traffic at University of Colorado at Denver
5.1 CU Denver HTML Measurement
5.2 Header.jpg Measurement
5.3 Embedded Image Document Measurement
5.4 GradRules HTML Document Measurement
5.5 Aboutcamp.gif Document Measurement
5.6 Index.html Measurement
5.7 CGI Document Measurement


1. Introduction
The popularity of the World Wide Web (also called WWW or the Web) has
increased significantly during the last few years. Today, Web usage dominates
Internet traffic and accounts for the largest share of Internet use [6] [7] [26].
There is a wide range of reasons behind this explosive growth in Web traffic.
These reasons include: the machine-independent nature of the languages and
protocols used for constructing and exchanging Web documents; the availability of
graphical user interfaces, such as browsers, for navigating the Web (for example,
NCSA Mosaic, Netscape Navigator, and Internet Explorer); the growing
popularity of e-commerce and the increasing number of corporate Web pages in
cyberspace; and an emerging trend among researchers and educational institutions
to use the Web for disseminating information in a timely fashion [6] [7] [26].
The phenomenal growth in Web traffic has sparked much research activity on
improving the World Wide Web. Much of the recent research has
been aimed at analyzing and evaluating the performance of Internet and intranet
Web servers, and at enabling capacity planning and performance prediction for
server designs.
To promote Internet usage, the performance of a Web server needs to be
improved so that the user does not have to wait too long for a response
from the Web server. The purpose of this thesis is to present measurement
techniques for evaluating Web performance. Throughout the study, emphasis is
placed on Internet server performance and Internet traffic. Measurement
techniques in these two areas are described in detail.
To measure the performance, one needs to use a model to represent how a web
server works. This thesis intends to break down the activities under the hood of an
http connection. By using this model, the response times are measured at different
stages such that the bottleneck can be identified and a solution can be found to
improve the performance.
Chapter two discusses the basics of the World Wide Web, the architecture of a
typical Web connection, the breakdown of a typical http connection, and the
metrics that the experiment measures. A typical http connection consists of
three components: browser, network, and server. These three components are
discussed in depth in that chapter. Chapter three briefly discusses related work
on Web performance problems. Chapter four describes the methodology used in
the experiment and explains how and why the experiment was performed this way.
In Chapter five, the data collected in the experiment are discussed; the chapter
explains the differences in performance due to the nature of the file, the file
size, and the time of day at which the measurements were taken. Chapter six
presents topics for future research in the Web performance area. Chapter seven
concludes with the findings of the Web performance measurement methodology.


2. World Wide Web
This chapter presents an overview of the World Wide Web. The World Wide Web
consists of three major components: browser, network, and server. The chapter
describes the underlying factors that affect the delay in displaying hypertext
markup language (HTML) content at the client site. The network infrastructure is
briefly discussed. Finally, the general architecture of a Web server system is
presented, followed by the specific metrics that are of interest in this research.
2.1 Web Overview
The World Wide Web is based on client-server architecture. Communication
between a client and a server is always in the form of request-response pairs, and is
normally initiated by the client [26].
A client accesses documents on the Web using a Web browser. Each Web page
may consist of multiple documents, i.e., hot links to the elements that make up
the page (some page elements may be located on a different server). When a user
retrieves a particular Web page (identified by a Uniform Resource Locator, URL),
the browser generates a request to be sent to the corresponding Web server. Using
the Hypertext Transfer Protocol (HTTP), a network connection is set up with the
destination server providing the data for the request. Finally, the TCP/IP protocol
is used to move data to the client, where the Web page is constructed. However,
TCP/IP window and acknowledgement packets do not count as part of Web page
overhead. The response from the destination server includes a status code that
informs the client whether the request succeeded. If the request succeeds, the
server provides an HTML file and the Web page is displayed at the client site. If
the request is unsuccessful, a reason for the failure is returned to the client [7] [27].
The sequence of an http request is shown as follows:
A user types in the URL address and opens the web page on a browser. This event
initiates a client request of getting an HTML file. The server gets the request and
sends the basic HTML file. The browser receives the HTML file and displays it on
its display area. However, most of the HTML pages contain embedded image files
or applets. To display the image files or an applet, the browser sends requests to
get them. There is an http connection for each image file. A complete HTML page
is displayed when all the image files and applets are received by the browser.
Under the hood, the browser gets the URL, establishes an http connection and then
sends the request. If it is a request to retrieve a Web page, the browser will issue a
GET command to the server. The http request is sent through the network using the
TCP/IP protocol.
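As an illustration of the request that the browser sends, the following Python sketch formats the text of a simple GET request from a URL. It is only a sketch: the function name and the example URL are hypothetical, the code is not taken from the Mozilla source, and it builds the request bytes without opening a network connection.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> tuple[str, int, bytes]:
    """Return (host, port, raw request bytes) for a simple HTTP/1.0 GET."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port or 80          # default HTTP port
    path = parts.path or "/"         # an empty path means the root document
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "\r\n"                       # blank line ends the request header
    )
    return host, port, request.encode("ascii")


# Hypothetical URL, used only for illustration.
host, port, raw = build_get_request("http://www.example.edu/index.html")
print(host, port)
print(raw.decode("ascii"))
```

The bytes returned here are what would be written to the TCP connection once it has been established with the server named in the URL.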


The network connection from a dial-up network is normally through an Internet
service provider (ISP). CU Denver also provides a dial-up network that is similar
to an ISP. A user dials in to the modem pool of an ISP. All requests from the user
are sent from the ISP server to the public Internet. All request-response
messages are sent via routers and hubs within the Internet infrastructure.
The ISP sends the request via a router to a hub. The hub then sends the request to
another router that connects to the Web server. Eventually, the http request arrives
at the appropriate Web server. The http request may have to pass through several
layers of routers and hubs to reach the destination Web server. The Web server
processes the request and sends back an HTML page that can be displayed in the
user's browser. Figure 2.1 illustrates simplified network connectivity for Web
access.


Figure 2.1: A simplified network connectivity for Web access.
2.2 HTML Document Delay in a browser
The interface between the Web server and the user is the browser. A user does not
need to know what is behind the scenes. The HTML file's content is what interests
the user. An HTML document (Web page) can be extremely complex and large.
Basically, Web response time depends on several criteria: the page size, its
attachments, and the underlying network infrastructure. If the HTML is a dynamic
page, it will take more time for the server to retrieve enough information and
compose the HTML page. Likewise, an HTML page with a lot of embedded
image files, such as GIF files or JPEG files, will require several http connections to
retrieve all the image files (normally one http connection for each image file).
Typically, a browser has a cache to store the most recently received images and
HTML files. If these files are used repeatedly while surfing the Internet, the
response time can be very short because the files are already on the PC. Some
browsers also maintain a database of files that were previously
downloaded. For example, before a browser issues a GET command to the server
for an image file, it checks the cache and the database. The most popular
browsers are from Netscape and Microsoft. This
research takes advantage of Netscape's Mozilla open source code
policy. Time stamp functions were added to the browser to record the time when the
http request is sent and when the files are received in the cache of the browser.
2.3 Network Infrastructure
Network infrastructure performance depends on many parameters, such as the Web
server performance (which depends on its workload characteristics), the physical
distance across the Internet, access line rates, protocol settings, and the
performance of intermediate subsystems (such as LANs, Ethernet segments, or routers).
The typical bandwidth used in different segments of a Web environment is
illustrated in Figure 2.1. In general, a Web server is connected to the
Internet backbone (usually a high-speed network using ATM or SONET) via ISDN
(Integrated Services Digital Network, 128 Kb/s), a T1 link (1.5 Mb/s), or a T3 link
(45 Mb/s). For a Web server hooked up to a local area network (LAN), which may
or may not be an intranet, common LAN speeds are 10 Mb/s for Ethernet and 100
Mb/s for Fast Ethernet or FDDI. Overall, the bandwidth of the Internet backbone is
usually not a limiting factor for Web performance [3]. The Internet
backbone of each ISP varies considerably. The dial-up network at CU Denver was
used in this experiment.
Performance bottlenecks often occur on the links between a client or a server
and the Internet backbone. The slowest link in the path of a Web access
is usually the access link connecting a user facility (client) to an access provider
(ISP). The transmission rates for a leased digital telephone line typically range
from 56 Kb/s to 1.5 Mb/s [3] [6]. A dial-up connection using a modem can range
from 9.6 Kb/s to 56 Kb/s. For clients using regular telephone lines (often at
28.8 Kb/s or 56 Kb/s), the real bottleneck for accessing Web sites is the client
connection to the Internet.
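To make the effect of the access link concrete, the following Python sketch computes the ideal time needed to push a file through links at the rates quoted in this chapter. The file size is a hypothetical value chosen for round numbers, not a measurement from the experiment, and the calculation ignores protocol overhead, latency, and modem compression.

```python
def transfer_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Ideal time to send size_bytes over a link, ignoring overhead and latency."""
    return size_bytes * 8 / link_bits_per_second


# Link rates quoted in this chapter, in bits per second.
LINKS = {
    "modem 28.8 Kb/s": 28_800,
    "modem 56 Kb/s": 56_000,
    "T1 1.5 Mb/s": 1_500_000,
    "Ethernet 10 Mb/s": 10_000_000,
}

FILE_SIZE = 36_000  # hypothetical 36,000-byte image, not a measured value

for name, rate in LINKS.items():
    print(f"{name:>18}: {transfer_seconds(FILE_SIZE, rate):8.4f} s")
```

Under these assumptions the same file takes about 10 seconds over the 28.8 Kb/s modem link but under 0.03 seconds over 10 Mb/s Ethernet, which is why the client access link dominates.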
When the experiment was conducted, the modem speed was 56 Kb/s. The PC
dialed in to a 56 Kb/s modem pool at CU Denver. However, the connection
speed was only 28.8 Kb/s due to the poor quality of the phone line. When the
modem pool established a connection between the computer and the CU Denver
computer network, a temporary IP address was assigned. The computer and
the CU Denver network were connected as shown in Figure 2.2.
The modem pool at CU Denver is simply a digital plug-in at the back of a Cisco
router, Access 5800, located in the main administration building. An Asonte
switch, switch 50600, located in the North Classroom Building, links the
Cisco router and the CU Denver Web server. The link between the Cisco
router and the Asonte switch is a 100 Mb/s fiber optic link.
The link between the switch and the Web server is a 10 Mb/s Ethernet network.
A map showing CU Denver Internet and intranet network connectivity is found in
Appendix D. When Web data is sent to or retrieved from Web sites
through the Internet backbone, all information goes through a fiber optic link to the
chief.cudenver.edu router and out the wan-campus.cudenver.edu router, and
vice versa. Once the PC is connected to the CU Denver modem pool, the PC is
part of the CU Denver network. Access to the CU Denver Web page from the
PC is then just like an intranet access.


Figure 2.2: A simplified dial-up connection from PC to CU Denver
Web server.
2.4 Web Server Model
A Web server eventually handles a user request from a browser. The response
time of a Web server determines how long the user will have to wait. To measure
the response time, we need to understand how a typical World Wide Web server
system works and how to determine the workload metrics required for evaluating
Web performance.
2.4.1 Web Server architecture
The architecture of a typical World Wide Web server and its service environment
is modeled as shown in Figure 2.3 [1].


Figure 2.3: Web server model for HTTP request processing.
The Internet or intranet Web server receives requests for its contents from one or
more client browser applications. HTTP requests arrive at the httpd Listener
process on the server and are dispatched to one of a pool of Web Server processes.
The number of servers in the pool varies from a fixed lower bound to an elastic
upper bound; each process can serve one request at a time. If the request is for
HTML or image content, the Web server retrieves the content and returns it
directly to the client. If the request is for dynamic Common Gateway Interface
(CGI) content, the Web server creates a child process that runs a CGI script to
compute the requested information and return the output to the httpd server, which
sends it back to the client [1]. The number of requests and the number of Web
application servers available will affect the performance of the Web server.
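The dispatch pattern in this model can be sketched in a few lines of Python. This is a simplified illustration, not the NCSA httpd implementation: threads stand in for the pool of server processes, a plain function stands in for the CGI child process, and the paths and page contents are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical static content; a real server would read files from disk.
STATIC_FILES = {"/index.html": "<html>main page</html>"}


def run_cgi(path: str) -> str:
    """Stand-in for the child process that runs a CGI script."""
    return f"<html>generated by {path}</html>"


def handle(path: str) -> tuple[int, str]:
    """One pool worker: serve a static file or delegate to the CGI stand-in."""
    if path.startswith("/cgi-bin/"):
        return 200, run_cgi(path)
    if path in STATIC_FILES:
        return 200, STATIC_FILES[path]
    return 404, "not found"


def listener(paths: list[str], pool_size: int = 2) -> list[tuple[int, str]]:
    """Dispatch each request to a fixed-size pool; each worker serves one request at a time."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(handle, paths))  # results keep request order


for status, body in listener(["/index.html", "/cgi-bin/date.cgi", "/missing.gif"]):
    print(status, body)
```

With a pool size of two, a third simultaneous request has to wait for a free worker, which is the queuing effect that the pool size has on response time in the model.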
The above model is typical for a dedicated server. A dedicated server is a
machine that is used only as a Web server and nothing else. Most
commercial Web servers use this model, and the server pool is normally
distributed over several machines. It can handle multiple requests at the same time.
The machine used for testing in this research was not a dedicated server. All the
components in this model resided on the same machine. Nevertheless, the
model can still be applied to the test machine. The listener received an http
request, passed it to the process request function and, in some cases, the http
request initiated a CGI script. The http server used in this experiment was
obtained from the NCSA.


2.4.2 Metrics
To measure the response of an http request, it is necessary to break down the
activities under the hood. First, a client initiates an http request by typing in the
URL or by clicking a link on a Web page. The browser sets up an http connection
with the Web server and then sends out the http request. The http request arrives at
the Web server via the network connection. The http daemon (httpd) at the server
acts as a listener. It parses the request and determines the nature of the command.
In turn, it passes the http request to the request processor. The server then sends
the result of the request back to the browser. The browser receives the result
via the same http connection. If the request is to retrieve an HTML file, the
browser will close the http connection once it has received the file.
In this research, the main focus is on the time at which an http request is sent out
(QNet,Client), the time at which the HTML file has been returned to the browser
(Rc,Load), and the time during which the http request is handled by the Web
server, the server response time (Rs). The network latency can be calculated by
subtracting Rs from the total client response time (Rc,Load - QNet,Client).
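The subtraction described above can be sketched as a small Python function. The parameter names and the sample values are hypothetical; the sketch assumes that the two client time stamps come from the same clock and that the server response time was measured separately at the server.

```python
def network_latency(t_sent: float, t_loaded: float, server_response: float) -> float:
    """Network latency = (Rc,Load - QNet,Client) - Rs, all values in seconds.

    t_sent          -- time stamp when the browser sent the request (QNet,Client)
    t_loaded        -- time stamp when the browser finished receiving (Rc,Load)
    server_response -- server response time Rs measured at the server
    """
    client_response = t_loaded - t_sent
    if server_response > client_response:
        raise ValueError("server time exceeds client time; check the measurements")
    return client_response - server_response


# Hypothetical values: request sent at t=100.0 s, fully received at t=102.5 s,
# and the server reported 0.5 s of response time.
print(network_latency(100.0, 102.5, 0.5))
```

Because only a difference of client time stamps is combined with a duration measured at the server, this particular calculation does not require the client and server clocks to agree on absolute time.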
Some of the measurement metrics used in this thesis are based on the specific
metrics that are described by Dilley et al. [1] and are illustrated in Figure 2.4.


Figure 2.4: A timeline of an HTTP request by a client to a Web server.
[Figure: a time axis marking eight events in order: Client Sends Request, Request
Arrives at Web Server, Request at httpd, Parsing Complete, Processing Complete,
Sending Complete, Client Received Header, Client Finished Receiving. The labels
along the axis are QNet,Client, Qs, Rs,Parse, Rs,Proc, Rs,Net, QNet,Server,
QClient, and Rc,Load.]
QNet,Client: the network and queuing delays from the time the client makes
the request to the http server to the time the request arrives at the server.
Qs: the time at which the request arrives at the Web server.
QNet,Server: the network queuing and transmission delay for the last
window of TCP data that is queued in the server's network buffers.
Rs, Server response time: the time that a single HTTP request spends at the server
pool process. It includes the service time and the queuing delays at physical
resources needed to complete the request. It does not include queuing delays in
the network or at the server prior to the request reaching the server pool process.
It consists of three subcomponents:
Rs,Parse, Server parse time: the time that the server spends reading the client
request.
Rs,Proc, Server processing time: the time that the server spends processing
the request.
Rs,Net, Server network time: the time that it takes for the server to reply to
the client's request.
QClient: the time at which the header of the HTML/image file arrives at the client
machine.
Rc,Load, Client loading time: the time at which the browser finishes receiving the
HTML/image file.
Resc, Client residence time: the queuing delay plus the server response time for
one visit to the Web server (i.e., a single HTTP request).
Rc, Client response time: the network communication latency plus the client
residence time (i.e., end-to-end or last byte latency for one request).
Server throughput: the number of completed HTTP requests per unit time
processed by the server.
Ds, Server service demand: the amount of system resources consumed by each
client HTTP request. The measurement consists of the following subcomponents:
Ds,CPU: the average CPU time for the request.
Ds,Net: the average network delay for the request.
This experiment measured the times that are listed in Figure 2.4 to determine the
performance of a Web server. The times measured at different stages represent
different activities at the server. We can then identify which activity is the
bottleneck.
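As a sketch of how stage time stamps lead to a bottleneck, the Python example below splits the server response time into its three subcomponents and reports the largest one. The class, field names, and sample values are hypothetical; the sketch assumes that Rs runs from the moment the request reaches httpd to the moment sending is complete, following the events shown in Figure 2.4.

```python
from dataclasses import dataclass


@dataclass
class ServerStamps:
    """Hypothetical server-side time stamps, in seconds, for one request."""
    at_httpd: float   # request reaches the server pool process
    parsed: float     # parsing complete
    processed: float  # processing complete
    sent: float       # sending complete


def server_breakdown(s: ServerStamps) -> dict[str, float]:
    """Split the server response time Rs into its three subcomponents."""
    return {
        "Rs,Parse": s.parsed - s.at_httpd,
        "Rs,Proc": s.processed - s.parsed,
        "Rs,Net": s.sent - s.processed,
    }


def bottleneck(parts: dict[str, float]) -> str:
    """Name of the stage that took the longest."""
    return max(parts, key=parts.get)


# Illustrative values only, not data from the experiment.
stamps = ServerStamps(at_httpd=10.0, parsed=10.25, processed=11.0, sent=11.5)
parts = server_breakdown(stamps)
print(parts, "Rs =", sum(parts.values()), "bottleneck:", bottleneck(parts))
```

For a CGI request one would expect Rs,Proc to dominate, while for a large static file Rs,Net is the more likely candidate.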


2.5 HTTP Protocol
Since 1990, HTTP has been in use by the World Wide Web global information
initiative. It is an application-level protocol for distributed hypermedia
information systems. It is a standard, stateless protocol used for hypertext,
name servers, and distributed object management systems. The latest
specification is referred to as "HTTP/1.1".
An http server is a front-end server that handles requests from a client. It accepts
connections in order to service requests by sending back responses. HTTP
is a request/response protocol. The http request from a client can be GET, POST,
HEAD, PUT, CONNECT, or DELETE. The most common request is GET, for example
to retrieve an HTML file or a GIF file. A CGI program can also implement the GET
function. The http server can create a process to run a CGI program that generates
HTML document output. When the http server gets the output, it sends it
back to the client. The http server first establishes a connection with the client.
Then it parses the request, puts the request in a queue, sends the file, and closes the
http connection. It uses one http connection for each request. For example, an
HTML document may have many embedded images. Each embedded object
requires an http request to retrieve the image file. It is not uncommon to
have an HTML file with ten or more images, which takes ten or more http requests
with one http connection per request. Opening and closing an http
connection for every request is inefficient. In the future, http requests under
HTTP/1.1 will use persistent connections, which allow multiple objects to be
transferred over one http connection. Persistent connections also allow multiple
requests to be sent without waiting for each response (pipelining). Multiple
requests and responses can be contained in a single TCP segment.
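The saving from persistent connections can be illustrated with simple arithmetic. The Python sketch below counts the connections needed to fetch one HTML file and its embedded objects from a single server, and multiplies by a per-connection setup cost. The function names and the 0.25-second setup cost are hypothetical values chosen for illustration, not measurements.

```python
def connections_needed(embedded_objects: int, persistent: bool) -> int:
    """Connections to fetch one HTML file plus its embedded objects from one server."""
    if embedded_objects < 0:
        raise ValueError("embedded_objects must be non-negative")
    total_requests = 1 + embedded_objects  # the HTML file itself plus each object
    return 1 if persistent else total_requests


def setup_overhead(embedded_objects: int, persistent: bool, setup_seconds: float) -> float:
    """Total time spent opening connections, given a fixed cost per connection."""
    return connections_needed(embedded_objects, persistent) * setup_seconds


IMAGES = 10    # a page with ten embedded images, as in the text
SETUP = 0.25   # hypothetical connection setup cost in seconds

print("one connection per request:", connections_needed(IMAGES, False),
      "connections,", setup_overhead(IMAGES, False, SETUP), "s of setup")
print("persistent connection:     ", connections_needed(IMAGES, True),
      "connection,", setup_overhead(IMAGES, True, SETUP), "s of setup")
```

With these assumed numbers, eleven separate connections cost 2.75 seconds of setup time, against 0.25 seconds for a single persistent connection.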


3. Related Research
Previous studies of World Wide Web performance have provided many useful
insights into the behavior of the Web. This section describes some of the work
that has been done on Web servers.
3.1 Performance degradation by dynamic pages
The paper by Iyengar [3] examines the performance of Web servers under a high
volume of requests. Iyengar simulated a heavily loaded Web server in conjunction
with workloads that were obtained by analyzing Web logs and performance data
from several real sites. The study focused on request latencies,
rejected requests, and throughput as functions of workload when these
quantities were limited by the processing power (CPU) of the server and not by the
network. The CPU could become a bottleneck either because
the server receives more requests than it can handle or because a high percentage
of the requests are for dynamic pages (pages created by programs executing on the
Web server via the Common Gateway Interface (CGI)).
In order to prevent request latencies from becoming too large at peak periods, the
Web server should reject enough requests so that the average load on the system is
less than 95% of the maximum capacity. It is better to refuse requests than to allow
a long delay in processing a request. To counter the performance loss caused by the
increasing proportion of dynamic pages, the study proposed a number of
techniques to reduce the overhead. For example, FastCGI can be used instead of the
Common Gateway Interface for invoking server programs. It is a good strategy to
cache frequently accessed dynamic pages so that they do not have to be
regenerated by a server program every time they are requested. Server programs
should be compiled instead of interpreted. Server applications that need to
access a database should communicate with a process holding an open connection
to the database. This way, new connections do not have to be established with the
database for each access [3].
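The 95% rule can be turned into a simple calculation of how many requests to refuse. The Python sketch below is an illustration under stated assumptions, not the policy simulated in [3]: it treats load as a request rate, uses a hypothetical function name and sample rates, and caps the accepted load at the target fraction of capacity.

```python
def rejection_fraction(arrival_rate: float, max_capacity: float,
                       target: float = 0.95) -> float:
    """Fraction of requests to refuse so accepted load is at most target * max_capacity.

    arrival_rate and max_capacity are in requests per second.
    """
    if max_capacity <= 0:
        raise ValueError("max_capacity must be positive")
    if arrival_rate <= 0:
        return 0.0
    allowed = target * max_capacity
    if arrival_rate <= allowed:
        return 0.0                      # light load: accept everything
    return 1.0 - allowed / arrival_rate


# Hypothetical rates: a server that can handle 100 requests/s.
for rate in (50, 95, 200):
    print(rate, "req/s ->", round(rejection_fraction(rate, 100), 3))
```

At twice the capacity, slightly more than half of the incoming requests would have to be refused to keep the accepted load at the target.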
3.2 Measurement using custom instrumentation
The paper by Dilley and Friedrich [1] focused on measuring, analyzing, and
evaluating the performance of Internet and intranet Web servers with the goal of
creating capacity planning models.
Due to the dramatic increase in the number of Web requests and the enormous
amount of data transferred over the network, there is a powerful motivation for good
capacity planning of Web services. Dilley and Friedrich defined a set of
metrics, which were used in this research (see Section 2.4.2), to describe the
relationships between Web servers, clients, and the Internet that connects them.


Since the client queuing delays (the time spent at the server before the request
reaches the server pool process) at a Web server cannot be measured directly with
server-side instrumentation [1], Dilley et al. presented layered queuing models
(LQMs) to estimate them from the measured server response time, Rs.
Furthermore, they used the LQMs to predict the client response time at the server
(Resc). The study showed that client response times were quite sensitive to the
number of servers in the server pool and were more sensitive in environments with
high network latency, such as the Internet.
3.3 Performance analysis of workload characteristics
The paper by Martin F. Arlitt [26] presented a detailed workload characterization
study of Internet World-Wide Web servers. The study used the access logs of Web
servers at six different Web sites: a lab-level Web server at the University of
Waterloo; a department-level Web server at the University of Calgary; a
campus-wide Web server at the University of Saskatchewan; the Web server at NASA's
Kennedy Space Center; the Web server from ClarkNet, a commercial Internet
provider in the Baltimore-Washington, DC region; and the Web server at the
National Center for Supercomputing Applications (NCSA) in Urbana-Champaign,
IL.
The research emphasized finding the workload characteristics that are common
to all six Web servers. According to the study, approximately 80-90% of the
requests to a Web server result in the successful return of a document
(Successful Requests). The study also identified the distribution of document
types: across the six data sets, HTML and image documents account for 90-100%
of the total requests to the server, while sound and video documents account
for only 0.01-1.2% of the total requests. File referencing behavior is another
common characteristic: 15-40% of the files accessed in a log are accessed only
once in that log, and most transferred documents are in the range of 100 to
100K bytes, with less than 10% larger than 100K bytes. The six Web server data
sets show that some Web documents are extremely popular, accessed frequently
and at short intervals by many clients at many sites, while other documents are
accessed rarely. Another common characteristic described in the research is the
geographic distribution of document requests, that is, whether clients access
the Web server locally or remotely.
This behavior is mainly determined by how the Web server is used. Some servers
are primarily used for teaching and research activities by internal students or
staff. Others are used for public relations or by a commercial Internet service
provider.
The observed workload characteristics were used to identify two possible
strategies for the design of a caching system to improve Web server performance:
caching designs that reduce network traffic, and caching designs that reduce the
number of requests presented to Internet Web servers. The results showed that
caching to reduce the number of requests is more effective than caching to
reduce bytes transferred.
3.4 Optimal Web server performance
The paper by James C. Hu [8] illustrates how dynamic and static adaptivity can
enhance Web server performance.
Hu [8] [10] showed that optimal Web server performance requires both static and
dynamic adaptive behavior, and that no single Web server configuration is
optimal for all circumstances. Static adaptations include configuring an
event-dispatching model that is customized for OS platform-specific features
(such as I/O completion ports). Dynamic adaptations include prioritized request
handling, caching strategies, and threading strategies. He used a blackbox
performance analysis technique. His results showed that caching is vital to high
performance, but that non-adaptive caching strategies do not provide optimal
performance in Web servers. He concluded that high-performance Web servers must
be adaptive: they must be customizable to use the most beneficial strategy for
the particular traffic characteristics, workload, and hardware/OS platform. One
of his measurements was client latency, defined as the average delay in
milliseconds seen by a client from the time it sends the request to the time it
completely receives the file. In this thesis, the Web server configuration was
static; there was no adaptive behavior on the Web server. Both a blackbox and a
whitebox performance technique were used to measure the client latency.
4. Measurement Methodology
The measurement methodology used here is related to one of the models developed
in previous research by Dilley and Friedrich [1]. The purpose of the analysis is
to see which stage of an http request takes the longest time at peak traffic.
The Web server that was used is a campus-wide Web server: a Unix machine that
stores the Web pages of the University of Colorado at Denver (CU Denver). It is
a Digital 2100 Server A500MP with two 275 MHz EV-45 4MB b-cache Alpha AXP chips
and 384MB RAM. The http server software is NCSA httpd v1.5.2a. It is a simple
system in which the httpd server, the http documents and the CGI server all run
on one Unix system. The machine is used for multiple functions by more than
3,500 users (with a limit of 256 interactive users at any one time).
The research is primarily intended to demonstrate a methodology for finding the
performance bottleneck of a Web server. The technique can be extended to a more
complicated system by placing the same time stamp function at different stages
of that system.
4.1 Measurement objectives
The following time line shows the breakdown of the stages of a typical http
request. To measure the response time of a client request at different stages, a time
stamp function is added to the httpd source code and the browser source code at
different stages.
Time line of an http request (Rs spans Request at httpd to Sending Complete):
  Event                            Time stamp
  Client Sends Request             QNet,Client
  Request Arrives at Web Server    Qs
  Request at httpd                 Rs,Parse
  Parsing Complete                 Rs,Proc
  Processing Complete              Rs,Net
  Sending Complete                 QNet,Server
  Client Received Header           QClient
  Client Finished Receiving        Rc,Load
The breakdown of an http request consists of browser and server activities, and
the activities on both the browser and the server are measured. When an http
connection starts, the browser records the time QNet,Client. Qs cannot be
measured directly with server-side instrumentation [1], so Qs is not recorded in
the measurement.
The times Rs,Parse, Rs,Proc, Rs,Net and QNet,Server are measured in the httpd
server program. Rs,Parse is recorded when the server receives the request and
begins parsing it. Rs,Proc is recorded by the time stamp function inside the
httpd server program when the server starts processing the request. Rs,Net is
recorded when the server finishes processing the request. When the result of the
request arrives, the browser records the time QClient as it receives the header
of the returned document, and Rc,Load is time stamped when the document has been
completely received by the browser. In summary, the browser records the times
QNet,Client, QClient and Rc,Load.
4.2 Code Changes in Browser
The browser is one of the key components of the experiment, and changing it was
a lengthy process. In this thesis, time stamp functions were added to a browser
at the start and finish of an http connection. For instance, when a client
clicked on a link on a Web page, the browser invoked an http request to get
another Web page. The browser first established an http connection and sent an
http request to the server. The time stamps were recorded inside mkhttp.c, the
program that makes http connections. The time QNet,Client was recorded by the
time stamp function inside the routine net_send_request in mkhttp.c (see
Appendix I). After the http request was received and an HTML file was sent back
to the browser, the browser time stamped the start and finish of receiving the
HTML file inside the routines net_parse_first_http_line and net_processHTTP
respectively. These routines were also inside mkhttp.c. If an HTML page contains
several embedded images, the browser issues a separate http request for each
image and logs the time of each individual request.
All of the above programs were obtained from Netscape. Netscape created a
consortium called mozilla.org to provide leadership for the browser source code,
and the source code of the browser was downloaded from the Web site
www.mozilla.org. The browser was called next generation layout (NgLayout). The
source code was available via the Concurrent Versions System (CVS). The browser
was designed to be modular, with different components to handle parsing, new
HTML versions and the graphical user interface. To time stamp the http request
activities, only the network library of the browser code was changed. Although
the build process of NgLayout was lengthy, the changed program was successfully
recompiled to create a new version of the browser. The code changes are attached
in Appendix I.
4.3 Code Changes in HTTP Server
The server that was used in this thesis was using HTTP/1.0. During the
measurement, it was found that the response time of an HTML page with many
embedded objects was longer than that of one with fewer embedded objects,
because a separate http connection was opened and closed for each object.
The http server used in this experiment kept a log of the traffic of the Web
server, recording the time when each http request was completed. However, the
log function recorded times only to a precision of one second, which was not
sufficient to measure the server response time. The log function inside the
httpd server program was therefore changed to use the function gettimeofday()
to get the system time, which raised the precision to microseconds.
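The idea of stamping each stage with microsecond resolution can be sketched as
follows. This is only an illustration in Python, not the C code added to httpd
(which is in the appendices); the function name stamp and the line layout are
assumptions, with the layout loosely modeled on the server log lines shown in
Section 5.1.

```python
import time


def stamp(label, now_ns=None):
    """Return one log line with a microsecond-resolution time stamp.

    `label` and the line layout are illustrative assumptions; they are not
    taken from the modified httpd source.
    """
    if now_ns is None:
        now_ns = time.time_ns()            # nanoseconds since the epoch
    seconds, nanos = divmod(now_ns, 1_000_000_000)
    usec = nanos // 1000                   # keep microseconds, like gettimeofday()
    local = time.localtime(seconds)
    year = time.strftime("%Y", local)
    when = time.strftime("%a %b %d %H:%M:%S", local)
    return f"{label} {year} usec={usec:06d} {when}"


if __name__ == "__main__":
    print(stamp("request-arrived"))
```

Keeping the whole seconds and the microsecond part as separate integers avoids
losing precision to floating-point rounding.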
Time stamp functions were also added to keep track of the http request at
different stages at the server. The source code of two programs, http_request.c
and http_log.c, was modified. First, a time stamp function was added in the
routine RequestMain. This routine was the main function for handling http
requests: whenever there was an incoming http request, it was called by the
httpd server, and it did not terminate until all the http requests were
finished. This time stamp recorded the time when the http request reached the
http server (see Appendix G and H for the source code). A second time stamp
function was put inside the process_request routine, which RequestMain called
once the http request had been decoded. This time stamp recorded the time when
the httpd started processing the decoded request. The third time stamp was
placed in the log_transaction routine inside http_log.c. It recorded the time
when the request had been processed and a file was ready to be sent to the
browser by the http server. The fourth time stamp recorded the close of the http
connection. It was the last statement that ran inside the process_request
routine. Two log files were generated: one from http_request.c and the other
from http_log.c.
4.4 Network Latency
Measurements on the browser and the http server were obtained using the time
stamp functions that were added.
The browser can measure the times QNet,Client, QClient, and Rc,Load.
The httpd server can measure the times Rs,Parse, Rs,Proc, Rs,Net and QNet,Server.
The client response time is the time between an http request being sent and
finished:
Rc = Rc,Load - QNet,Client
The server response time is
Rs = QNet,Server - Rs,Parse
The client queuing delay is
Tc,Queue = Rs,Parse - Qs
The network latency TNet is defined to be the sum of the time needed to transmit
an http request to the httpd server and the time needed to send the result from
the httpd server to the browser. However, before the http request reaches the
httpd, it has to go through a queuing process. Since the source code was not
available, the time when the request arrived at the Web server (Qs) could not be
found, so the measured network latency includes the time Tc,Queue.
As shown in the time line, the network latency was calculated by the formula
TNet + Tc,Queue = Rc - Rs
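The relations above can be written as a small function. This is a minimal
sketch in Python, not code from the thesis; the class name RequestStamps, its
field names and the numbers in the usage example are hypothetical. All times
are in seconds.

```python
from dataclasses import dataclass


@dataclass
class RequestStamps:
    """Time stamps of one http request, in seconds (hypothetical container)."""
    q_net_client: float   # browser sends the request (browser clock)
    rc_load: float        # browser finishes receiving (browser clock)
    rs_parse: float       # request reaches httpd (server clock)
    q_net_server: float   # server finishes sending (server clock)


def response_times(s):
    """Apply Rc = Rc,Load - QNet,Client and Rs = QNet,Server - Rs,Parse."""
    rc = s.rc_load - s.q_net_client
    rs = s.q_net_server - s.rs_parse
    # Rc - Rs is the network latency plus the unmeasured client queuing delay.
    return {"Rc": rc, "Rs": rs, "TNet+Tc,Queue": rc - rs}


if __name__ == "__main__":
    example = RequestStamps(q_net_client=0.0, rc_load=2.0,
                            rs_parse=0.5, q_net_server=0.75)
    print(response_times(example))
```

Rc uses only browser time stamps and Rs uses only server time stamps, so a
constant offset between the two clocks cancels out of Rc - Rs.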
[Figure: time line of an http request, with the same events and time stamps as
the time line in Section 4.1 and with the intervals Rs and T marked.]
4.5 Measurement Procedures
4.5.1 Synchronization
To obtain the measurements shown in the time line, the clocks of the Web server
and the PC were synchronized. However, with the limited resources available, it
was not possible to read the system times of both machines at exactly the same
moment. As a result, the synchronization was not very accurate. This was not
critical, because the purpose of the synchronization was only to make the time
stamps recorded by the browser and the httpd as close as possible, not
necessarily identical. This step made it easier to match the logs from the
browser and the server. For example, when an http request was initiated at the
browser, there should be corresponding time stamps at the server. To locate the
http request in the httpd server log, the time and the http sender's
identification were used. If the clocks at the server and the browser were close
enough, the log could be easily traced and the time stamps of the http request
at the server could be found.
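Matching a browser request to its server log entry by time and sender could be
automated along the following lines. This is a hypothetical sketch in Python:
the thesis describes tracing the logs by hand, and the entry layout (a dict with
"sender" and "time" keys), the function name and the 5-second window are all
assumptions made for illustration.

```python
def match_request(browser_time, sender, server_entries, window=5.0):
    """Return the server entry from `sender` closest in time to `browser_time`.

    `server_entries` is a list of dicts with "sender" and "time" (seconds);
    entries further than `window` seconds away are ignored. Returns None when
    nothing matches. The window only needs to cover the residual clock error.
    """
    candidates = [
        entry for entry in server_entries
        if entry["sender"] == sender
        and abs(entry["time"] - browser_time) <= window
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda entry: abs(entry["time"] - browser_time))


if __name__ == "__main__":
    log = [
        {"sender": "access-105.cudenver.edu", "time": 40.17},
        {"sender": "access-105.cudenver.edu", "time": 95.00},
        {"sender": "other-host", "time": 40.10},
    ]
    print(match_request(40.08, "access-105.cudenver.edu", log))
```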
To synchronize the clocks at the server and the browser, a program was created
on the PC (see Appendix C). The program established a telnet session to the Unix
machine, then sent a time stamp command that executed on the Unix machine to get
the Unix system time. At the same time, the program got the system time of the
PC. In theory, if both system time functions were executed at the same moment,
the difference between the two system times could be found. In reality, the two
functions were not executed at the same moment because of the network latency
and the difference in the time needed to execute each time stamp function.
Since the system time on the Unix machine could not be adjusted, the system time
of the PC was adjusted instead. The second step in synchronizing the clocks was
to calculate the difference between the two system times. Another program then
obtained the system time of the PC and adjusted it by calling the function
SetSystemTime().
This step helped to match the http requests in the time logs of the browser and
the httpd server. However, as shown in Section 4.4, the measurement did not need
the PC and Unix system times to be exactly the same.
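One common way to reduce the error caused by network latency is to read the PC
clock both before and after the remote query and to assume the Unix reading was
taken midway between the two. The sketch below illustrates that idea in Python.
The midpoint assumption is not described in the thesis, and the function names
are hypothetical; the program actually used is in Appendix C.

```python
def estimate_offset(pc_before, unix_time, pc_after):
    """Estimate (Unix clock - PC clock) in seconds.

    Assumes the Unix time stamp was taken halfway through the round trip,
    which is an illustrative simplification rather than the thesis's method.
    """
    midpoint = (pc_before + pc_after) / 2.0
    return unix_time - midpoint


def corrected_pc_time(pc_now, offset):
    """Return the value the PC clock should be set to."""
    return pc_now + offset


if __name__ == "__main__":
    offset = estimate_offset(pc_before=100.0, unix_time=103.5, pc_after=101.0)
    print(offset, corrected_pc_time(200.0, offset))
```

The residual error of such an estimate is bounded by half the round-trip time,
which is why the synchronization is only approximate over a dial-up link.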
4.5.2 Data Collection
After the synchronization of the clocks, the following steps were used to take a
measurement:
1. The logs on the browser and the http server were cleaned up.
2. The database files of the browser were removed.
3. A request was initiated by entering a URL in the browser.
4. The http request was allowed to run until it finished.
5. The logs from the browser and the http server were gathered.
Dial-up networking, a PC application, was used to dial in to CU Denver via the
public telephone network. Once there was a connection, the browser was started.
The browser from Mozilla did not look the same as a commercial browser. It had a
very primitive layout. The Mozilla browser is shown in Figure 4.1.
Figure 4.1: Mozilla browser layout and the CU Denver Main Page.
A browser database was used to store previously retrieved information such as
GIF files and HTML files. Cleaning up the databases ensured that every component
of the Web page would generate an http connection. When the URL was entered in
the browser, the browser first set up an http connection and sent an http
request. When the browser program executed the send request routine, it time
stamped the PC time at which the request was sent.
The http request was transmitted from the browser to the CU Denver Web server
via the public telephone network. When the request arrived at the httpd server,
it was time stamped by the RequestMain function, the main function invoked when
an http request arrives at the httpd server. This marked the arrival time of the
http request at the httpd server, Rs,Parse. Two further time stamps (Rs,Proc and
QNet,Server) were logged in the process_request routine, the second when the
routine terminated at the server end. The browser routines
net_parse_first_http_line (QClient) and net_processHTTP (Rc,Load) time stamped
the start and finish of receiving the HTML files or GIF files from the server.
To get all the components of the Web page, there should not be any interruptions
(such as a mouse click or sending another request) while the browser was
displaying the Web page.
Finally, the data in the log files was rearranged for the performance analysis
in Section 5. Some of the raw data from the server log can be found in Appendix
F.
4.6 Scope of the measurement
HTML, image files and CGI are the three components that constitute the majority
of the resource demand by the client [1][26]. The following Web objects were
selected for the measurements:
1. The CU Denver main page (cudenver.html).
2. An image file that is among those most frequently requested by users
(aboutcamp.gif).
3. Index.html in my own UNIX directory, which runs a CGI program to
create another HTML page dynamically.
4.6.1 CU Denver main page
The CU Denver main page (cudenver.html) is the entry point of most Web visits,
so retrieving it is a typical Web retrieval. Cudenver.html is an HTML document
with many embedded images (such as GIF and JPEG files). When the URL was entered
in the browser, the browser got the cudenver.html page and then sent out several
requests to get all the image files embedded in the main page. The times to get
cudenver.html and all the subsequent image files were recorded. The file size of
cudenver.html is 9K bytes.
4.6.2 Aboutcamp GIF file
Based on the traffic summary by archive (Figure 4.2), aboutcamp.gif was selected
as another object of measurement. It was one of the most frequently accessed
files, since it appeared in many of the Web documents encountered when browsing
the university Web site.
Aboutcamp.gif was one of the embedded objects in the graduate rules HTML page.
To retrieve the aboutcamp object, the URL for the graduate rules HTML page was
entered in the browser. The browser first received the index.html under the
directory /public/gradrules/. Then it sent out requests to retrieve the embedded
image files. The times to receive the graduate rules index.html and
aboutcamp.gif were logged to the log files. The file size of aboutcamp.gif is
365 bytes.
Figure 4.2: Hourly traffic by archives at CU Denver (Traffic Summary by Archive,
carbon.cudenver.edu, Tue Nov 3 23:58:59 MST 1998; bar chart of requests per
archive, horizontal axis: Requests, 0 to 10000).
4.6.3 CGI program
A CGI program was selected as the third object to measure. CGI was first
implemented by NCSA. It is an open standard: a simple, language-independent
interface for Web server applications. To measure the response time of a CGI
program, a shell script called calendar.cgi was chosen. It displayed an HTML
page that prompted the user for a month and year, and then displayed the
calendar for that month. Calendar.cgi was linked from the index.html under the
directory /dtsai/public_html/. After index.html was retrieved, calendar.cgi ran
when the user clicked on the hypertext link. The times to get both index.html
and calendar.cgi were recorded in the log files. The file size of calendar.cgi
is 598 bytes.
A new process was created to run the CGI program for each request, and the
process was terminated when the request was done: the CGI program was a child
process spawned by the http server, and it exited after the program finished. As
one can imagine, this is inefficient when handling multiple requests.
A newer version of CGI called FastCGI [23] is being developed in which programs
run in separate, isolated processes. The processes are persistent and can handle
multiple requests. After finishing a request, a FastCGI process waits for a new
request instead of exiting.
4.7 Time of the day to collect data
Based on the statistics collected by the server, as shown in Figure 4.3 (dated
November 3rd, 1998), the times chosen to collect data were 9 am, 10 am, 1 pm,
3 pm and 6 pm. At 9 am, 10 am, 1 pm and 3 pm, the Internet traffic was
relatively heavy. At 6 pm, the Internet traffic was relatively moderate. The
purpose of the measurement was to compare the response time at different stages
under different scenarios. The client response time (Rc) and the server response
time (Rs) were recorded in the browser log and server log files respectively.
Network latency was computed by subtracting the server response time (Rs) from
the client response time (Rc).
The data collected by the experiment may not follow a pattern, because the
server load and the network traffic varied from time to time. However, the
techniques used to measure the response time for different types of Web
document remain valid.
Figure 4.3: Hourly traffic at University of Colorado at Denver (Hourly Traffic
Summary, carbon.cudenver.edu, Tue Nov 3, 1998; horizontal axis: Time, hour of
day 0 to 23; vertical axis: Requests, 0 to 30000).
5. Performance Analysis
The purpose of this analysis is to demonstrate that the measurement methodology
works on different types of documents: cudenver.html, one of the most frequently
requested GIF files, and a CGI program. The data was collected over a three-day
period within one week in February.
5.1 Main page analysis
The following shows the raw data that were collected in the browser and server
log files, and explains how the client response time, Rc, and the server
response time, Rs, were computed from the logs.
For example, the log in the browser contained the following information:
http://www.cudenver.edu/ sending request.
The time is Wed Feb 10 10:15:40.080 1999
http://www.cudenver.edu/ started receiving.
The time is Wed Feb 10 10:15:40.570 1999
http://www.cudenver.edu/ finished receiving.
The time is Wed Feb 10 10:15:42.000 1999
The first two lines showed that the browser sent an http request to the Web
server at 10:15:40.080. The next two lines showed that the browser started
receiving the http header at 10:15:40.570. The last two lines showed that the
browser finished receiving the cudenver.html file at 10:15:42.000.
The time that the client sent the request, QNet,Client = 10:15:40.080.
The time that the client received the header, QClient = 10:15:40.570.
The time that the client finished receiving, Rc,Load = 10:15:42.000.
When the same http request arrived at the server side, the corresponding http
request log in the http server contained the following information:
(null) 1999 usec=171298 Wed Feb 10 10:15:40
access-105.cudenver.edu /usr/local/lib/httpd/htdocs/ 1999 usec=178134 Wed Feb 10 10:15:40
access-105.cudenver.edu - GET / HTTP/1.1 1999 usec=308017 Wed Feb 10 10:15:40
access-105.cudenver.edu /usr/local/lib/httpd/htdocs/ 1999 usec=308993 Wed Feb 10 10:15:40
The first line showed that the server had just received the new http request
sent by the browser. The URL name was not stored in the log because the http
request, at the server, had an encrypted URL name. Therefore, only a null URL
was logged. The log showed that the http request arrived at 10:15:40.171298.
The second line showed that the server finished parsing the http request. The
first item indicated where the http request came from: "access-105.cudenver.edu"
was the name assigned to the connection when dialing in to CU Denver. The
cudenver.html file was stored in /usr/local/lib/httpd/htdocs/. The time at which
the httpd finished parsing the http request was 10:15:40.178134.
The third line showed that the server finished processing the http request. The
server processed the http request as a GET command, getting cudenver.html. The
time logged was 10:15:40.308017.
The fourth line showed that the server had finished sending the cudenver.html
file. The time was 10:15:40.308993.
The time of Request at httpd, Rs,Parse = 10:15:40.171298.
The time of Parsing Complete, Rs,Proc = 10:15:40.178134.
The time of Processing Complete, Rs,Net = 10:15:40.308017.
The time of Sending Complete, QNet,Server = 10:15:40.308993.
Then, the following times can be calculated:
Client Response Time, Rc = Rc,Load - QNet,Client = 1.920 seconds
Server Response Time, Rs = QNet,Server - Rs,Parse = 0.137 seconds
Network Latency, TNet + Tc,Queue = Rc - Rs = 1.782 seconds
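The same computation can be scripted directly from the log lines quoted in this
section. The following Python sketch is an illustration only and is not part of
the thesis; the helper names browser_time and server_time are hypothetical, and
the parsing assumes the exact line layouts shown above and English day and month
abbreviations. The unrounded server response time is 0.137695 seconds, which
the text reports as 0.137 seconds.

```python
from datetime import datetime


def browser_time(line):
    """Parse a browser log line such as 'The time is Wed Feb 10 10:15:40.080 1999'."""
    text = line.split("The time is ", 1)[1].strip()
    return datetime.strptime(text, "%a %b %d %H:%M:%S.%f %Y")


def server_time(line):
    """Parse a server log line ending in '<year> usec=<n> <day> <mon> <dd> <hh:mm:ss>'."""
    year, usec, *rest = line.split()[-6:]
    base = datetime.strptime(" ".join(rest + [year]), "%a %b %d %H:%M:%S %Y")
    return base.replace(microsecond=int(usec.split("=", 1)[1]))


# Values copied from the browser and server logs quoted in this section.
sent = browser_time("The time is Wed Feb 10 10:15:40.080 1999")
loaded = browser_time("The time is Wed Feb 10 10:15:42.000 1999")
arrived = server_time("(null) 1999 usec=171298 Wed Feb 10 10:15:40")
sent_back = server_time(
    "access-105.cudenver.edu /usr/local/lib/httpd/htdocs/ "
    "1999 usec=308993 Wed Feb 10 10:15:40")

rc = loaded - sent             # client response time, Rc
rs = sent_back - arrived       # server response time, Rs
latency = rc - rs              # TNet + Tc,Queue

if __name__ == "__main__":
    for name, value in (("Rc", rc), ("Rs", rs), ("TNet+Tc,Queue", latency)):
        print(name, f"{value.total_seconds():.6f} s")
```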
The client response time was computed from the browser's log, and the server
response time from the server log. The network time was the difference between
the client response time and the server response time. As one can see, the
process of computing the response times was lengthy. More work should be done to
improve the efficiency of the computation.
5.1.1 Time of the day measurement
The details of the compressed data collected on those days are given in
Appendix J. The client response time and server response time graphs for
cudenver.html and header.jpg are shown in Figures 5.1 and 5.2. The file sizes
are 9K and 11K, respectively.
Figure 5.1: CU Denver HTML Measurement (series: Client Response Time and Server
Response Time, Rs; vertical axis 0.000 to 4.000).
Figure 5.2: Header.jpg Measurement (series: Client Response Time and Server
Response Time, Rs; vertical axis 0.000 to 30.000).
Because the client response time and the server response time varied from time
to time, the high peaks had to be excluded to obtain the typical response times.
In general, the client response time, Rc, for the CU Denver home page was in the
range of 1.760s to 2.090s. The server response time, Rs, was in the range of
0.114s to 0.209s. The network latency, TNet + Tc,Queue, the difference between
the client response time and the server response time, was in the range of
1.644s to 1.896s.
However, there were two peaks (February 9 at 10:30 am and February 10 at 2 pm)
in the graph for the CU Denver main page, while at the same times the server
response time, Rs, was within the normal range.
The CU Denver Web page contained several image files such as header.jpg,
PE1.jpg and PL1.jpg. At the times of the peaks, header.jpg, PE1.jpg and PL1.jpg
suffered the same delay: it took a long time to receive the entire document.
The peaks might be due to several factors. First, as shown in Appendix E, the
server hourly traffic summary, there were many requests at these times,
especially at 2 pm on February 10. Second, the server log file was very large,
which indicated that many clients were accessing the server when the measurement
was taken. When the number of clients is large, client queuing delays at the
server can increase much more quickly than the server response time [1]. This
explains why, in the graph, the network latency, TNet + Tc,Queue, was large
while the server response time, Rs, was small, and why the end-user quality of
service (i.e., client response time Rc) was poor. Third, the public telephone
network was used, and the telephone line used pair gain technology. This means
that even though the PC modem is capable of 56K, the connection will not go
beyond 28.8K. The telephone line was thus connected to the Internet through a
slow link. For this kind of client, the real bottleneck in accessing Web sites
is the client connection to the Internet. If I/O bandwidth at either the server
or the client is a problem, one of the best ways to improve performance is to
reduce the average requested document size [3]. However, the file sizes of
PE1.jpg and PL1.jpg are large: 42K and 34K respectively. This accounted for the
long delays in receiving those image files. The average time to retrieve the
cudenver.html file over the network was about 1.895 seconds. The server
processing time was negligible, in the range of 100 to 200 milliseconds with an
average of 146 milliseconds. The header.jpg file, however, took about 7.475
seconds to download.
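The effect of the slow link can be made concrete with a rough lower bound on
transfer time. The Python sketch below is an illustration added here, not a
calculation from the thesis. It assumes that 1K means 1024 bytes, that 28.8K
means 28,800 bits per second, and that protocol overhead, modem compression and
latency are ignored, so real transfers would differ.

```python
LINK_BITS_PER_SECOND = 28_800   # assumed meaning of "28.8K"
K = 1024                        # assumed meaning of "K bytes"


def transfer_seconds(size_bytes, link_bits_per_second=LINK_BITS_PER_SECOND):
    """Idealized time to move `size_bytes` over the link, payload bits only."""
    return size_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    # File sizes quoted in the text: 9K (HTML), 11K, 34K and 42K (images).
    for size_k in (9, 11, 34, 42):
        print(f"{size_k}K bytes: at least {transfer_seconds(size_k * K):.2f} s")
```

Under these assumptions a 42K image needs roughly 12 seconds of pure transfer
time, far more than the server response times of 100 to 200 milliseconds.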
5.1.2 HTML Document Vs Image Document
For the CU Denver main page, an HTML document with a file size of 9K bytes, the
average client response time, Rc, was 1.895s. The average server response time,
Rs, was 0.146s. The average network latency, TNet + Tc,Queue, was 1.753s.
For header.jpg, a JPEG file with a file size of 11K bytes, the average client
response time, Rc, was 7.475s. The average server response time, Rs, was 0.102s.
The average network latency, TNet + Tc,Queue, was 7.437s.
The difference in time was expected since a JPEG file is usually much larger
than a regular HTML file. The client response time for the image file was more
than 5 seconds longer than that for the HTML file. The server response times for
both documents were less than 150 milliseconds. It appears that the client
response time for an image file is longer than that for an HTML file.
Nevertheless, the server response time was small compared with the network
latency.
5.1.3 Embedded Image Document Measurement
The details of the compressed data for the embedded image files are given in
Appendix J. The graphs for tv-1.jpg, campus.jpg, leshlaurie.jpg, SA1.jpg,
PL1.jpg, and PE1.jpg are shown in Figure 5.3. The file sizes are 19K, 25K, 28K,
30K, 34K, and 42K respectively. The following are the measurements of the
embedded documents:
Figure 5.3: Embedded Image Document Measurement
As expected, the client response time increased as the size of the JPEG file
increased. However, the server response time was not very much impacted by the
size. The increase of response time was mainly due to the network latency.
5.2 Most frequently requested GIF file
The measuring procedure was similar to the one used for the main page. The most
frequently requested GIF file used in this research was aboutcamp.gif. It was
one of the icon files used as a menu item on many Web pages at CU Denver. The
file size was only 365 bytes. Aboutcamp.gif could be obtained by clicking on the
graduate rules link on the CU Denver main page. The graduate rules page, like
other Web pages at the CU Denver site, contained the aboutcamp.gif menu item.
The logs from the browser and the httpd server were used for the computation. As
for the main page, the response times of gradrules.html and aboutcamp.gif were
calculated. The results are shown in Figures 5.4 and 5.5. The response times
were also measured over a three-day period.
5.2.1 Time of the day measurement
The file size of gradrules.html was about 28K bytes. Both gradrules.html and
aboutcamp.gif had small server response times. However, the average client
response time of aboutcamp.gif varied from two to three seconds, while the
average client response time of gradrules.html was about six seconds. Comparing
the file sizes of the two files shows that the client response time of
aboutcamp.gif was not proportional to its size: it was relatively long for such a
small file. Since the server response time was relatively stable, the
fluctuation of the client response time was mainly due to network latency. The
response time of aboutcamp.gif fluctuated more than that of the other
files. This might be because many users requested this frequently requested
image file at the same time; the aboutcamp.gif file was shared by many users.
Another observation was the peak of network latency. The difference between the
client and server response times was the network latency. The network
latencies of gradrules.html (Figure 5.4) and aboutcamp.gif (Figure 5.5) showed the
same peak at around 10:30am. This might be due to temporary congestion of
the network during that time frame. If the number of requests jumps at a certain
moment, the long queuing time causes a peak in the client response time.
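The relation used above, that the network latency (including queuing time) is the client response time minus the server response time, can be sketched as a small C helper. This is only an illustration: the function names network_latency and print_latency are hypothetical and are not part of the thesis code, and any values passed in are assumed to be response times in seconds.

```c
#include <assert.h>
#include <stdio.h>

/* Network latency (network transfer plus queuing time), in seconds,
 * taken as the client response time Rc minus the server response time Rs.
 * The function name is hypothetical; it is not part of the thesis code. */
double network_latency(double rc, double rs)
{
    assert(rc >= rs);   /* the client time includes the server time */
    return rc - rs;
}

/* Print one line of a latency report for a named document. */
void print_latency(const char *name, double rc, double rs)
{
    printf("%s: Rc=%.3fs Rs=%.3fs latency=%.3fs\n",
           name, rc, rs, network_latency(rc, rs));
}
```

Because the server response time stays small and stable, a peak in Rc shows up almost unchanged in the value returned by network_latency.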
Figure 5.4: GradRules HTML Document Measurement.
Figure 5.5: Aboutcamp.gif Document Measurement.
5.3 CGI program
The same technique and similar steps were used to measure a CGI program. The
time stamp program in the browser worked the same way. The time stamp program in
the http server recorded the time before the CGI process was spawned. The only
difference was that the server processing time was longer. The index.html file
contained an http link that started the CGI program. The CGI program was a
simple shell script that returned a small HTML page and asked for input
(see Appendix B). The shell script itself generated the returned
HTML page. Compared with the previous two files, the server processing time of
the CGI program was relatively high: it took about 0.5 second
to execute the CGI program. The measured result matched the expectation,
since the server had to execute the CGI program and its response
time would therefore be longer. To improve the response time of a CGI program,
the server processing power should be improved. The file size of index.html was
2K bytes and the file size of the CGI file was 598 bytes. The CGI program was
also measured over a period of three days.
5.3.1 Time of the day measurement
As shown in Figures 5.6 and 5.7, the client response time of index.html was
relatively short, in the range of 0.8 to one second. The client response time
of the CGI program was also relatively short, in the range of one to two
seconds. One observation was that the client response time of the CGI program
was directly related to the server response time. Unlike the other kinds of files,
the server response time of the CGI program was relatively high and fluctuated more.
This is consistent with the fact that the server needed to execute the CGI
program, which required more processing power from the server.
The experiment demonstrated that the measurement methodology could be used
to measure different http requests: HTML, GIF and CGI files. Both the average and
the actual response times of the requested files can be found. Naturally, the larger
the file size, the longer the client response time.
Figure 5.6: Index.html Measurement.
[Plot: Index.html; series: Client Response Time and Server Response Time, Rs.]
Figure 5.7: CGI Document Measurement.
[Plot: CGI Response Time; vertical axis from 0.000 to 3.500; series: Client Response Time and Server Response Time, Rs.]
6. Future Research
The experiment conducted in this thesis was successful: the
methodology was able to measure the major components of a Web page. To
continue the research, the network traffic can be further broken down into
lower-level components, such as the time spent over the telephone network and the
routing delay. The technique can also be applied to a larger system in which the
server resides on different machines. John Dilley et al. [1] have performed a
similar measurement. They describe a mathematical model for estimating the client
response time at a Web server. The model predicts the impact on server and client
response times as a function of network topology and Web server pool size.
In this experiment, an Intranet setting was used to measure the response time. The
measurement can be expanded by accessing the Web page via an Internet service
provider. We expect the response time to be longer because the path to
reach the Web site is longer and there are more dependencies before an http
request reaches the Web server.
At the time the measurements were conducted, the browser was not able to
measure the response time of Java programs; this feature had not been added to
Mozilla. Many programmers throughout the world are still working on improving
the Mozilla browser. In the future, we will be able to add more features to the
browser to measure the client response time of Java programs or 3D HTML.
Another measurement of interest would be that of audio and video files.
7. Conclusion
This thesis has presented a methodology for measuring the performance of Web
servers. The techniques have been used to measure the client response times and
the server response times of HTML pages, image files and CGI programs. The
network latency can be computed from the client response time and the server
response time of these objects. Although the response time varied
from time to time, the average response time of a particular Web page can still be
found.
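As a minimal sketch of how an average can be taken over response times that vary from sample to sample, the following C function reduces a set of samples to their mean, minimum and maximum. The names rt_summary and summarize are hypothetical and chosen for this illustration only; the thesis does not prescribe this particular routine, and the samples are assumed to be response times in seconds.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of a set of response-time samples, in seconds.
 * The type and function names are hypothetical. */
struct rt_summary {
    double mean;
    double min;
    double max;
};

/* Compute the mean, minimum and maximum of n samples (n must be > 0). */
struct rt_summary summarize(const double *samples, size_t n)
{
    struct rt_summary s;
    double sum = 0.0;
    size_t i;

    assert(samples != NULL && n > 0);
    s.min = s.max = samples[0];
    for (i = 0; i < n; i++) {
        sum += samples[i];
        if (samples[i] < s.min) s.min = samples[i];
        if (samples[i] > s.max) s.max = samples[i];
    }
    s.mean = sum / (double)n;
    return s;
}
```

Reporting the minimum and maximum next to the mean keeps the fluctuation visible, which matters for frequently requested files whose client response time varies widely.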
As expected, both the network latency and the client response time grew with
the size of a document. The bigger the HTML file, the longer
the network latency and the client response time were. The image files had the
longest client response time (including network latency). The server response
time was not much affected by the size of either HTML or image files. The
server processing time was relatively high for CGI programs. This was expected,
since the server was required to invoke the program through CGI, which added
extra processing time to each CGI request. A frequently accessed file
could show large fluctuations in the client response time, which includes the
network latency. This can be because the resource was shared by too many clients
or because of the queuing time of the http request before it reached the server.
The browser with the time stamp function can be used to measure any Web site.
We are able to measure the client response time by using the browser. The only
difference is that the server response time may be unavailable if we cannot take
the measurement on the server side. Nevertheless, we can still use the browser as a
measurement tool to measure the client response time of different components of a
Web site. Although the measurement methodology is still a lengthy process, the
source code can be changed to make the measurement process more user-friendly.
8. Acknowledgement
I would like to thank my thesis supervisor, Professor Tom Altman, for his help and
advice in making this thesis possible. I would like to extend my thanks to Dr. Randy
Hagan for his help and advice in setting up the httpd server software on the
carbon machine. Without his help, none of the experiments in this thesis would
have been possible. I would like to thank the thesis committee members, Professor
Mike Radenkovic and Professor Boris Stilman, for their efforts in reviewing my
thesis. In addition, this thesis would not have been possible without the open
source code of Mozilla and the NCSA http server. I would like to thank all those
who have contributed to that open source code.
Appendix A. Source code for getting the Unix time
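/* Timel.C: This program gets the current
 * time in time_t form, then uses ctime to display the time in string
 * form. */
#include <stdio.h>
#include <time.h>
#include <sys/time.h>
FILE *stream;
/* Append msg, the calendar time and the microsecond count to the log.
 * Note that the string returned by ctime ends with a newline. */
void timestamp(char *msg)
{
    struct timeval ltime;
    time_t secs;
    gettimeofday( &ltime, NULL );
    secs = ltime.tv_sec;
    fprintf( stream, "%s %s.%ld\n", msg, ctime( &secs ), (long)ltime.tv_usec );
    fflush(stream);
}
int main(void)
{
    stream = fopen("tsailogl", "a");
    if (stream == NULL)
        return 1;
    timestamp("calendar");
    fclose(stream);
    return 0;
}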
Appendix B. Source code for the CGI program
#!/bin/sh
CAL=/bin/cal
echo Content-type: text/html
echo
if [ -x $CAL ]; then
    if [ $# = 0 ]; then
        cat << EOM
Calendar
Calendar

To look up a calendar month, type the month followed by a space then the
year.


Example: 3 1993 would give the calendar for March 1993.
EOM
    else
        echo \<PRE\>
        $CAL $*
        echo \</PRE\>
    fi
else
    echo Cannot find cal on this system.
fi
cat << EOM

EOM
Appendix C. Source code for Telnet program
// telnet.cpp : Defines the class behaviors for the application.
//
#include "stdafx.h"
#include "telnet.h"
#include "sox.hpp"
#include <sys/timeb.h>
#include <time.h>
#include "MainFrm.h"
#include "telnetDoc.h"
#include "telnetview.h"
#ifdef _DEBUG
#define new DEBUG_NEW
#undef THIS_FILE
static char THIS_FILE[] = __FILE__;
#endif
/////////////////////////////////////////////////////////////////////////////
// CTelnetApp
BEGIN_MESSAGE_MAP(CTelnetApp, CWinApp)
//{{AFX_MSG_MAP(CTelnetApp)
ON_COMMAND(ID_APP_ABOUT, OnAppAbout)
// NOTE the ClassWizard will add and remove mapping macros here.
// DO NOT EDIT what you see in these blocks of generated code!
//}}AFX_MSG_MAP
// Standard file based document commands
ON_COMMAND(ID_FILE_NEW, CWinApp::OnFileNew)
ON_COMMAND(ID_FILE_OPEN, CWinApp::OnFileOpen)
// Standard print setup command
ON_COMMAND(ID_FILE_PRINT_SETUP, CWinApp::OnFilePrintSetup)
END_MESSAGE_MAP()
/////////////////////////////////////////////////////////////////////////////
// CTelnetApp construction
CTelnetApp::CTelnetApp()
{
// TODO: add construction code here,
// Place all significant initialization in InitInstance
}
/////////////////////////////////////////////////////////////////////////////
// The one and only CTelnetApp object
CTelnetApp theApp;
/////////////////////////////////////////////////////////////////////////////
// CTelnetApp initialization
int timestamp(){
    struct _timeb timebuffer;
    char *timeline;
    FILE *stream;
    stream = fopen( "time.txt", "a" );
    _ftime( &timebuffer );
    timeline = ctime( & ( timebuffer.time ) );
    fprintf( stream,"The time is %.19s.%.3hu %s", timeline, timebuffer.millitm,
             &timeline[20] );
    fclose(stream);
    return 0;
}
BOOL ConnectToUnix(CTelnetSession& cTelnet, CString& cstrMsg)
{
CString hostname, uid, pwd;
BOOL rb = TRUE;
hostname="132.194.10.4";
uid = "dtsai\n";
pwd = "shann4on\n";
if( !cTelnet.Connect(hostname,uid,pwd) )
{
cstrMsg.Format("Connect to host %s failed", hostname);
rb = FALSE;
}
return rb;
}
BOOL ExecuteFunc(CTelnetSession& cTelnet, CString& cstrMsg, CString& cstrCmd)
{
    BOOL rb = TRUE;
    CString cstrLogFile="dummy.log";
    cTelnet.m_LogFile = cstrLogFile;
    Sleep(10000);
    cTelnet.Send( cstrCmd );
    timestamp();
    Sleep(20000);
    return TRUE;
}
BOOL sync_time(LPCSTR cstrParams, CString& cstrMsg)
{
CTelnetSession cTelnet;
BOOL rb = TRUE;
if( !ConnectToUnix(cTelnet, cstrMsg) )
{
return FALSE;
}
CString cmdBuffer;
// 1 try the octel path
cmdBuffer.Format("%s\n", cstrParams);
rb = ExecuteFunc(cTelnet, cstrMsg, cmdBuffer);
cTelnet.Disconnect();
return rb;
}
BOOL CTelnetApp::InitInstance()
{
AfxEnableControlContainer();
// Standard initialization
// If you are not using these features and wish to reduce the size
// of your final executable, you should remove from the following
// the specific initialization routines you do not need.
// Change the registry key under which our settings are stored.
// TODO: You should modify this string to be something appropriate
// such as the name of your company or organization.
SetRegistryKey(_T("Local AppWizard-Generated Applications"));
LoadStdProfileSettings(); // Load standard INI file options (including MRU)
// Register the application's document templates. Document templates
// serve as the connection between documents, frame windows and views.
CSingleDocTemplate* pDocTemplate;
pDocTemplate = new CSingleDocTemplate(
IDR_MAINFRAME,
RUNTIME_CLASS(CTelnetDoc) ,
RUNTIME_CLASS(CMainFrame) , // main SDI frame window
RUNTIME_CLASS(CTelnetView));
AddDocTemplate(pDocTemplate);
// Parse command line for standard shell commands, DDE, file open
CCommandLineInfo cmdInfo;
ParseCommandLine(cmdInfo);
// Dispatch commands specified on the command line
if (!ProcessShellCommand(cmdInfo))
return FALSE;
// The one and only window has been initialized, so show and update it.
m_pMainWnd->ShowWindow(SW_SHOW);
m_pMainWnd->UpdateWindow();
CString dummy;
sync_time("timestamp",dummy);
return TRUE;
}
/////////////////////////////////////////////////////////////////////////////
// CTelnetApp message handlers
Appendix D. CU Denver Network Configuration

[Network configuration diagram; the subnet labels that remain legible are listed below.]
132.194.52.0/24
132.194.55.0/24
132.194.56.0/24
132.194.57.0/24
132.194.60.0/24
132.194.61.0/24
132.194.69.0/24
132.194.80.0/24
132.194.81.0/24
132.194.84.0/24
132.194.85.0/24
132.194.86.0/24
132.194.87.0/24
51300-51309
Appendix E. Traffic Summary of CU Denver Web Server
The hourly traffic summary was obtained for the previous day; for example, the
Feb 10 chart shows the traffic on Feb 9.
[Chart: Hourly Traffic Summary, carbon.cudenver.edu, generated Wed Feb 10 1999]
[Chart: Hourly Traffic Summary, carbon.cudenver.edu, generated Thu Feb 11 00:17]
[Chart: Hourly Traffic Summary, carbon.cudenver.edu, generated Fri Feb 12 00:23:52 MST 1999; horizontal axis: Time (hours 0 to 23); vertical axis from 0.0 to 150000.0]
[Chart: Daily Traffic Summary of Feb]
Appendix F. Sample log in the http server
crquiz.uchsc.edu - GET /home/ucd/images/TV-1.jpg HTTP/1.1
usec=935946 Wed Feb 10 13:30:57 1999
wcad234.mscd.edu - GET /public/library/alois/journals.html
HTTP/1.0 usec=430087 Wed Feb 10 13:30:58 1999
crquiz.uchsc.edu - GET /home/ucd/head-images/about.head.gif
HTTP/1.1 usec=520907 Wed Feb 10 13:30:58 1999
crquiz.uchsc.edu - GET /home/ucd/head-images/happen.head.gif
HTTP/1.1 usec=078525 Wed Feb 10 13:30:59 1999
cache-db06.proxy.aol.com - GET /scroll.txt HTTP/1.0 usec=319735
Wed Feb 10 13:30:59 1999
crquiz.uchsc.edu - GET /home/ucd/head-images/help.head.gif
HTTP/1.1 usec=595126 Wed Feb 10 13:30:59 1999
wcad234.mscd.edu - GET /public/library/alois/general.html
HTTP/1.0 usec=155673 Wed Feb 10 13:31:00 1999
cache-dd05.proxy.aol.com - GET /public/admiss/criteria.html
HTTP/1.0 usec=290439 Wed Feb 10 13:31:00 1999
crquiz.uchsc.edu - GET /scroll.txt HTTP/1.1 usec=385165 Wed Feb
10 13:31:00 1999
crquiz.uchsc.edu - GET /home/ucd/images/cucity.gif HTTP/1.1
usec=400790 Wed Feb 10 13:31:00 1999
crquiz.uchsc.edu - GET /home/ucd/approved.gif HTTP/1.1
usec=739657 Wed Feb 10 13:31:00 1999
asp29.colorado.edu - GET /home/ucd/aboutcamp.html HTTP/1.0
usec=758212 Wed Feb 10 13:31:00 1999
access-137.cudenver.edu - GET / HTTP/1.1 usec=782626 Wed Feb 10
13:31:00 1999
asp29.colorado.edu - GET /home/ucd/images/students.gif HTTP/1.0
usec=517001 Wed Feb 10 13:31:01 1999
asp29.colorado.edu - GET /home/ucd/head-images/header.jpg
HTTP/1.0 usec=517978 Wed Feb 10 13:31:01 1999
asp29.colorado.edu - GET /home/ucd/images/academics.gif HTTP/1.0
usec=530673 Wed Feb 10 13:31:01 1999
asp29.colorado.edu - GET /home/ucd/head-images/about.head.gif
HTTP/1.0 usec=267001 Wed Feb 10 13:31:02 1999
asp29.colorado.edu - GET /home/ucd/images/happenings.gif
HTTP/1.0 usec=272860 Wed Feb 10 13:31:02 1999
asp29.colorado.edu - GET /home/ucd/images/main.gif HTTP/1.0
usec=278720 Wed Feb 10 13:31:02 1999
crquiz.uchsc.edu - GET /home/ucd/head-images/search.head.gif
HTTP/1.1 usec=276767 Wed Feb 10 13:31:02 1999
asp29.colorado.edu - GET /home/ucd/images/help.gif HTTP/1.0
usec=292392 Wed Feb 10 13:31:02 1999
asp29.colorado.edu - GET /home/ucd/images/search.gif HTTP/1.0
usec=416415 Wed Feb 10 13:31:02 1999
cache-dclO.proxy.aol.com - GET /home/ucd/images/students.gif
HTTP/1.0 usec=611728 Wed Feb 10 13:31:02 1999
cache-db05.proxy.aol.com - GET /home/ucd/images/academics.gif
HTTP/1.0 usec=661532 Wed Feb 10 13:31:02 1999
access-137.cudenver.edu - GET /home/ucd/head-images/header.jpg
HTTP/1.1 usec=866610 Wed Feb 10 13:31:02 1999
access-137.cudenver.edu - GET /home/ucd/head-
images/studen.head.gif HTTP/1.1 usec=932040 Wed Feb 10 13:31:02
1999
access-137.cudenver.edu - GET /home/ucd/head-
images/welcome.head.gif HTTP/1.1 usec=993564 Wed Feb 10 13:31:02
1999
access-137.cudenver.edu - GET /home/ucd/head-
images/academ.head.gif HTTP/1.1 usec=030673 Wed Feb 10 13:31:03
1999
cache-dc05.proxy.aol.com - GET /home/ucd/images/main.gif
HTTP/1.0 usec=291415 Wed Feb 10 13:31:04 1999
tcnj-88-237.tcnj.edu - GET /~mryder/itc_data/luddite.html
HTTP/1.0 usec=806064 Wed Feb 10 13:31:04 1999
cache-da06.proxy.aol.com - GET /home/ucd/images/help.gif
HTTP/1.0 usec=615634 Wed Feb 10 13:31:05 1999
blrprxl.mediaone.com - GET /public/biology/BioMASTER.html
HTTP/1.0 usec=729892 Wed Feb 10 13:31:05 1999
cache-dc05.proxy.aol.com - GET /home/ucd/images/happenings.gif
HTTP/1.0 usec=724032 Wed Feb 10 13:31:06 1999
cache-db06.proxy.aol.com - GET /home/ucd/head-images/smhead.gif
HTTP/1.0 usec=291415 Wed Feb 10 13:31:07 1999
crquiz.uchsc.edu - GET /home/ucd/academics.html HTTP/1.1
usec=400790 Wed Feb 10 13:31:07 1999
access-137.cudenver.edu - GET /home/ucd/approved.gif HTTP/1.1
usec=928134 Wed Feb 10 13:31:07 1999
access-137.cudenver.edu - GET /home/ucd/images/cucity.gif
HTTP/1.1 usec=016025 Wed Feb 10 13:31:08 1999
access-137.cudenver.edu - GET /home/ucd/head-
images/search.head.gif HTTP/1.1 usec=428134 Wed Feb 10 13:31:08
1999
crquiz.uchsc.edu - GET /home/ucd/images/aboutcamp.gif HTTP/1.1
usec=509189 Wed Feb 10 13:31:08 1999
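To compute response times from a log like the sample above, each entry has to be turned into a numeric time stamp. The following C sketch parses one entry and returns its time of day in seconds with the microsecond fraction. It assumes that every entry has been joined onto a single line (the line wrapping in the listing is taken to be a layout artifact) and that the fields appear in the order shown in the sample. The names log_entry, parse_log_line and seconds_of_day are hypothetical and are not part of the server source code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed entry of the server log shown above. Field names are
 * hypothetical; they are chosen for this sketch only. */
struct log_entry {
    char host[128];
    char method[16];
    char path[256];
    long usec;          /* microsecond part of the time stamp */
    int hour, min, sec; /* time of day of the time stamp */
};

/* Parse one log entry that has been joined onto a single line.
 * Returns 1 on success and 0 if the line does not match the format. */
int parse_log_line(const char *line, struct log_entry *e)
{
    int n;

    assert(line != NULL && e != NULL);
    memset(e, 0, sizeof *e);
    /* host - METHOD path HTTP/x.y usec=N Www Mmm DD HH:MM:SS YYYY */
    n = sscanf(line,
               "%127s - %15s %255s %*s usec=%ld %*s %*s %*d %d:%d:%d",
               e->host, e->method, e->path, &e->usec,
               &e->hour, &e->min, &e->sec);
    return n == 7;
}

/* Time of day in seconds, including the microsecond fraction. */
double seconds_of_day(const struct log_entry *e)
{
    return e->hour * 3600.0 + e->min * 60.0 + e->sec + e->usec / 1e6;
}
```

Subtracting the seconds_of_day values of two entries for the same request gives an elapsed time, provided both time stamps fall on the same day.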
Appendix G. HTTP Server source code-log_transaction.c
void log_transaction(per_request *reqInfo)
{
    char *str;
    char *tempstr;
    long timz;
    struct tm *t;
    char *tstr,sign;
#ifdef LOG_DURATION
    extern time_t request_time;
    time_t duration = request_time ? (time(NULL) - request_time) : 0;
#endif /* LOG_DURATION */
    str = newString(HUGE_STRING_LEN,STR_TMP);
    tstr = newString(MAX_STRING_LEN,STR_TMP);
    tempstr = newString(HUGE_STRING_LEN,STR_TMP);
    t = get_gmtoff(&timz);
    sign = (timz < 0 ? '-' : '+');
    if(timz < 0)
        timz = -timz;
    strftime(tstr,MAX_STRING_LEN,"%d/%b/%Y:%H:%M:%S",t);
    sprintf(str,"%s %s %s [%s %c%02ld%02ld] \"%s\" ",
            (reqInfo->remote_name ? reqInfo->remote_name : "-"),
            (do_rfc931 ? remote_logname : "-"),
            (reqInfo->auth_user[0] ? reqInfo->auth_user : "-"),
            tstr,
            sign,
            timz/3600,
            timz%3600,
            the_request);
    /* shorter record passed to the time stamp log */
    sprintf(tempstr,"%s %s %s %s ",
            (reqInfo->remote_name ? reqInfo->remote_name : "-"),
            (do_rfc931 ? remote_logname : "-"),
            (reqInfo->auth_user[0] ? reqInfo->auth_user : "-"),
            the_request);
    timestamp(tempstr);
    if(reqInfo->status != -1)
        sprintf(str,"%s%d ",str,reqInfo->status);
    else
        strcat(str,"- ");
    if(reqInfo->bytes_sent != -1)
        sprintf(str,"%s%ld",str,reqInfo->bytes_sent);
    else
        strcat(str,"-");
    if (reqInfo->hostInfo->log_opts & LOG_SERVERNAME) {
        if (reqInfo->hostInfo->server_hostname)
            sprintf(str,"%s %s",str,reqInfo->hostInfo->server_hostname);
        else
            strcat(str," -");
    }
    if (reqInfo->hostInfo->referer_ignore && reqInfo->inh_referer[0]) {
        char *str1;
        int bIgnore = 0;
        str1 = newString(MAX_STRING_LEN,STR_TMP);
        lim_strcpy(str1, reqInfo->hostInfo->referer_ignore, 255);
        if (reqInfo->hostInfo->referer_ignore[0]) {
            char* tok = strtok (str1, " ");
            while (tok) {
                if (strstr(reqInfo->inh_referer, tok)) {
                    bIgnore = 1;
                    break;
                }
                tok = strtok (NULL, " ");
            }
        }
        if (bIgnore) {
            reqInfo->inh_referer[0] = '\0';
        }
        freeString(str1);
    }
#ifdef LOG_DURATION
    sprintf(str+strlen(str), " %ld", duration);
#endif /* LOG_DURATION */
    if (!(reqInfo->hostInfo->log_opts & LOG_COMBINED)) {
        strcat(str,"\n");
        write(reqInfo->hostInfo->xfer_log,str,strlen(str));
        /* log the user agent */
        if (reqInfo->inh_agent[0]) {
            if (reqInfo->hostInfo->log_opts & LOG_DATE)
                fprintf(reqInfo->hostInfo->agent_log, "[%s] %s\n",tstr,
                        reqInfo->inh_agent);
            else
                fprintf(reqInfo->hostInfo->agent_log, "%s\n", reqInfo->inh_agent);
            fflush(reqInfo->hostInfo->agent_log);
        }
        /* log the referer */
        if (reqInfo->inh_referer[0]) {
            if (reqInfo->hostInfo->log_opts & LOG_DATE)
                fprintf(reqInfo->hostInfo->referer_log, "[%s] %s -> %s\n",tstr,
                        reqInfo->inh_referer, reqInfo->url);
            else
                fprintf(reqInfo->hostInfo->referer_log, "%s -> %s\n",
                        reqInfo->inh_referer, reqInfo->url);
            fflush(reqInfo->hostInfo->referer_log);
        }
    } else {
        if (reqInfo->inh_referer[0])
            sprintf(str,"%s \"%s\"",str,reqInfo->inh_referer);
        else
            strcat(str," \"\"");
        if (reqInfo->inh_agent[0])
            sprintf(str,"%s \"%s\"\n",str,reqInfo->inh_agent);
        else
            strcat(str," \"\"\n");
        write(reqInfo->hostInfo->xfer_log,str,strlen(str));
    }
    freeString(str);
    freeString(tstr);
    freeString(tempstr);
}
Appendix H. HTTP Server source code http_request.c
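/* Write msg, the microsecond count and the calendar time to the log */
void timestamp2(char *msg)
{
    struct timeval ltime;
    time_t secs;
    gettimeofday( &ltime, NULL );
    secs = ltime.tv_sec;
    fprintf( logstream2,"%s usec=%.6ld %s\n",msg,(long)ltime.tv_usec,
             ctime( &secs ) );
    fflush(logstream2);
}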
void process_request(per_request *reqInfo)
{
    int s;
    s = translate_name(reqInfo,reqInfo->url,reqInfo->filename);
    timestamp2(reqInfo->filename);
    switch(s) {
    case A_STD_DOCUMENT:
        if ((reqInfo->method == M_HEAD) &&
            (reqInfo->http_version == P_HTTP_0_9)) {
            reqInfo->method = M_GET;
            die(reqInfo,SC_BAD_REQUEST,"Invalid HTTP/0.9 method.");
        }
        send_node(reqInfo);
        break;
    case A_REDIRECT_TEMP:
        die(reqInfo,SC_REDIRECT_TEMP,reqInfo->filename);
        break;
    case A_REDIRECT_PERM:
        die(reqInfo,SC_REDIRECT_PERM,reqInfo->filename);
        break;
    case A_SCRIPT_CGI:
        exec_cgi_script(reqInfo);
        break;
#ifdef FCGI_SUPPORT
    case A_SCRIPT_FCGI:
        FastCgiHandler(reqInfo);
        break;
#endif /* FCGI_SUPPORT */
    }
    timestamp2(reqInfo->filename);
}
void RequestMain(per_request *reqInfo)
{
    int options = 0;
    signal(SIGPIPE,send_fd_timed_out);
#ifdef LOG_DURATION
    request_time = 0;
#endif /* LOG_DURATION */
    if (!logstream2) logstream2=fopen("/httpd/logs/projlog2","a");
    if (reqInfo->sb == NULL) {
        reqInfo->sb = new_sock_buf(reqInfo,reqInfo->in);
        sockbuf_count++;
    }
    if (getline(reqInfo->sb, as_requested, HUGE_STRING_LEN,
                options, timeout) == -1)
        return;
    if(!as_requested[0])
        return;
#ifdef LOG_DURATION
    request_time = time(NULL);
#endif /* LOG_DURATION */
    strcpy(the_request, as_requested);
#ifdef SETPROCTITLE
    setproctitle(the_request);
#endif /* SETPROCTITLE */
    timestamp2(reqInfo->filename);
    decode_request(reqInfo, as_requested);
    unescape_url(reqInfo->url);
    /* Moved this to later so we can read the headers first for HTTP/1.1
     * Host: support
     */
    which_host_conf(reqInfo);
    if (reqInfo->dns_host_lookup == FALSE) {
        /* Only when we haven't done DNS do we call get_remote_host().
         * If we aren't supposed to, get_remote_host() will not do it.
         */
        get_remote_host(reqInfo);
    }
    process_request(reqInfo);
    if (!logstream2) {
        fflush(logstream2);
        fclose(logstream2);
    }
}
Appendix I. Browser Mozilla source code
PUBLIC
int timestamp(char *url, char *mode){
    struct _timeb timebuffer;
    char *timeline;
    FILE *stream;
    stream = fopen( "time.txt", "a" );
    _ftime( &timebuffer );
    timeline = ctime( & ( timebuffer.time ) );
    fprintf( stream,"%s %s.\nThe time is %.19s.%.3hu %s", url, mode, timeline,
             timebuffer.millitm, &timeline[20] );
    fclose(stream);
    return 0;
}
/* send_http_request makes a connection to the http server
* and sends a request (either http1 or http/.9) to the server
*
* returns the TCP status code
*/
#define SOCK_CHUNK_SIZE 1024
PRIVATE int
net_send_http_request (ActiveEntry *ce)
{
/* assign it so that we have a typed structure pointer */
HTTPConData * cd = (HTTPConData *)ce->con_data;
char * command=0;
unsigned int command_size;
/* we make the assumption that we are always acting as a proxy
* server if we get http_headers passed in the URL structure.
*
* when acting as a proxy_server the clients headers are sent
* to the server instead of being generated here and the headers
* from the server are unparsed and sent on to the client.
*/
if (ce->format_out == FO_OUT_TO_PROXY_CLIENT
|| ce->format_out == FO_CACHE_AND_OUT_TO_PROXY_CLIENT)
cd->acting_as_proxy=YES;
TRACEMSG(("Entered \"send_http_request\""));
/* we make the assumption that we will always use the POST
* method if there is post data in the URL structure
* Is that a bad assumption?
*/
if(ce->URL_s->method == URL_POST_METHOD
|| ce->URL_s->method == URL_PUT_METHOD) {
cd->posting = YES;
#ifdef MOZILLA_CLIENT
SECNAV_Posting(cd->connection->sock);
#endif /* MOZILLA_CLIENT */
} else if (ce->URL_s->method == URL_HEAD_METHOD) {
#ifdef MOZILLA_CLIENT
SECNAV_HTTPHead(cd->connection->sock);
#endif /* MOZILLA_CLIENT */
}
/* zero out associated mime fields in URL structure
* all except content_length since that may be passed in */
ce->URL_s->protection_template = 0;
PR_FREEIF(ce->URL_s->redirecting_url);
ce->URL_s->redirecting_url = NULL;
PR_FREEIF(ce->URL_s->authenticate) ;
ce->URL_s->authenticate = NULL;
PR_FREEIF(ce->URL_s->proxy_authenticate) ;
ce->URL_s->proxy_authenticate = NULL;
/* Build the request command. (It must be free'd!!!)
*
* build_http_request will assemble a string containing
* the main request line, HTTP/1.0 MIME headers and
* the post data (if it exists)
*/
command_size = net_build_http_request(ce->URL_s,
ce->format_out,
cd,
ce->window_id,
&command);
TRACEMSG(("Sending HTTP Request:\n-------------------------------"));
TIMING_STARTCLOCK_NAME("http:xfer", ce->URL_s->address);
TIMING_STARTCLOCK_NAME("http:req", ce->URL_s->address);
ce->status = (int) NET_BlockingWrite(cd->connection->sock, command,
command_size);
timestamp(ce->URL_s->address,"\t started sending request");
#if defined(JAVA)
#if defined(DEBUG)
if(MKLib_trace_flag) {
NET_NTrace("Tx: ", 4);
NET_NTrace(command, command_size);
}
#endif /* DEBUG */
#endif /* JAVA */
PR_Free(command); /* freeing the request */
if (ce->status < 0) {
int err = PR_GetError();
if(err == PR_WOULD_BLOCK_ERROR)
return(1); /* continue */
ce->URL_s->error_msg = NET_ExplainErrorDetails(MK_TCP_WRITE_ERROR, err);
cd->next_state = HTTP_ERROR_DONE;
return(MK_TCP_WRITE_ERROR);
}
/* make sure these are empty
*/
PR_FREEIF(cd->line_buffer); /* reset */
cd->line_buffer = NULL;
cd->line_buffer_size=0; /* reset */
if(cd->posting && ce->URL_s->post_data) {
NET_ClearReadSelect(ce->window_id, cd->connection->sock);
NET_SetConnectSelect(ce->window_id, cd->connection->sock);
#ifdef XP_WIN
cd->calling_netlib_all_the_time = TRUE;
NET_SetCallNetlibAllTheTime(ce->window_id, "mkhttp");
#endif /* XP_WIN */
ce->con_sock = cd->connection->sock;
cd->next_state = HTTP_SEND_POST_DATA;
return(0);
}
cd->next_state = HTTP_PARSE_FIRST_LINE;
/* Don't pause for read any more because we need to do
* at least a single read right away to detect if the
* connection is bad. Apparently windows will queue a
* write and not return a failure, but in fact the
* connection has already been closed by the server.
* Windows select will not even detect the exception
* so we end up deadlocked. By doing one read we
* can detect errors immediately
*
* cd->pause_for_read = TRUE;
*/
{
char * nonProxyHost = NET_ParseURL(ce->URL_s->address, GET_HOST_PART);
if (nonProxyHost) {
char* msg = PR_smprintf(XP_GetString(XP_PROGRESS_WAIT_REPLY),
nonProxyHost);
if (msg) {
#if defined(SMOOTH_PROGRESS) && !defined(MODULAR_NETLIB)
PM_Status(ce->window_id, ce->URL_s, msg);
#else
NET_Progress(ce->window_id, msg);
#endif
PR_Free(msg);
}
PR_Free(nonProxyHost);
}
}
return STATUS(ce->status);
} /* end of net_send_http_request */
/* parses the first line of the http server's response
* and determines if it is an http1 or http/.9 server
*
* returns the tcp status code
*/
PRIVATE int
net_parse_first_http_line (ActiveEntry *ce) {
char *line_feed=0;
char *ptr;
int line_size;
int num_fields;
char server_version[36];
HTTPConData * cd = (HTTPConData *)ce->con_data;
char small_buf[MAX_FIRST_LINE_SIZE+16];
Bool do_redirect = TRUE;
TRACEMSG(("Entered: net_parse_first_http_line"));
ce->status = PR_Read(cd->connection->sock, small_buf, MAX_FIRST_LINE_SIZE+10);
if(ce->status == 0) {
/* the server dropped the connection? */
ce->URL_s->error_msg = NET_ExplainErrorDetails(MK_ZERO_LENGTH_FILE);
cd->next_state = HTTP_ERROR_DONE;
cd->pause_for_read = FALSE;
return(MK_ZERO_LENGTH_FILE);
} else if(ce->status < 0) {
int s_error = PR_GetError();
if (s_error == PR_WOULD_BLOCK_ERROR) {
cd->pause_for_read = TRUE;
return(0);
} else {
ce->URL_s->error_msg = NET_ExplainErrorDetails(MK_TCP_READ_ERROR,
s_error);
/* return TCP error */
return MK_TCP_READ_ERROR;
}
}
HG21899
#ifdef MOZILLA_CLIENT
HG19088
#endif /* MOZILLA_CLIENT */
/* ce->status greater than 0 */
TIMING_STOPCLOCK_NAME("http:req", ce->URL_s->address,
ce->window_id, "response received");
BlockAllocCat(cd->line_buffer, cd->line_buffer_size, small_buf, ce->status);
cd->line_buffer_size += ce->status;
for(ptr = cd->line_buffer, line_size=0; line_size < cd->line_buffer_size;
ptr++, line_size++)
if(*ptr == LF) {
line_feed = ptr;
break;
}
/* assume HTTP/.9 until we know better */
cd->protocol_version = POINT_NINE;
if(line_feed) {
*server_version = 0;
*line_feed = '\0';
num_fields = sscanf(cd->line_buffer, "%20s %d", server_version,
&ce->URL_s->server_status);
TRACEMSG(("HTTP: Scanned %d fields from first_line: %s", num_fields,
cd->line_buffer));
/* Try and make sure this is an HTTP/1.0 reply */
if (num_fields == 2 || !PL_strncmp("HTTP/", server_version, 5))
{
double ver = atof(server_version+5);
if(ver >1.0) {
/* HTTP1.1 */
cd->protocol_version = ONE_POINT_ONE;
net_setup_http11_defaults(ce);
PR_ASSERT(ver == 1.1);
} else {
/* HTTP1 */
cd->protocol_version = ONE_POINT_O;
PR_ASSERT(ver == 1.0 || ver == 0.0); /* allow 0 bug */
}
}
/* put the line back the way it should be
*/
*line_feed = LF;
} else if(cd->line_buffer_size < MAX_FIRST_LINE_SIZE) {
return(0); /* not ready to process */
}
if(cd->connection->prev_cache && cd->protocol_version == POINT_NINE)
{
/* got a non HTTP/1.0 or above response from
* server that must support HTTP/1.0 since it
* supports keep-alive. The connection
* must be in a bad state now so
* restart the whole thang by going to the ERROR state
*/
cd->next_state = HTTP_ERROR_DONE;
}
/* if we are getting a successful read here
* then the connection is valid
*/
cd->connection_is_valid = TRUE;
if(cd->protocol_version == POINT_NINE)
{ /* HTTP/0.9 */
NET_cinfo * ctype;
TRACEMSG(("Receiving HTTP/0.9"));
ce->URL_s->content_length = 0;
ce->URL_s->real_content_length = 0;
PR_FREEIF(ce->URL_s->content_encoding);
ce->URL_s->content_encoding = NULL;
PR_FREEIF(ce->URL_s->content_name);
ce->URL_s->content_name = NULL;
if(!ce->URL_s->preset_content_type) {
PR_FREEIF(ce->URL_s->content_type);
ce->URL_s->content_type = NULL;
/* fake the content_type since we can't get it */
ctype = NET_cinfo_find_type(ce->URL_s->address);
/* treat unknown types as HTML */
if(ctype->is_default)
StrAllocCopy(ce->URL_s->content_type, TEXT_HTML);
else
StrAllocCopy(ce->URL_s->content_type, ctype->type);
/* fake the content_encoding since we can't get it */
StrAllocCopy(ce->URL_s->content_encoding,
(NET_cinfo_find_enc(ce->URL_s->address))->encoding);
}
cd->next_state = HTTP_SETUP_STREAM;
} else {
/* Decode full HTTP/1.0 or 1.1 response */
TRACEMSG(("Receiving HTTP1 reply, status: %d", ce->URL_s->server_status));
timestamp(ce->URL_s->address,"\t started receiving");
if(ce->URL_s->server_status != 304
&& (!cd->use_proxy_tunnel || cd->proxy_tunnel_setup_done))
{
/* if we don't have a 304, zero all
* the content headers so that they
* don't interfere with the
* incoming object
*
* don't zero these in the case of a proxy tunnel
* since this isn't the real request this
* is just the connection open request response
*/
if(!ce->URL_s->preset_content_type) {
PR_FREEIF(ce->URL_s->content_type);
ce->URL_s->content_type = NULL;
}
ce->URL_s->content_length = 0;
ce->URL_s->real_content_length = 0;
PR_FREEIF(ce->URL_s->content_encoding);
ce->URL_s->content_encoding = NULL;
PR_FREEIF(ce->URL_s->transfer_encoding);
ce->URL_s->transfer_encoding = NULL;
PR_FREEIF(ce->URL_s->content_name);
ce->URL_s->content_name = NULL;
switch (ce->URL_s->server_status / 100) {
case 1:
if(ce->URL_s->server_status == 100) {
char *endOfResp;
char tmp;
int curRespSize=0;
/* We just got a 100 Continue response from the server.
* Skip it. We don't do any special handling. We're not
* dealing with the status line or any headers sent with
* the response. */
/* Get the length of the response as a string. The end
* of the response is a double CRLF sequence. */
endOfResp=PL_strstr(cd->line_buffer, CRLF CRLF);
/* If we can't find the end of the response, either we
* haven't received it all yet, or it's a malformed
* response. */
if(endOfResp) {
endOfResp+=4;
tmp = *endOfResp;
*endOfResp = '\0';
curRespSize = PL_strlen(cd->line_buffer);
*endOfResp = tmp;
cd->line_buffer_size -= curRespSize;
PR_ASSERT(cd->line_buffer_size >= 0);
if(cd->line_buffer_size > 0)
memmove(cd->line_buffer, endOfResp, cd->line_buffer_size);
}
/* by not setting cd->next_state to something different
* we will come back to this function and look for another
* first line.
*/
return(0);
}
break;
case 2: /* Successful reply */
/* Since we
* are getting a new copy, delete the old one
* from the cache
*/
#ifdef MOZILLA_CLIENT
/* NET_RemoveURLFromCache(ce->URL_s); */
#endif
if((ce->URL_s->server_status == 204
|| ce->URL_s->server_status == 201)
&& cd->acting_as_proxy)
{
if(ce->URL_s->files_to_post && ce->URL_s->files_to_post[0])
{
/* fall through since we need to get
* the headers to complete the
* file transfer
*/
} else {
ce->status = MK_NO_ACTION;
cd->next_state = HTTP_ERROR_DONE;
return STATUS(ce->status);
}
} else if(cd->partial_cache_file
&& ce->URL_s->range_header
&& ce->URL_s->server_status != 206)
{
/* we asked for a range and got back
* the whole document. Something
* went wrong, error out.
*/
ce->status = MK_TCP_READ_ERROR;
ce->URL_s->error_msg = NET_ExplainErrorDetails(
MK_TCP_READ_ERROR,
XP_ERRNO_EIO);
cd->next_state = HTTP_ERROR_DONE;
return(ce->status);
}
break;
case 3: /* redirection and other such stuff */
if(!cd->acting_as_proxy
&& (ce->URL_s->server_status == 301
|| ce->URL_s->server_status == 302))
{
/* Redirect with GET only (change POST to get)
*
* Supported within the HTTP module.
* Will retry after mime parsing
*/
TRACEMSG(("Got Redirect code"));
cd->doing_redirect = TRUE;
}
if(!cd->acting_as_proxy && ce->URL_s->server_status == 307)
{
/* Redirect without changing METHOD
*
* Supported within the HTTP module.
* Will retry after mime parsing
*/
TRACEMSG(("Got Redirect code"));
cd->doing_redirect = TRUE;
cd->save_redirect_method = TRUE;
} else if(ce->URL_s->server_status == 304)
{
/* use the copy from the cache since it wasn't modified
*
* note: this will work with proxy clients too
*/
if(ce->URL_s->last_modified)
{
#ifdef MOZILLA_CLIENT
/* check to see if we just now entered a secure space
*
* don't do if this is coming from history
* don't do this if about to redirect
*/
if(HG82773
&& (ce->format_out == FO_CACHE_AND_PRESENT
|| ce->format_out == FO_PRESENT)
&& ce->URL_s->history_num)
{
History_entry *h = SHIST_GetCurrent(&ce->window_id->hist);
HG03903
}
#endif /* MOZILLA_CLIENT */
cd->use_copy_from_cache = TRUE;
/* no longer return since we need to parse
* headers from the server
*
* return(MK_USE_COPY_FROM_CACHE);
*/
} /* end url_s->last_modified */
/* else continue since the server messed up
* and sent us a 304 even without us having sent
* it an if-modified-since header
*/
}
break;
case 4: /* client error */
ce->URL_s->preset_content_type = FALSE;
if (ce->URL_s->server_status == 401 && !cd->acting_as_proxy)
{
/* never do this if acting as a proxy
* If we are a proxy then just pass on
* the headers and the document.
*
* if authorization_required is set we will check
* below after parsing the MIME headers to see
* if we should redo the request with an authorization
* string
*/
cd->authorization_required = TRUE;
/* Since we
* are getting a new copy, delete the old one
* from the cache
*/
#ifdef MOZILLA_CLIENT
#endif
NET_RemoveURLFromCache(ce->URL_s);
/* we probably want to cache this unless
* the user chooses not to authorize himself
*/
} else if (ce->URL_s->server_status == 407 && !cd->acting_as_proxy)
{
/* This happens only if acting as a client */
cd->proxy_auth_required = TRUE;
/* we probably want to cache this unless
* the user chooses not to authorize himself
*/
} else {
/* don't cache unless we have a successful reply */
TRACEMSG(("Server did not return success: NOT CACHEING!!"));
ce->URL_s->dont_cache = TRUE;
/* all other codes should be displayed */
}
break;
case 5: /* server error code */
TRACEMSG(("Server did not return success: NOT CACHEING!!!"));
ce->URL_s->dont_cache = TRUE;
ce->URL_s->preset_content_type = FALSE;
#ifdef DO_503
if(ce->URL_s->server_status == 503 && !cd->acting_as_proxy)
{
cd->server_busy_retry = TRUE;
}
#endif /* DO_503 */
/* display the error results */
break;
default: /* unexpected reply code */
{
char message_buffer[256];
sprintf(message_buffer,
XP_GetString(XP_ALERT_UNKNOWN_STATUS),
ce->URL_s->server_status);
FE_Alert(ce->window_id, message_buffer);
/* don't cache unless we have a successful reply */
TRACEMSG(("Server did not return 200: NOT CACHEING!!!"));
ce->URL_s->dont_cache = TRUE;
}
break;
} /* Switch on server_status/100 */
cd->next_state = HTTP_PARSE_MIME_HEADERS;
} /* Full HTTP reply */
return(0); /* good */
}
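The 100 Continue branch above discards an interim response by locating the double CRLF that ends it and sliding the rest of the buffer to the front. The following is a minimal standalone sketch of that idea, assuming a NUL-terminated buffer. The helper name `skip_interim_response` is hypothetical and is not part of the listing, and it uses a pointer difference where the listing temporarily terminates the string and calls `PL_strlen`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the listing: removes one interim
 * response (status line plus headers, ended by CRLF CRLF) from the
 * front of buf. *len is the number of valid bytes in buf, not counting
 * the terminating '\0'. Returns 1 if a block was removed, 0 if the
 * terminator has not been seen yet. */
static int skip_interim_response(char *buf, size_t *len)
{
    char *end;
    size_t consumed;

    assert(buf != NULL && len != NULL);
    end = strstr(buf, "\r\n\r\n");
    if (end == NULL)
        return 0;                 /* incomplete or malformed: wait for more data */
    end += 4;                     /* step past the double CRLF */
    consumed = (size_t)(end - buf);
    *len -= consumed;
    memmove(buf, end, *len + 1);  /* +1 also moves the '\0' terminator */
    return 1;
}
```

A caller would invoke this after each read and, when it returns 1, look for the next status line at the start of the same buffer, which mirrors how the listing returns without changing `next_state`.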
/* NET_process_HTTP will control the state machine that
* loads an HTTP document
*
* returns negative if the transfer is finished or error'd out
* returns zero or more if the transfer needs to be continued.
*/
PRIVATE int32
net_ProcessHTTP (ActiveEntry *ce)
{
HTTPConData *cd = (HTTPConData *)ce->con_data;
TRACEMSG(("Entering NET_ProcessHTTP"));
cd->pause_for_read = FALSE; /* already paused; reset */
while(!cd->pause_for_read) {
switch(cd->next_state) {
case HTTP_START_CONNECT:
ce->status = net_start_http_connect(ce);
break;
case HTTP_FINISH_CONNECT:
ce->status = net_finish_http_connect(ce);
break;
case HTTP_SEND_PROXY_TUNNEL_REQUEST:
/* send proxy tunnel init stuff */
ce->status = net_send_proxy_tunnel_request(ce);
break;
case HTTP_BEGIN_UPLOAD_FILE:
/* form a put request */
ce->status = net_begin_upload_file (ce);
break;
HG51096
case HTTP_SEND_REQUEST:
/* send HTTP request */
ce->status = net_send_http_request(ce);
break;
case HTTP_SEND_POST_DATA:
ce->status = net_http_send_post_data(ce);
break;
case HTTP_PARSE_FIRST_LINE:
ce->status = net_parse_first_http_line(ce);
break;
case HTTP_PARSE_MIME_HEADERS:
ce->status = net_parse_http_mime_headers(ce);
break;
case HTTP_SETUP_STREAM:
ce->status = net_setup_http_stream(ce);
break;
case HTTP_BEGIN_PUSH_PARTIAL_CACHE_FILE:
ce->status = net_http_begin_push_partial_cache_file(ce);
break;
case HTTP_PUSH_PARTIAL_CACHE_FILE:
ce->status = net_http_push_partial_cache_file(ce);
break;
case HTTP_PULL_DATA:
ce->status = net_pull_http_data(ce);
break;
case HTTP_DONE:
TIMING_STOPCLOCK_NAME("http:xfer", ce->URL_s->address,
ce->window_id, "ok");
NET_ClearReadSelect(ce->window_id, cd->connection->sock);
NET_TotalNumberOfOpenConnections--;
#ifdef TRUST_LABELS
ProcessCookiesAndTrustLabels(ce);
#endif
if(ce->URL_s->can_reuse_connection && !cd->use_proxy_tunnel) {
cd->connection->busy = FALSE;
} else {
PR_Close(cd->connection->sock);
/* remove the connection from the cache list
* and free the data */
XP_ListRemoveObject(http_connection_list, cd->connection);
if(cd->connection) {
PR_FREEIF(cd->connection->hostname);
PR_Free(cd->connection);
}
}
#ifdef MOZILLA_CLIENT
/* make any meta tag changes take effect */
NET_RefreshCacheFileExpiration(ce->URL_s);
#endif /* MOZILLA_CLIENT */
if(cd->stream) {
COMPLETE_STREAM;
PR_Free(cd->stream);
cd->stream = 0;
}
#if defined(SMOOTH_PROGRESS) && !defined(MODULAR_NETLIB)
/* XXX what to do if redirected to cache? */
PM_StopBinding(ce->window_id, ce->URL_s, 0, NULL);
#endif
timestamp(ce->URL_s->address, "\t finished receiving");
cd->next_state = HTTP_FREE;
break;
case HTTP_ERROR_DONE:
TIMING_STOPCLOCK_NAME("http:post", ce->URL_s->address, ce->window_id,
"error");
TIMING_STOPCLOCK_NAME("http:request", ce->URL_s->address,
ce->window_id, "error");
TIMING_STOPCLOCK_NAME("http:complete", ce->URL_s->address,
ce->window_id, "error");
if(cd->connection->sock != NULL) {
NET_ClearDNSSelect(ce->window_id, cd->connection->sock);
NET_ClearReadSelect(ce->window_id, cd->connection->sock);
NET_ClearConnectSelect(ce->window_id, cd->connection->sock);
PR_Close(cd->connection->sock);
NET_TotalNumberOfOpenConnections--;
cd->connection->sock = NULL;
}
if(cd->partial_cache_fp) {
NET_ClearFileReadSelect(ce->window_id,
XP_Fileno(cd->partial_cache_fp));
NET_XP_FileClose(cd->partial_cache_fp);
cd->partial_cache_fp = 0;
}
if(cd->connection->prev_cache
&& !cd->connection_is_valid
&& ce->status != MK_INTERRUPTED) {
if(cd->stream && !cd->reuse_stream) {
ABORT_STREAM(ce->status);
PR_Free(cd->stream);
cd->stream = 0;
}
/* the connection came from the cache and
* may have been stale. Try it again
*/
/* clear any error message */
if(ce->URL_s->error_msg) {
PR_Free(ce->URL_s->error_msg);
ce->URL_s->error_msg = NULL;
}
cd->next_state = HTTP_START_CONNECT;
ce->status = 0;
/* we don't know if the connection is valid anymore
* because we are going to try it again
*/
cd->connection_is_valid = FALSE;
cd->connection->prev_cache = FALSE;
/* if we were posting a file and received an error put the
* current file we were posting back on the end of the list
* so that our state is correctly reset
*/
net_revert_post_data(ce);
} else {
cd->next_state = HTTP_FREE;
if(cd->stream) {
ABORT_STREAM(ce->status);
PR_Free(cd->stream);
cd->stream = 0;
}
/* remove the connection from the cache list
* and free the data
*/
XP_ListRemoveObject(http_connection_list, cd->connection);
PR_FREEIF(cd->connection->hostname);
PR_Free(cd->connection);
#if defined(SMOOTH_PROGRESS) && !defined(MODULAR_NETLIB)
PM_StopBinding(ce->window_id, ce->URL_s, -1, NULL);
#endif
}
break; /* HTTP_ERROR_DONE */
case HTTP_FREE:
/* close the file upload progress. If a stream was created
* then some sort of HTTP error occurred. Send in an error code */
#ifdef EDITOR
if(cd->destroy_file_upload_progress_dialog) {
/* Convert from netlib errors to editor errors. */
ED_FileError error = ED_ERROR_NONE;
if ( (ce->URL_s->server_status != 204 && ce->URL_s->server_status
!= 201 )
|| ce->status < 0 )
error = ED_ERROR_PUBLISHING ;
FE_SaveDialogDestroy(ce->window_id, error, ce->URL_s->post_data);
/* FE_SaveDialogDestroy(ce->window_id, ce->URL_s->server_status !=
204 ? -1 : ce->status, ce->URL_s->post_data); */
}
#endif /* EDITOR */
if(ce->URL_s->files_to_post && ce->URL_s->post_data) {
/* we shoved the file to post into the post data.
* remove it so the history doesn't get confused
* and try and delete the file. */
PR_FREEIF (ce->URL_s->post_data) ;
ce->URL_s->post_data = NULL;
ce->URL_s->post_data_is_file = FALSE;
}
#if !defined(SMOOTH_PROGRESS) || defined(MODULAR_NETLIB)
if(cd->destroy_graph_progress)
FE_GraphProgressDestroy(ce->window_id,
ce->URL_s,
cd->original_content_length,
ce->bytes_received);
#endif
PR_FREEIF(cd->line_buffer);
PR_Free(cd->stream); /* don't forget the stream */
PR_FREEIF(cd->server_headers);
PR_FREEIF(cd->orig_host);
if(cd->tcp_con_data)
NET_FreeTCPConData(cd->tcp_con_data);
PR_FREEIF(cd);
JSCF_Cleanup();
return STATUS (-1); /* final end HTTP_FREE */
default: /* should never happen !!! */
TRACEMSG(("HTTP: BAD STATE!"));
cd->next_state = HTTP_ERROR_DONE;
break;
} /* end switch */
/* check for errors during load and call error state if found */
if(ce->status < 0
&& ce->status != MK_USE_COPY_FROM_CACHE
&& cd->next_state != HTTP_FREE) {
if (ce->status == MK_MULTIPART_MESSAGE_COMPLETED) {
/* We found the end of a multipart/mixed response
* from a CGI script in a http keep-alive response
* That signifies the end of a message. */
TRACEMSG(("mkhttp.c: End of multipart keep-alive response"));
ce->status = MK_DATA_LOADED;
cd->next_state = HTTP_DONE;
} else if (ce->status == MK_HTTP_TYPE_CONFLICT
/* Don't retry if we're HTTP/.9 */
&& !cd->send_http1
/* Don't retry if we're posting. */
&& !cd->posting) {
/* Could be a HTTP 0/1 compatibility problem. */
TRACEMSG(("HTTP: Read error trying again with HTTP0 request."));
#if defined(SMOOTH_PROGRESS) && !defined(MODULAR_NETLIB)
PM_Status(ce->window_id, ce->URL_s,
XP_GetString(XP_PROGRESS_TRYAGAIN));
#else
NET_Progress(ce->window_id, XP_GetString(XP_PROGRESS_TRYAGAIN));
#endif
NET_ClearReadSelect(ce->window_id, cd->connection->sock);
NET_ClearConnectSelect(ce->window_id, cd->connection->sock);
#ifdef XP_WIN
if(cd->calling_netlib_all_the_time)
NET_ClearCallNetlibAllTheTime(ce->window_id, "mkhttp");
#endif /* XP_WIN */
NET_ClearDNSSelect(ce->window_id, cd->connection->sock);
PR_Close(cd->connection->sock);
NET_TotalNumberOfOpenConnections--;
cd->connection->sock = NULL;
if(cd->stream)
(*cd->stream->abort)(cd->stream, ce->status);
cd->send_http1 = FALSE;
/* go back and send an HTTP0 request */
cd->next_state = HTTP_START_CONNECT;
} else {
cd->next_state = HTTP_ERROR_DONE;
}
/* dont exit! loop around again and do the free case */
cd->pause_for_read = FALSE;
} /* ce->status < 0 */
} /* while(!cd->pause_for_read) */
return STATUS(ce->status);
}
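The control flow of net_ProcessHTTP can be hard to follow in the full listing, so here is a reduced sketch of the same pattern: a loop that dispatches on the next state until a step asks to pause for I/O, with any negative status diverted to an error state that still falls through to cleanup. The states, the `Conn` structure, and the `fail_on_send` hook are all hypothetical simplifications, not the listing's own types.

```c
#include <assert.h>

/* Hypothetical states; the listing's real state set is larger. */
typedef enum { ST_CONNECT, ST_SEND, ST_READ, ST_DONE, ST_ERROR, ST_FREE } State;

typedef struct {
    State next_state;
    int   pause_for_read;  /* set by a step that must wait for I/O */
    int   status;          /* negative means an error occurred */
    int   fail_on_send;    /* illustration only: force an error in ST_SEND */
    int   steps;           /* number of states visited so far */
} Conn;

/* Returns a negative value once the transfer is over, otherwise the
 * (non-negative) status of the last step before pausing. */
static int process(Conn *c)
{
    c->pause_for_read = 0;
    while (!c->pause_for_read) {
        c->steps++;
        switch (c->next_state) {
        case ST_CONNECT:
            c->status = 0;
            c->next_state = ST_SEND;
            break;
        case ST_SEND:
            c->status = c->fail_on_send ? -2 : 0;
            c->next_state = ST_READ;
            break;
        case ST_READ:
            c->status = 0;
            c->next_state = ST_DONE;
            c->pause_for_read = 1;   /* yield until more data arrives */
            break;
        case ST_DONE:
            c->next_state = ST_FREE;
            break;
        case ST_ERROR:
            c->next_state = ST_FREE;
            break;
        case ST_FREE:
            return -1;               /* resources released, transfer over */
        }
        /* Divert errors to the error state, but keep looping so the
         * cleanup state still runs. */
        if (c->status < 0 && c->next_state != ST_FREE) {
            c->next_state = ST_ERROR;
            c->pause_for_read = 0;
        }
    }
    return c->status;
}
```

In this sketch a successful transfer takes two calls to `process` (one pauses after the read, the next finishes), while a failed send reaches the cleanup state within a single call.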
Appendix J. Response Time Measurement
cudenver                    2/9 9:00    2/9 9:30   2/9 10:30    2/9 1:00    2/9 1:30    2/9 1:58    2/9 3:05    2/9 6:00  2/10 10:00  2/10 10:30  2/10 11:00
Client Response Time           1.760       1.920       3.660       1.920       1.870       2.250       1.920       2.690       1.920       2.090       1.810
Server Response Time, Rs       0.116       0.209       0.139       0.118       0.127       0.115       0.153       0.118       0.136       0.443       0.123
Network Time                   1.644       1.711       3.541       1.602       1.743       2.135       1.767       2.572       1.782       1.647       1.687
header.jpg
Client Response Time           9.120       9.510      15.110       8.240       8.570       8.020       9.400       6.490       7.140       6.040       6.420
Server Response Time, Rs       0.200       0.060       0.024       0.015       0.122       0.058       0.019       0.016       0.015       0.060       0.115
Network Time                   9.100       9.451      15.670       8.225       8.448       7.963       9.381       6.474       7.125       5.960       6.305
Grad rule                  2/9 10:00   2/9 10:30    2/9 1:00    2/9 1:30    2/9 3:00    2/9 6:00    2/9 6:20  2/10 10:00  2/10 10:30  2/10 10:45   2/10 1:00
Client Response Time           4.620      12.250       5.660       5.660       5.650       5.660       5.770       5.720       7.360       5.660       5.600
Server Response Time, Rs       0.031       0.032       0.055       0.084       0.510       0.057       0.070       0.044       0.232       0.037       0.055
Network Time                   4.589      12.218       5.605       5.576       5.599       5.803       5.700       5.676       7.128       5.843       5.545
About camp
Client Response Time           4.220       5.490       1.650       1.920       3.130       1.760       1.760       2.810       1.980       1.920       3.240
Server Response Time, Rs       0.264       0.085       0.014       0.040       0.013       0.015       0.013       0.013       0.049       0.014       0.016
Network Time                   3.956       5.405       1.636       1.880       3.117       1.745       1.747       2.797       1.931       1.906       3.224
public_html                2/9 10:00   2/9 10:30    2/9 1:00    2/9 1:30    2/9 3:00    2/9 4:00    2/9 6:00    2/9 6:20  2/10 10:00  2/10 10:20  2/10 10:45
Client Response Time           0.870       1.260       1.100       0.880       0.620       0.870       0.880       0.820       0.940       0.930       0.880
Server Response Time, Rs       0.064       0.026       0.119       0.040       0.023       0.060       0.024       0.021       0.190       0.082       0.065
Network Time                   0.806       1.234       0.981       0.640       0.797       0.810       0.856       0.799       0.750       0.848       0.795
calendar
Client Response Time           1.260       0.820       0.880       1.430       1.370       3.020       1.490       1.320       1.380       1.640       1.810
Server Response Time, Rs       0.477       0.393       0.471       0.346       0.553       1.085       0.474       0.339       1.004       0.424       0.479
Network Time                   0.783       0.427       0.409       1.062       0.817       1.935       1.016       0.981       0.277       1.216       1.331
cudenver                   2/10 1:00   2/10 1:30   2/10 2:00   2/10 3:00   2/10 6:00   2/10 6:30  2/11 10:00  2/11 10:45   2/11 1:00   2/11 1:30
Client Response Time           1.920       2.090       3.620       1.870       1.920       1.980       1.920       1.870       1.870       1.920
Server Response Time, Rs       0.148       0.194       0.130       0.121       0.227       0.118       0.125       0.117       0.144       0.118
Network Time                   1.772       1.896       3.490       1.749       1.693       1.862       1.795       1.753       1.726       1.802
header.jpg
Client Response Time           8.290       9.560       8.070      25.600       5.990       7.910       8.070       6.160       7.140       6.200
Server Response Time, Rs       0.015       0.031       0.022       0.255       0.015       0.041       0.017       0.026       0.074       0.077
Network Time                   8.275       9.529       8.048      25.345       5.976       7.869       6.053       6.134       7.066       6.123
Grad rule                  2/10 1:30   2/10 1:45   2/10 3:00   2/10 6:00   2/10 6:30  2/11 10:00  2/11 10:45   2/11 1:00   2/11 1:30
Client Response Time           5.380       5.330       7.800       9.670       5.660       5.820       5.430       5.770       6.100
Server Response Time, Rs       0.091       0.030       0.450       0.060       0.031       0.065       0.049       0.144       0.610
Network Time                   5.289       5.300       7.360       9.610       5.629       5.755       5.381       5.626       6.039
About camp
Client Response Time           2.960       1.380       4.400       2.900       1.640       3.570       1.930       1.970       3.570
Server Response Time, Rs       0.024       0.140       0.070       0.034       0.015       0.014       0.033       0.028       0.013
Network Time                   2.936       1.366       4.330       2.560       1.625       3.556       1.897       1.944       3.557
public_html                2/10 1:00   2/10 1:30   2/10 1:45   2/10 3:00   2/10 6:00   2/10 6:30  2/11 10:00  2/11 10:40  2/11 10:45   2/11 1:00   2/11 1:30
Client Response Time           0.930       0.940       0.880       0.990       0.860       0.860       0.680       0.860       0.820       0.860       0.990
Server Response Time, Rs       0.063       0.028       0.022       0.019       0.061       0.019       0.027       0.020       0.023       0.026       0.027
Network Time                   0.867       0.912       0.858       0.971       0.619       0.861       0.853       0.860       0.797       0.854       0.963
calendar
Client Response Time           1.260       1.260       1.750       1.430       0.820       1.540       1.920       1.320       1.310       2.140       1.640
Server Response Time, Rs       0.491       0.374       0.437       0.295       0.437       0.295       0.602       0.383       0.385       0.469       0.408
Network Time                   0.769       0.666       1.313       1.135       0.383       1.245       1.316       0.937       0.925       1.871       1.234
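Most columns in these tables are consistent with Network Time being the Client Response Time minus the Server Response Time. That relation is inferred from the numbers and is not stated in this appendix, so the sketch below should be read as an assumption. It stores each value multiplied by 1000 as an integer so the subtraction is exact; the `Sample` type and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One measurement, stored in thousandths of the table's unit so that
 * integer arithmetic reproduces the three-decimal table entries. */
typedef struct {
    long client;  /* Client Response Time x 1000 */
    long server;  /* Server Response Time, Rs x 1000 */
} Sample;

/* Assumed relation: network time = client time - server time. */
static long network_time(Sample s)
{
    return s.client - s.server;
}

/* Integer mean of the derived network times (truncates toward zero). */
static long mean_network_time(const Sample *v, size_t n)
{
    long sum = 0;
    size_t i;

    assert(v != NULL && n > 0);
    for (i = 0; i < n; i++)
        sum += network_time(v[i]);
    return sum / (long)n;
}
```

For the cudenver measurements that start at 2/10 1:00, the client and server pairs (1.920, 0.148), (2.090, 0.194), and (3.620, 0.130) give 1.772, 1.896, and 3.490 under this relation, which matches the Network Time row.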
References
[1] Dilley, John; Friedrich, Rich; Jin, Tai; Rolia, Jerome. Web server
performance measurement and modeling techniques, Performance Evaluation,
vol. 33, no. 1, pp. 5, 1998.
[2] Dilley, John; Friedrich, Rich; Jin, Tai; Rolia, Jerome. Measurement tools and
modeling techniques for evaluating Web server performance, Computer
Performance Evaluation, no. 1245, pp. 155, 1997.
[3] Iyengar, A.; Macnair, E.; Nguyen, T. An analysis of Web server
performance, IEEE Global Telecommunications Conference, vol. 3, pp. 1943,
1997.
[4] Tucker, Michael Jay. Managing your Web-to-database performance,
Datamation, vol. 43, no. 1, pp. 106, 1997.
[5] Web balancing, Computerworld, vol. 30, no. 48, pp. 59, 1996.
[6] Blum, Howard. Internet connection for Web access: an example for
performance modeling and simulation, SIGCSE Bulletin, vol. 28, no. 2, pp. 62,
1996.
[7] Sevcik, Peter J. Designing a high performance Web site, Business
Communications Review, vol. 26, no. 3, pp. 27, 1996.
[8] Hu, J. C.; Mungee, S.; Schmidt, D. C. Techniques for developing and
measuring high performance Web servers over high speed networks, IEEE
INFOCOM 98, vol. 3, pp. 1222, 1998.
[9] Fleming, T. B.; Midkiff, S. F.; Davis, N. J. Improving the performance of the
world wide web over wireless networks, IEEE Global Telecommunications
Conference, vol. 3, pp. 1937, 1997.
[10] Hu, J. C.; Pyarali, I.; Schmidt, D. C. Measuring the impact of event
dispatching and concurrency models on Web server performance over high-speed
networks, IEEE Global Telecommunications Conference, vol. 3, pp. 1924, 1997.
[11] Kung, H. T.; Wang, S. Y. Client-server performance on flow-controlled
ATM networks: A Web database of simulation results, IEEE INFOCOM, vol. 3,
pp. 1218, 1997.
[12] Clarke, S. J.; Willett, P. Estimating the recall performance of Web search
engines, Aslib Proceedings, vol. 49, no. 7, pp. 184, 1997.
[13] Ding, W.; Marchionini, G. A comparative study of Web search service
performance, Proceedings of the ASIS annual meeting, vol. 33, pp. 136, 1996.
[14] McCarthy, Vance. Web security: how much is enough, Datamation, vol.
43, no. 1, pp. 112, 1997.
[15] Rosenthal, L.; Skall, R.; Brady, M.; Kass, M.; Montanez-Rivera, C. Web-
based conformance testing for VRML, Standard View, vol. 5, no. 3, pp. 110,
September, 1997.
[16] Crovella, M.; Barford, P. The network effects of prefetching, IEEE
INFOCOM 98, vol. 3, pp. 1232, 1998.
[17] Nomachi, M.; Sugaya, Y.; Togawa, H.; Yasuda, K.; Mandjavidze, I.
Performance measurements of mixed data acquisition and LAN traffic on a
credit-based flow-controlled ATM network, IEEE Transactions on Nuclear
Science, vol. 45, no. 4, pp. 1854, 1998.
[18] Berghel, Hal. Internet kiosk: The world wide web test pattern to check
HTML client compliance, Computer, vol. 28, no. 9, pp. 63, 1995.
[19] SPECweb96 Release 1.0 Run and Reporting Rules, System Performance
Evaluation Cooperative (SPEC), http://www.specbench.org/osg/web96/, 1998.
[20] Bruno, Lee. Web application server, Data Communications, vol. 27, no. 1,
pp. 36, 1998.
[21] Rolia, J. A.; Sevcik, K. C. The Method of Layers, IEEE Transactions on
Software Engineering, vol. 21, no. 8, pp. 689, August, 1995.
[22] Khan, Kushal; Locatis, Craig. Searching through Cyberspace: The effects of
link display and link density on information retrieval from hypertext on the world