How to implement a real-life benchmark with PHP

To determine the maximum capacity of a web page, Apache ab is often used as a first step. Fetching one URL over and over is optimal for caching, so it gives a best case. To get the worst case for caching, it is necessary to fetch different URLs in a random order.
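For reference, a typical ab invocation for the single-URL best case might look like this (the URL and numbers are placeholders, not values from the test below):

```shell
# 1000 requests total, spread over 10 concurrent clients, all hitting one URL
ab -n 1000 -c 10 http://www.example.com/
```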

Here is a PHP script that crawls a site by following random links:
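The original script did not survive in this copy of the post, so the following is a minimal sketch reconstructed from the surrounding text. The variable names ($url, $limit, $processes, $xpath), the helper functions, and the RUN_CRAWLER guard are assumptions; only the overall behavior (fork one process per user, follow random links, report compressed/uncompressed sizes and timings) comes from the article.

```php
<?php
// random_crawler.php -- hedged sketch, not the author's original script.

$url       = 'http://www.spiegel.de/'; // start page
$limit     = 10;                       // requests per simulated user
$processes = 10;                       // number of simulated users
$xpath     = '//a';                    // which links to follow

// Fetch a URL with compression enabled; return body and wire size.
function fetch($url) {
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_ENCODING       => 'gzip', // request a compressed transfer
        CURLOPT_TIMEOUT        => 30,
    ]);
    $body       = curl_exec($ch);
    $downloaded = curl_getinfo($ch, CURLINFO_SIZE_DOWNLOAD); // bytes on the wire
    curl_close($ch);
    return [$body === false ? '' : $body, (int) $downloaded];
}

// Extract absolute links matching $xpath from an HTML page.
function extract_links($html, $base, $xpath = '//a') {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);            // @ silences warnings on sloppy HTML
    $links = [];
    foreach ((new DOMXPath($doc))->query($xpath) as $node) {
        $href = $node->getAttribute('href');
        if (strpos($href, 'http') === 0) {
            $links[] = $href;                         // already absolute
        } elseif (strpos($href, '/') === 0) {
            $links[] = rtrim($base, '/') . $href;     // site-relative
        }
    }
    return $links;
}

// One simulated user: follow $limit random links, then report totals.
function crawl($url, $limit, $xpath) {
    $pid = getmypid();
    echo "#$pid start $limit requests\n";
    $start = microtime(true);
    $wire = 0; $raw = 0; $current = $url;
    for ($i = 0; $i < $limit; $i++) {
        [$body, $downloaded] = fetch($current);
        $wire += $downloaded;          // compressed size
        $raw  += strlen($body);        // uncompressed size
        $links   = extract_links($body, $url, $xpath);
        $current = $links ? $links[array_rand($links)] : $url;
        // sleep(1); // uncomment to simulate user think time
    }
    $t = microtime(true) - $start;
    printf("#%d end %d/%d KB %.2fs %.2fs/req\n",
        $pid, $wire / 1024, $raw / 1024, $t, $t / $limit);
}

// Fork one child per simulated user. Guarded behind an environment
// variable so the helpers above can be reused without starting a crawl.
if (getenv('RUN_CRAWLER')) {
    echo "Testing $url, " . ($limit * $processes)
       . " requests, $processes processes\n";
    for ($i = 0; $i < $processes; $i++) {
        if (pcntl_fork() === 0) {
            crawl($url, $limit, $xpath);
            exit(0);
        }
    }
    while (pcntl_waitpid(0, $status) > 0); // wait for all children
}
```

Run it with RUN_CRAWLER=1 php random_crawler.php to start the benchmark; pcntl_fork requires the CLI build of PHP with the pcntl extension.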

To get the average case for caching and response times, we need to choose the most relevant links; for example, we skip links from headers and footers. This can be done by using a different XPath expression in the code:

// fetch all links under <div id="content">...</div>
$xpath = '//div[@id="content"]//a';

// fetch all links under <div id="content"> and <div id="menu">
$xpath = '//div[@id="content" or @id="menu"]//a';
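To show how such an expression is evaluated, here is a minimal DOMXPath sketch; the inline HTML string is a stand-in for a fetched page, not part of the original script:

```php
<?php
// Extract only the hrefs under <div id="content">, skipping the header.
$html = '<html><body>'
      . '<div id="header"><a href="/skip">skip</a></div>'
      . '<div id="content"><a href="/page1">one</a><a href="/page2">two</a></div>'
      . '</body></html>';

$doc = new DOMDocument();
@$doc->loadHTML($html);                  // @ silences warnings on sloppy HTML
$xpath = new DOMXPath($doc);

$hrefs = [];
foreach ($xpath->query('//div[@id="content"]//a') as $a) {
    $hrefs[] = $a->getAttribute('href');
}
print_r($hrefs); // the header link is not matched
```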

To make the benchmark more realistic, you can define a waiting period between two requests: uncomment the "// sleep(1)" line at the end of the script.
To get the right values for $limit (number of pages per user) and $processes (number of users), you can consult your favorite analytics tool.

Example output:

php random_crawler.php >details.log

Testing http://www.spiegel.de/, 100 requests, 10 processes
#2393 start 10 requests
#2395 start 10 requests
#2396 start 10 requests
#2394 start 10 requests
#2399 start 10 requests
#2397 start 10 requests
#2398 start 10 requests
#2402 start 10 requests
#2401 start 10 requests
#2400 start 10 requests
#2398 end 188/815 KB 3.54s 0.35s/req
#2393 end 176/751 KB 3.78s 0.38s/req
#2396 end 153/562 KB 3.90s 0.39s/req
#2401 end 137/628 KB 4.19s 0.42s/req
#2397 end 149/456 KB 4.89s 0.49s/req
#2399 end 156/525 KB 4.90s 0.49s/req
#2402 end 171/619 KB 5.95s 0.60s/req
#2400 end 127/349 KB 7.40s 0.74s/req
#2394 end 157/465 KB 8.36s 0.84s/req
#2395 end 167/662 KB 10.62s 1.06s/req
(sizes shown as compressed/uncompressed)
