No, it’s not the antithesis of the lower education system in much of the Arabic world. After writing my previous post, I was thinking about how powerful each “micro” instance actually is… I mean, what happens when my newly minted pages of beauty get linked from lifehacker or arstechnica or TUAW? Will the magical server in the cloud be able to handle the rush to download the woefully outdated ones and zeros?? And so I set forth to find out how my tiny little instance (costing me US $0.03, thus far) can handle the traffic.
I found a number of cool sites that automate the testing process for you. Yottaa is super fun (and it’s hosted in AWS itself!); it provides longitudinal monitoring, tells you which parts of your page may be slowing the load process, and compares several more benchmarks against other sites around the world. Great for pondering all the geeky ways you can spend your time optimizing your site instead of struggling through that poorly worded research paper (*cough*). LoadImpact was where I did most of my stress testing; their free trial lets you launch 50 clients to attack your website, or about 49 more than typically visit mine. Amazon’s own CloudWatch service also lets you retrieve metrics for your server by typing incantations like the one below.
mon-get-stats CPUUtilization --start-time 2011-01-30T16:20:00.00Z --end-time 2011-01-30T19:20:00.00Z --period 300 --statistics "Average,Minimum,Maximum" --namespace "AWS/EC2" --dimensions "InstanceId=i-0a123456"
After an epic battle with the still buggy UI of Excel 2011 (yes, okay, I’ll use MATLAB next time), I finally managed to produce the following graphs:
Initial results were disastrous. With only 13 clients on the most complex page, load times were already above 10 seconds. In fact, with only 7 clients my server’s logs indicated out-of-memory errors, forcing instances of Apache to close. WordPress provides some great advantages with its dynamic production of webpages, but PHP + Apache can be a memory hog. This turns out to be a real problem on a server with only 613 MB of RAM. Past 7 clients, we see memory swapping cause the disk operations per second (IOPS) to rapidly rise into the thousands (!), and CPU utilization follows due to data starvation. As indicated by the network bandwidth, the whole operation maxes out at 10 clients, when the CPU also hits 100% utilization. Amazon’s storage backend struggles mightily to keep up with all the swapping, but even with IOPS about 20 times higher than a 7200 RPM SATA drive, the latency really destroys the CPU. (In fact, I didn’t saturate the storage backend… the average queue length was only around 12 outstanding requests.) At 32 clients, load times exceeded 50 seconds and LoadImpact aborted the test. =P
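That 7-client cliff actually makes sense if you do the arithmetic. Here’s a back-of-the-envelope sketch; the per-worker footprint and OS/MySQL overhead are my guesses (check `ps` on your own box), not measurements:

```python
# Rough estimate of how many Apache+PHP workers fit in RAM before swapping.

def max_workers(total_mb, reserved_mb, per_worker_mb):
    """Workers that fit after reserving memory for the OS and MySQL."""
    return (total_mb - reserved_mb) // per_worker_mb

# A t1.micro has 613 MB; assume ~100 MB for the OS + MySQL and ~70 MB
# per Apache+PHP worker (both numbers are assumptions, not measurements).
print(max_workers(613, 100, 70))  # → 7
```

Seven workers before the swap death spiral, which lines up suspiciously well with the logs.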
The solution, of course, is caching! Fortunately, WordPress has no shortage of caching plugins available. I chose W3 Total Cache because the documentation was well written. With page, database, and object caching enabled, load times with just a single client were approximately halved. The test ran all the way to 50 clients without the server breaking a sweat. In fact, the limiting resource was the number of Apache instances spawned; CPU utilization never exceeded 30%, and disk IOPS didn’t even register! Great success! Now if I do some tweaking to apache2.conf to increase the number of Apache instances, I can avoid reading more research papers…
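The reason page caching helps so dramatically: a cached page is served straight from memory or disk instead of re-running PHP and re-querying MySQL on every hit. A toy sketch of the idea (this is the general technique, not how W3 Total Cache is actually implemented, and the TTL is an arbitrary stand-in):

```python
import time

page_cache = {}  # url -> (rendered_html, timestamp)
CACHE_TTL = 300  # seconds; hypothetical expiry, real plugins make this configurable

def render_page(url):
    """Stand-in for the expensive PHP templating + MySQL queries."""
    return f"<html>content for {url}</html>"

def get_page(url):
    cached = page_cache.get(url)
    if cached and time.time() - cached[1] < CACHE_TTL:
        return cached[0]             # cache hit: no PHP, no database
    html = render_page(url)          # cache miss: do the expensive work once
    page_cache[url] = (html, time.time())
    return html
```

Every hit after the first skips the expensive path entirely, which is why CPU and disk flatlined even at 50 clients.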
Even less entertaining thoughts follow:
Curiously, the CloudWatch stats returned are for 1-minute intervals… I didn’t see this documented anywhere. For example, the NetworkOut metric returned the average number of bytes over 1 minute, even though the measurement period was 5 minutes. Also, the measured bandwidth on the AWS side was a little higher than what LoadImpact reported.
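For anyone else tripped up by the same thing, converting those per-minute byte averages into a more familiar rate is just unit arithmetic (assuming the 1-minute interpretation above is correct):

```python
def bytes_per_min_to_mbps(avg_bytes):
    """CloudWatch NetworkOut average (bytes over 1 minute) -> megabits/second."""
    return avg_bytes * 8 / 60 / 1_000_000

# e.g. a 7,500,000-bytes-per-minute average works out to 1 Mbit/s
print(bytes_per_min_to_mbps(7_500_000))  # → 1.0
```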
apache2.conf:

<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients           50
    MaxRequestsPerChild 3000
</IfModule>