As you may already know, we’re a little obsessed with page load speed. We wanted our home page to load in under 1 second, and we were close. But close isn’t good enough.
So, with some guidance from Ian, I started out on my quest for sub-1-second page load times. The journey took me about a month, filled with research, converting and building configurations, trial and error, and performance and load testing. In the end, it was all worth it because portent.com is now screaming fast.
Since re-launching in the new environment, portent.com averages 0.4 seconds/page (fist bump).
The New Environment
Our old environment was a 2-server setup: 1 dedicated web server with Apache, PHP, and APC and 1 dedicated database server with MySQL. It used keep-alives, compression (gzip, image, code), expires headers, a CDN, and caching provided by W3 Total Cache coupled with APC. This setup held its own for quite a while, but it did not meet our goal, and with big traffic growth in 2012 there was plenty of room for improvement.
Bring in the new players: Varnish, NGINX, PHP-FPM, and APC
We spun up 3 Ubuntu 12.04 servers with help from our new friends at Rackspace:
- dedicated web server with NGINX, PHP-FPM5 and APC (4 GB RAM)
- dedicated MySQL database server (4 GB RAM)
- dedicated Varnish server (4 GB RAM)
NGINX
First, we set up NGINX. NGINX is an HTTP server with a modular architecture that serves static and index files and supports accelerated reverse proxying with caching, simple load balancing, autoindexing, gzipping, FastCGI caching, and much more. It wins high praise for its performance and scalability.
With some help from Tobias Baldauf’s article, I configured NGINX for our WordPress install. I added gzip compression for common file types in /…/nginx/nginx.conf, including the custom fonts our site uses. In our domain-specific configuration (i.e. /…/nginx/conf.d/portent.conf), I implemented pretty heavy caching for static files:
# Defined default caching of 24h
expires 86400s;
add_header Pragma public;
add_header Cache-Control "max-age=86400, public, must-revalidate, proxy-revalidate";
# Aggressive caching for static files
# (the pattern has to stay on one line; NGINX does not allow line breaks inside it)
location ~* \.(asf|asx|wax|wmv|wmx|avi|bmp|class|divx|doc|docx|eot|exe|gif|gz|gzip|ico|jpg|jpeg|jpe|mdb|mid|midi|mov|qt|mp3|m4a|mp4|m4v|mpeg|mpg|mpe|mpp|odb|odc|odf|odg|odp|ods|odt|ogg|ogv|otf|pdf|png|pot|pps|ppt|pptx|ra|ram|svg|svgz|swf|tar|t?gz|tif|tiff|ttf|wav|webm|wma|woff|wri|xla|xls|xlsx|xlt|xlw|zip)$ {
    expires 31536000s;
    access_log off;
    log_not_found off;
    add_header Pragma public;
    add_header Cache-Control "max-age=31536000, public";
}
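As a rough illustration of which lifetime ends up applying to a given request, here is a small Python sketch (not part of the server config). It mirrors the case-insensitive, end-anchored match of the location block using only a handful of the extensions listed above; the function name max_age_for is made up for this example.

```python
import re

# A subset of the extensions from the location block above (the real list is longer).
# re.IGNORECASE mirrors the "~*" (case-insensitive) modifier; "$" anchors at the end.
STATIC_RE = re.compile(r"\.(gif|ico|jpg|jpeg|png|svg|ttf|woff|pdf|zip)$", re.IGNORECASE)

DEFAULT_MAX_AGE = 86400      # 24 hours, the default defined at the top
STATIC_MAX_AGE = 31536000    # 365 days, for matching static files


def max_age_for(uri_path: str) -> int:
    """Return the max-age (in seconds) this sketch expects for a URI path."""
    return STATIC_MAX_AGE if STATIC_RE.search(uri_path) else DEFAULT_MAX_AGE


if __name__ == "__main__":
    for path in ("/images/logo.PNG", "/blog/", "/photo.png.html"):
        print(path, max_age_for(path))
```

Because the pattern is anchored at the end of the path, something like /photo.png.html falls back to the 24-hour default.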
I also added in the necessary directives to utilize PHP-FPM:
set $my_https "off";
if ($http_x_forwarded_proto = "https") {
    set $my_https "on";
}
#Added for php-fpm.
location ~ \.php$ {
    # Customizations for PHP-FPM
    try_files $uri =404;
    # Escape the dot so only a literal ".php" splits the path
    fastcgi_split_path_info ^(.+\.php)(.*)$;
    fastcgi_pass php5-fpm-sock;
    fastcgi_index index.php;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    include /etc/nginx/fastcgi_params;
    fastcgi_intercept_errors on;
    fastcgi_ignore_client_abort off;
    fastcgi_connect_timeout 60;
    fastcgi_send_timeout 180;
    fastcgi_read_timeout 180;
    fastcgi_buffer_size 128k;
    fastcgi_buffers 4 256k;
    fastcgi_busy_buffers_size 256k;
    fastcgi_temp_file_write_size 256k;
    fastcgi_param HTTPS $my_https;
    fastcgi_param REMOTE_ADDR $http_x_cluster_client_ip;
}
In the code above, the last two fastcgi_param lines account for our SSL-terminating load balancer. With that load balancing setup, these definitions were required to get the proper values for PHP variables like $_SERVER['HTTPS'] and $_SERVER['REMOTE_ADDR'].
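To make the mapping concrete, here is a minimal Python sketch of what those two parameters do with the headers the load balancer adds. It is only an illustration: the function name fastcgi_overrides is hypothetical, and it assumes the load balancer sends X-Forwarded-Proto and X-Cluster-Client-Ip, which is what the $http_x_forwarded_proto and $http_x_cluster_client_ip variables read.

```python
def fastcgi_overrides(request_headers: dict) -> dict:
    """Mimic the HTTPS and REMOTE_ADDR values handed to PHP (illustrative only)."""
    # NGINX exposes request headers as lowercase $http_* variables,
    # and an absent header shows up as an empty string.
    headers = {name.lower(): value for name, value in request_headers.items()}

    # set $my_https "off"; if ($http_x_forwarded_proto = "https") { set $my_https "on"; }
    https = "on" if headers.get("x-forwarded-proto", "") == "https" else "off"

    # fastcgi_param REMOTE_ADDR $http_x_cluster_client_ip;
    remote_addr = headers.get("x-cluster-client-ip", "")

    return {"HTTPS": https, "REMOTE_ADDR": remote_addr}


if __name__ == "__main__":
    print(fastcgi_overrides({
        "X-Forwarded-Proto": "https",
        "X-Cluster-Client-Ip": "203.0.113.7",  # documentation-range example address
    }))
```

One thing the sketch makes visible: a request that reaches NGINX without the X-Cluster-Client-Ip header would hand PHP an empty REMOTE_ADDR, so this approach assumes all traffic arrives through the load balancer.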
PHP-FPM
Next, I configured PHP-FPM for our environment. FPM stands for FastCGI Process Manager. It is an alternative PHP FastCGI implementation, and its features are covered in the PHP-FPM documentation. Getting these values required performance tuning research. Please note that you should do your own research and testing. Here are some of the main definitions in /…/php*/fpm/pool.d/www.conf:
pm.max_children = 25
; The number of child processes created on startup.
; Note: Used only when pm is set to 'dynamic'
; Default Value: min_spare_servers + (max_spare_servers - min_spare_servers) / 2
pm.start_servers = 8
; The desired minimum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'
pm.min_spare_servers = 5
; The desired maximum number of idle server processes.
; Note: Used only when pm is set to 'dynamic'
; Note: Mandatory when pm is set to 'dynamic'
pm.max_spare_servers = 15
; The number of seconds after which an idle process will be killed.
; Note: Used only when pm is set to 'ondemand'
; Default Value: 10s
pm.process_idle_timeout = 60s
; The number of requests each child process should execute before respawning.
; This can be useful to work around memory leaks in 3rd party libraries. For
; endless request processing specify '0'. Equivalent to PHP_FCGI_MAX_REQUESTS.
; Default Value: 0
pm.max_requests = 500
php_flag[display_errors] = off
php_admin_value[error_reporting] = 0
php_admin_value[error_log] = /var/log/php5-fpm.log
php_admin_flag[log_errors] = on
php_admin_value[memory_limit] = 128M
php_admin_value[date.timezone] = America/Los_Angeles
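If you are tuning these numbers for your own server, a quick sanity check can help. The Python sketch below is an assumption-laden illustration, not an official tool: it checks the ordering PHP-FPM expects between the dynamic pool values, reproduces the default start_servers formula quoted in the comments above, and multiplies max_children by memory_limit as a rough worst case. The function names are made up, and the worst case ignores everything else on the box (the OS, NGINX, APC's shared memory).

```python
def default_start_servers(min_spare: int, max_spare: int) -> int:
    """Default from the www.conf comment: min_spare + (max_spare - min_spare) / 2."""
    return min_spare + (max_spare - min_spare) // 2


def check_dynamic_pool(max_children, start_servers, min_spare, max_spare,
                       memory_limit_mb, server_ram_mb):
    """Return (problems, worst_case_mb) for a pm = dynamic pool."""
    problems = []
    if not (min_spare <= start_servers <= max_spare):
        problems.append("start_servers should sit between min and max spare servers")
    if max_spare > max_children:
        problems.append("max_spare_servers should not exceed max_children")
    # Rough upper bound: every child hitting memory_limit at the same time.
    worst_case_mb = max_children * memory_limit_mb
    if worst_case_mb > server_ram_mb:
        problems.append("max_children * memory_limit exceeds the server's RAM")
    return problems, worst_case_mb


if __name__ == "__main__":
    # Values from the pool config above, on the 4 GB web server.
    print(check_dynamic_pool(25, 8, 5, 15, memory_limit_mb=128, server_ram_mb=4096))
    print(default_start_servers(5, 15))
```

With the values shown here, the worst case comes out to 3200 MB, which fits inside the 4 GB web server, though real per-process usage is usually well below memory_limit.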
APC
Next, I brought APC into the fold. It’s a HUGE performance booster. APC stands for Alternative PHP Cache, and it works wonders. APC caches compiled PHP code (opcodes) in shared memory, so scripts don’t have to be parsed and compiled on every request, which reduces the load on the web server. You can read all about its awesomeness in the APC section of the PHP manual.
Again, you will want to do research and testing for your own environment, but here are the APC settings at the bottom of our php.ini file, /…/php*/fpm/php.ini:
[apc]
apc.max_file_size = "2M"
apc.localcache = "1"
apc.localcache.size = "256"
apc.shm_segments = "1"
apc.ttl = "3600"
apc.user_ttl = "7200"
apc.gc_ttl = "3600"
apc.cache_by_default = "1"
apc.filters = ""
apc.write_lock = "1"
apc.num_files_hint = "500"
apc.user_entries_hint = "4096"
apc.shm_size = "256M"
apc.mmap_file_mask = /tmp/apc.XXXXXX
apc.include_once_override = "0"
apc.file_update_protection = "2"
apc.canonicalize = "1"
apc.report_autofilter = "0"
apc.stat_ctime = "0"
apc.stat = "1"
You can boost your performance even further by setting apc.stat to “0”, but it will require you to flush the APC opcode cache every time you upload a new version of a PHP file. Because we are constantly working on our site, this wasn’t a very practical option. When apc.stat is set to “1” (on), APC checks the requested file against the cached version and updates the cache automatically if the file has changed. That costs a slight performance hit, but in my testing, not enough to warrant the hassle of turning it off.
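The trade-off is easier to see in a toy model. The Python sketch below is not how APC is implemented; it is a simplified stand-in (the class OpcodeCacheSketch is invented for this example) that compares a file's modification time on each load when stat is on, and trusts the cached copy until it is flushed when stat is off.

```python
import os
import tempfile


class OpcodeCacheSketch:
    """Toy model of an opcode cache with an apc.stat-style switch."""

    def __init__(self, stat: bool):
        self.stat = stat
        self._entries = {}   # path -> (mtime_ns, "compiled" code)
        self.compiles = 0    # how many times we had to (re)compile

    def _compile(self, path: str) -> str:
        self.compiles += 1
        with open(path, encoding="utf-8") as handle:
            return handle.read()  # stand-in for real compilation

    def load(self, path: str) -> str:
        entry = self._entries.get(path)
        if entry is not None:
            cached_mtime, code = entry
            # stat off: trust the cache. stat on: one extra stat() call per load.
            if not self.stat or os.stat(path).st_mtime_ns == cached_mtime:
                return code
        code = self._compile(path)
        self._entries[path] = (os.stat(path).st_mtime_ns, code)
        return code

    def clear(self) -> None:
        """What you would have to do after every deploy with stat off."""
        self._entries.clear()


if __name__ == "__main__":
    folder = tempfile.mkdtemp()
    page = os.path.join(folder, "page.php")
    with open(page, "w", encoding="utf-8") as handle:
        handle.write("version 1")
    cache = OpcodeCacheSketch(stat=False)
    print(cache.load(page))
    with open(page, "w", encoding="utf-8") as handle:
        handle.write("version 2")
    print(cache.load(page))  # still the old copy until clear() is called
```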
Varnish
Lastly, we set up Varnish on its dedicated server. Varnish is a reverse proxy HTTP accelerator developed for dynamic, content-heavy web sites. Varnish caches pages in virtual memory, leaving the operating system to decide what gets written to disk or stored in RAM. Varnish becomes the top layer of the web stack. All traffic routes through it. Because Varnish serves cached content straight from memory, the web server makes far fewer PHP and MySQL calls.
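The idea behind that last point can be sketched in a few lines of Python. This is a deliberately tiny model, nothing like Varnish's real internals: a cache with a time-to-live sits in front of a backend function, and the backend (standing in for NGINX, PHP, and MySQL) only runs on a miss or after the cached copy expires. The class name and the 60-second TTL are assumptions for the example.

```python
import time


class PageCacheSketch:
    """Toy reverse-proxy cache: serve from memory until the TTL runs out."""

    def __init__(self, backend, ttl_seconds: float, clock=time.monotonic):
        self.backend = backend          # callable that renders a page for a URL
        self.ttl_seconds = ttl_seconds
        self.clock = clock              # injectable so the sketch is easy to test
        self._store = {}                # url -> (expires_at, body)

    def get(self, url: str) -> str:
        now = self.clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]             # hit: the backend is never touched
        body = self.backend(url)        # miss or expired: ask the web server
        self._store[url] = (now + self.ttl_seconds, body)
        return body


if __name__ == "__main__":
    backend_calls = []

    def render(url: str) -> str:
        backend_calls.append(url)
        return "page for " + url

    cache = PageCacheSketch(render, ttl_seconds=60)
    for _ in range(1000):
        cache.get("/")
    print(len(backend_calls))  # the backend rendered the page once
```

Note that whatever the backend returns for the first request after an expiry is what every later visitor gets until the next expiry.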
Challenges
Converting our Apache .htaccess file to NGINX configuration syntax nearly drove me nuts. This was my first time working with NGINX so there was a lot of research and trial-and-error testing, but NGINX config can handle anything that Apache can, so it was a matter of problem solving.
Another challenge was getting PHP to properly define variables in the new load balanced environment, mainly HTTPS and REMOTE_ADDR. The definitions for PHP-FPM found in our site-specific NGINX configuration file did the trick.
The last big challenge was getting the hang of Varnish. After a few days of testing we came across an issue where some of our pages were being cached with our mobile styles and then served that way whether they were viewed on a desktop, tablet, or mobile device. When a page’s Varnish cache had expired, the response to the next request would get cached. Occasionally, that first post-expiry request came from a mobile device, so the mobile version of the page was cached and served to everyone. The solution is to configure Varnish to keep your mobile cache separate from your main cache. I added this vcl_hash function to our Varnish config located in /…/varnish/default.vcl:
sub vcl_hash {
    hash_data(req.url);
    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }
    # ensure separate cache for mobile clients (WPTouch workaround)
    if (req.http.User-Agent ~ "iP(hone|od)"
        || req.http.User-Agent ~ "Android"
        || req.http.User-Agent ~ "SymbianOS"
        || req.http.User-Agent ~ "^BlackBerry"
        || req.http.User-Agent ~ "^SonyEricsson"
        || req.http.User-Agent ~ "^Nokia"
        || req.http.User-Agent ~ "^SAMSUNG"
        || req.http.User-Agent ~ "^LG") {
        hash_data("touch");
    }
    return (hash);
}
The function above adds the string ‘touch’ to the cache hash whenever the user-agent meets the conditions of the if statement, so mobile requests get their own cache entries, separate from the main cache.
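Here is the same idea as a small Python sketch, in case the VCL is hard to picture. It is an approximation for illustration only: the cache_key function, the separator between parts, and the split_mobile switch are all assumptions, but the user-agent patterns are the ones from the vcl_hash function above.

```python
import hashlib
import re

# The same user-agent patterns as the vcl_hash function, combined into one regex.
MOBILE_UA = re.compile(
    r"iP(hone|od)|Android|SymbianOS|^BlackBerry|^SonyEricsson|^Nokia|^SAMSUNG|^LG"
)


def cache_key(url, host, user_agent, server_ip="127.0.0.1", split_mobile=True):
    """Build a cache key from the URL, the host (or server IP), and a mobile marker."""
    parts = [url, host if host else server_ip]
    if split_mobile and user_agent and MOBILE_UA.search(user_agent):
        parts.append("touch")  # mobile clients get their own cache entry
    # The NUL separator is an assumption of this sketch, not Varnish behavior.
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    phone = "Mozilla/5.0 (iPhone; CPU iPhone OS 6_0 like Mac OS X)"
    desktop = "Mozilla/5.0 (Windows NT 6.1; WOW64)"
    # Without the marker, both clients share one entry (the bug we hit).
    print(cache_key("/", "example.com", phone, split_mobile=False)
          == cache_key("/", "example.com", desktop, split_mobile=False))
    # With the marker, the phone gets its own entry.
    print(cache_key("/", "example.com", phone)
          == cache_key("/", "example.com", desktop))
```

The host name example.com and both user-agent strings are placeholders.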
In Conclusion
It was a lot of work. But our new configuration loads twice as fast, and it doesn’t bog down if we have a big traffic day. We upped our speed to plaid.
Good Job.
Let us know what RUM tells you about the real web performance 😉
Thanks! I just got Pingdom’s RUM (real user monitoring) set up on the site, so I’ll have even more conclusive results soon. Thanks for the idea.
Insanely cool. Nice work Andy!
The Schwartz is strong with Portent.
Holy crap. THANK YOU JEDI BROTHERS. Your Schwartz is large.
We have been struggling with a few speed issues over the past month and this is insanely cool. Rock on, and thanks.