I suspect you have two problems, and the first is simply exposing the second.
The first problem is that the newsletter software/service is probably using unique links to track clicks and campaigns. For example, if Google Analytics tracking is enabled in a MailChimp campaign, it’ll append a utm_campaign variable to every target URL, and a unique per-user utm_term variable.
This means that every single click in your e-mail campaign hits a unique URL, so your page caching plugin (assuming you’re using one) will probably not serve it from cache but generate the page from scratch, which could be causing the load. I’m not sure about W3 Total Cache; there’s probably a setting for this, but here’s how I ignore utm_ variables in my Batcache configuration:
    // Ignore GET keys not used by PHP to serve cached pages.
    $ignore_get_keys = array( 'utm_source', 'utm_medium', 'utm_term', 'utm_content', 'utm_campaign' );
    parse_str( $_SERVER['QUERY_STRING'], $query );
    foreach ( $ignore_get_keys as $key ) {
        if ( isset( $query[ $key ] ) )
            unset( $query[ $key ] );
        if ( isset( $_GET[ $key ] ) )
            unset( $_GET[ $key ] );
    }
    $_SERVER['QUERY_STRING'] = http_build_query( $query );
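To illustrate the effect, here is a minimal standalone sketch of the same idea. The helper name strip_tracking_keys() and the sample query strings are made up for the example and are not part of Batcache; the point is only that two clicks which differ in their utm_ values end up with the same query string, and so could share one cache entry.

```php
<?php
// Hypothetical helper (not part of Batcache): drop tracking keys from a
// query string and rebuild it from whatever is left.
function strip_tracking_keys( $query_string, array $ignore_keys ) {
    parse_str( $query_string, $query );
    foreach ( $ignore_keys as $key ) {
        unset( $query[ $key ] ); // unset() is safe even if the key is absent
    }
    return http_build_query( $query );
}

$ignore = array( 'utm_source', 'utm_medium', 'utm_term', 'utm_content', 'utm_campaign' );

// Two clicks from the same campaign that differ only in the per-user value.
$a = strip_tracking_keys( 'p=42&utm_campaign=spring&utm_term=user-1', $ignore );
$b = strip_tracking_keys( 'p=42&utm_campaign=spring&utm_term=user-2', $ignore );

var_dump( $a === $b ); // bool(true)
echo $a, "\n";         // p=42
```

Whether your caching plugin keys its cache on the query string, the full request URI, or something else depends on the plugin, so check that before relying on this.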
The second problem is that your 4-core, 16 GB server dies with 155 opens. A properly configured $5 single-core 512 MB server can serve over 5000 requests per second (with page caching, to be fair) and around 5-10 per second without caching.
So by my very rough calculations, you should be able to serve at least 50 requests per second with no caching at all. If 155 opens are causing massive load problems on your server, something is clearly wrong.
Profiling is a good place to start. Install the XHProf extension and you can even run it on your production server. Have it e-mail and/or log requests that take longer than 1 second and you’ll probably spot the bottleneck pretty quickly.
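A rough sketch of that kind of slow-request logging might look like the following. It assumes the xhprof extension is installed (the guards make it a plain timer if it is not); is_slow_request() is a hypothetical helper for this example, and the one-second threshold and log message format are just illustrative choices.

```php
<?php
// Assumption: the xhprof extension is loaded. If it is not, the script
// still times the request but collects no profile data.
$request_start = microtime( true );
$profiling     = function_exists( 'xhprof_enable' );

if ( $profiling ) {
    xhprof_enable( XHPROF_FLAGS_CPU | XHPROF_FLAGS_MEMORY );
}

// Hypothetical helper: was the request slow enough to report?
function is_slow_request( $started_at, $finished_at, $threshold = 1.0 ) {
    return ( $finished_at - $started_at ) > $threshold;
}

register_shutdown_function( function () use ( $request_start, $profiling ) {
    // xhprof_disable() stops profiling and returns the collected data.
    $data = $profiling ? xhprof_disable() : array();
    $now  = microtime( true );

    if ( ! is_slow_request( $request_start, $now ) ) {
        return; // fast request, nothing to log
    }

    $uri = isset( $_SERVER['REQUEST_URI'] ) ? $_SERVER['REQUEST_URI'] : '(cli)';
    error_log( sprintf(
        'Slow request: %s took %.2fs (%d profiled call pairs)',
        $uri,
        $now - $request_start,
        count( $data )
    ) );
} );
```

One way to run something like this on every request is PHP's auto_prepend_file setting. In practice you would also want to store the profile data for the slow requests so you can inspect it afterwards, rather than only logging a summary line.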
Once you’ve found and fixed the bottleneck, I also recommend ditching Apache in favor of nginx and PHP 5.6 in FPM mode.