BTMash

Blob of contradictions

Optimizing your site - modules gone wild.

Fri, 06/03/2011 - 23:00 -- btmash
This is a graph from munin showing how there was a large increase in queries/sec

While working on the one of the more content-heavy websites at my workplace (over 100k nodes consisting of galleries of photos, with some galleries having well over 2000 photos), I had noticed that while performance for logged in users (which isn't high since only site editors can log in) has mostly stabilized (there are areas where performance can improve), performance for all other site visitors was very inconsistent. Some of the galleries and gallery photos pages seemed to load fairly quickly while others took a few seconds for the page to show up. I wasn't entirely sure on what was going on since (each time I viewed the site) I would not see the issue and as a anonymous user, it only seemed to occur when I was looking at an non cached page (second, third+ visits and the page was loading just fine. So based off the requirements in my previous article (see Optimizing your site query by adding indexes) on the various possibilities, it seems like the issue was stemming from (5) something is up at the PHP layer (drupal, one of the modules, etc).

I enabled devel and displaying the queries again on a page. When looking at the pages where the issue was occurring, things looked fairly normal (about a hundred queries, less than 100ms total query running time). But since the problem was only occurring for anonymous users, I set the permissions so anonymous users could see devel data. And the results were staggering. The same page that ran under 100 queries was suddenly running well over 5000!
Vegeta from Dragonball Z yells 'Its over 9000!' when Goku's behavior suddenly changes. Same as the kind of behavior from the site
With old tv show references out of the way, it was time to start figuring out what was causing the issue. After digging and some experimenting, I was able to narrow down that the problem was occurring at the caching layer (when I disabled the Boost module, the number of queries dropped back down to sane numbers (under 100, low total query running time). The main reason all of this was being caused:

  1. Enabled Custom Pagers module.
  2. Each gallery photo has a pager with a previous photo/next photo in gallery via a view.
  3. Some galleries have over 1000 photos.
  4. Boost implements hook_views_render to get related nodes to a piece of content so if the content changes, the cache for the other nodes are flushed as well.
  5. Instead of adding the node IDs, Boost performed a full node_load to get a small amount of information regarding the node.
  6. Fiasco.

One solution was to disable boost though that would also be getting rid of the benefits of using the amazing things that boost does. So I wrote a small bit of code in my module:

/**
* Implementation of hook_views_pre_execute().
* Run before boost executes
*/
function calarts_photo_gallery_views_pre_execute(&$view) {
  if ($view->name == 'photo_gallery_content' && module_exists('boost')) {
    global $_boost_cache_this, $_boost_cache_this_backup;
    $_boost_cache_this_backup = $_boost_cache_this;
    $_boost_cache_this = FALSE;
  }
}

/**
* Implementation of hook_views_post_execute().
* Run after boost executes
*/
function calarts_photo_gallery_views_post_render(&$view) {
  if ($view->name == 'photo_gallery_content' && module_exists('boost')) {
    global $_boost_relationships, $_boost_cache_this, $_boost_cache_this_backup;
    $_boost_cache_this = $_boost_cache_this_backup;
    if ($_boost_cache_this) {
      $nids = array();
      foreach ($view->result as $item) {
        $nids[] = $item->nid;
      }
      $result = db_query("SELECT nid, type FROM {node} WHERE nid IN (". implode(', ', $nids) .")");
      while($node = db_fetch_object($result)) {
        $GLOBALS['_boost_relationships'][] = array('child_page_callback' => 'node', 'child_page_type' => $node->type, 'child_page_id' => $node->nid);
      }
    }
  }
}

Essentially, what this module does is, for that particular view, I implement hook_views_render and hook_views_post_render. In the former, I essentially tell boost that caching the current page is not allowed so it does not run the kind of queries that it does. In the latter, I gather the data that boost requires using a much simpler query (1 query as opposed to more than 1 query as seen by boost), and tell boost that the current page *can* be cached. Problem solved.

Since then, I had submitted a patch for the boost module to do something a lot saner (you can view it here) but it seems that the dev branch of boost supports something even cooler. But the point of all this is that no amount of tuning on the other ends (more powerful hardware, fine-tuning the mysql layer, memcached, boost (which was the cause of the issue in the first place!!)) would have been able to fix a module essentially going wild and it is again very important to diagnose causes of issues before running forward with fixes that do not remedy the situation.