I have the pleasure of working on CalArt's Photo Archive. From the conversion as a flat-file website into Drupal, the site has come a long way. And for a mid-sized website (a little over 100k nodes) running on somewhat older hardware (at least 3 years old), we are fairly happy with the performance (using the boost module allows us to serve cached content at approximately 2k requests per second). However, I was noticing that some of our content type pages were loading quite slowly prior to being cached.
Before we start to seriously think about the things others may say such as 'get a faster version of php', 'get better hardware', or 'drupal sucks!', it is important to try and diagnose what is going on. In our scenario, the possibilities were:
- Something is up with the hardware.
- Something is happening in the web server (apache)
- Something is happening at the extra caching layer (memcached)
- Something is happening at the database layer (mysql)
- Something is up with at the php layer(Drupal, Boost - time to switch to pressflow or rewrite what we have)
We started by diagnosing using the Devel module, looking to see how much memory is taken up on a page load, what queries are running, etc. Its a great tool to diagnosing some of the issues that can come up in the software/db layer. By doing this, I noticed we had one query (its a long one!)
SELECT node.nid AS nid, node.title AS node_title, node_data_field_photo_lossy_image.field_photo_lossy_image_fid AS node_data_field_photo_lossy_image_field_photo_lossy_image_fid, node_data_field_photo_lossy_image.field_photo_lossy_image_list AS node_data_field_photo_lossy_image_field_photo_lossy_image_list, node_data_field_photo_lossy_image.field_photo_lossy_image_data AS node_data_field_photo_lossy_image_field_photo_lossy_image_data, node.type AS node_type, node.vid AS node_vid, flag_content_node.timestamp AS flag_content_node_timestamp, flag_content_node_1.timestamp AS flag_content_node_1_timestamp, DATE_FORMAT((FROM_UNIXTIME(flag_content_node.timestamp) + INTERVAL -25200 SECOND), '%Y') AS flag_content_node_timestamp_year, DATE_FORMAT((FROM_UNIXTIME(flag_content_node_1.timestamp) + INTERVAL -25200 SECOND), '%Y') AS flag_content_node_1_timestamp_year, node.nid AS node_nid FROM node node LEFT JOIN content_type_calarts_photo node_data_field_photo_lossy_image ON node.vid = node_data_field_photo_lossy_image.vid INNER JOIN files files_node_data_field_photo_lossy_image ON node_data_field_photo_lossy_image.field_photo_lossy_image_fid = files_node_data_field_photo_lossy_image.fid LEFT JOIN flag_content flag_content_node ON node.nid = flag_content_node.content_id AND flag_content_node.fid = 4 LEFT JOIN flag_content flag_content_node_1 ON node.nid = flag_content_node_1.content_id AND flag_content_node_1.fid = 3 LEFT JOIN content_field_photo_gallery node_data_field_photo_gallery ON node.vid = node_data_field_photo_gallery.vid WHERE (node.status = 1) AND (node_data_field_photo_gallery.field_photo_gallery_nid = 205385) ORDER BY flag_content_node_timestamp_year DESC, flag_content_node_1_timestamp_year DESC, node_title ASC, node_nid ASC
Once you recover from the query and everything that's going on, devel showed the amount of time it took to run the query (1.4 seconds - I ran this directly on mysql as well with very similar results), it was easy to see that this was the most probable cause of slowness on the site. With the other settings via devel (there is actually one other issue that I found but perhaps that will be for another blog post ^_^), it was time to examine the reason for why this query was running slowly. With a lot of components in there, it can get fairly daunting to try and figure out what is going on. Luckily, you can also run 'explain ' which will tell you everything that is going on when you go through the query.
As I was going through it, the piece that seemed out to me was that, aside from the last (top) portion using a filesort, the flag_content portions of the joins involved using the index
fid_content_type_content_id_uid. What this means is that the join above, which does not have any conditions on the content_type or on the uid (the uid does not matter, but content_type does) means that the index is not fully utilized and only the index on the fid is added in. So we have a *MUCH* later join set. So my test came by directly doing the following query where we add an index on fid and content_type:
CREATE INDEX fid_content_id ON flag_content(fid, content_id); . I then ran the explain query again, saw that the new index was being used and felt hopeful. And once I ran the query above once again, the query now took 0.05 seconds to run. This means the query now runs 28 times faster than before!
By now adding the index to my module install file (I have opened an issue on D.O for the flag module to get an index but until then, we need to safely add the index), the index can safely be applied from dev -> staging -> production versions of the site and it is now functioning correctly again.
The important lesson is to remember that there may be many possible factors into the performance of a website. In this instance, the problem was at the database layer. And by doing some research on how to solve such and issue (be in tuning the database server settings or in this case, a simple index), it can mean the difference between requiring another server and keeping what you have :)