Faster site response times in Drupal
One of the sites we are building out right now is a digital library for the school where I work. The library holds almost 35,000 media assets of various types (mostly images, but also some video, audio, PDF documents, and other file formats), and includes a robust set of credits and classification categories to support browsing and searching. We are using Drupal 6 as our framework, with several dozen modules giving us the feature set we are after.
As we get closer and closer to site launch, however, we’ve noticed a performance problem that has us worried: the site’s response times are at times unacceptably slow, sometimes more than a minute between clicking a link and seeing the page load. The library’s curator, our IT staff, and I have been working to identify the causes, but have not made much headway thus far.
This post is an attempt to map out our strategies, share what has worked and what has not, and (hopefully) get feedback about other things we can try. So, to the details:
Existing Infrastructure / Stack
The site uses several WAMP stacks (Windows Server 2008, Apache, MySQL, and PHP). Versions are recent, though not currently the latest. Apache runs on one server, with a separate database server that also supports an IIS server for other school sites. A third server runs Tomcat for our Solr search, for a total of three servers that get called into play for various tasks.
We are using Shibboleth for user accounts in order to tie into our University’s identity structure – we wanted to avoid asking our students to create yet another account to access the library, roughly half of whose assets cannot be shared publicly due to copyright and permissions issues. The presence of Shibboleth should not, in my view, impact performance – but I mention it for completeness.
Our full module list is still somewhat in flux, but includes the usual suspects: CCK, Views, Panels, ImageCache, Schema, Emfield, Token, Workflow, etc. There are 52 modules in total (or at least, 52 directories in sites/all/modules; some have been disabled by this point). I keep intending to put together the official list, but have not yet done so.
What We Have Tried
I started with our database server, as that seems like a natural choke point. Because it backs two production web servers hosting a handful of sites, load from one site could cause slowdowns elsewhere. I’ve turned on the slow query log and review it every few days, looking for queries that might benefit from additional indexes. So far nothing stands out in blazing red letters, but I may not be looking closely enough.
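For anyone trying the same thing, the relevant settings live in my.cnf; the path and threshold below are illustrative, not our actual values:

```ini
# my.cnf – slow query logging (MySQL 5.0/5.1-era option names)
[mysqld]
log-slow-queries = C:/mysql/logs/slow.log  # log file path is illustrative
long_query_time  = 2                       # log anything slower than 2 seconds
log-queries-not-using-indexes              # also catch queries doing full table scans
```

The bundled mysqldumpslow script can then aggregate the log – for example, mysqldumpslow -s t -t 10 slow.log shows the ten worst offenders sorted by total time – which is easier to scan than reading raw log entries.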
I have requested reports from our IT staff showing performance metrics from the various servers, such as CPU load, bandwidth, and memory utilization. I have access to realtime dashboards, but (unfortunately) nothing yet that provides a historical view of these metrics, like what might come from MRTG.
We’ve added RAM to the Apache server, bringing it up to 6 GB. This is a virtualized, single-processor server. Increasing the RAM seemed to help, at least for a bit, but the site has since slowed again after the initial improvement.
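Since adding RAM alone hasn’t stuck, one knob worth checking is how Apache itself is sized. On Windows, Apache uses the winnt MPM (one child process with many threads), so the tuning looks different from the prefork examples most Drupal guides show. The numbers here are illustrative guesses, not values we have measured:

```apacheconf
# httpd.conf – winnt MPM sizing (values are illustrative, not tested on our server)
<IfModule mpm_winnt_module>
    ThreadsPerChild      150    # roughly: RAM available to Apache / per-request PHP footprint
    MaxRequestsPerChild 5000    # recycle the child periodically to contain memory creep
</IfModule>
KeepAlive        On
KeepAliveTimeout 2              # a short timeout frees threads for new visitors sooner
```

Too many threads for the available memory pushes the box into swapping, which looks exactly like the mysterious minute-long page loads we are seeing.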
We’ve added APC to the PHP environment on the Apache server. It has 48 MB available to it, but according to the APC dashboard it never seems to be using more than 3-4 MB. The last four days have seen about 320,000 hits to the cache, with zero reported misses.
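For reference, the APC knobs in php.ini look roughly like this; the 48 corresponds to our shared-memory size, while the other values shown are the stock defaults (an assumption on my part, since I haven’t audited every setting):

```ini
; php.ini – APC opcode cache (APC 3.x syntax)
extension    = php_apc.dll   ; Windows build of the extension
apc.enabled  = 1
apc.shm_size = 48            ; shared memory size in MB
apc.stat     = 1             ; re-check file mtimes on each request; the safe default
```

If the cache were undersized you would expect fragmentation and evictions in the dashboard rather than the near-empty usage we see, so the size itself does not appear to be the bottleneck.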
I have high hopes for the Boost module, which claims to be able to divert anonymous traffic away from the full PHP/MySQL stack entirely, simply directing Apache to serve a pre-generated static HTML file where appropriate. It took a bit of doing to get the module up and running (a conflict in our httpd.conf file prevented .htaccess overrides), but it does seem to have sped up the site for anonymous users.
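For anyone who hits the same wall: Boost’s rewrite rules live in Drupal’s .htaccess file, which Apache silently ignores unless the docroot’s Directory block permits overrides. A likely fix, if your setup matches ours (the path below is illustrative):

```apacheconf
# httpd.conf (Apache 2.2 syntax) – let Drupal/Boost .htaccess rules take effect
<Directory "C:/www/drupal">
    AllowOverride All    # AllowOverride None makes Apache ignore .htaccess entirely
    Order allow,deny
    Allow from all
</Directory>
```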
The gist of the Boost module is similar in some regards to ImageCache, in that both avoid dynamic generation of content in favor of serving static files that have been pre-generated. There are some crucial differences, but I’m still very hopeful that Boost can be part of the solution.
From here, I’m not sure. I’m hoping to go through some of the materials that Lullabot/Drupalize.me has put out on performance tuning, including a look at our query and opcode caches. In the back of my head I’m wondering whether we shouldn’t rebuild our Apache server to have multiple processors.
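As a first step on the query-cache front, MySQL will at least report whether its query cache is enabled and how it is performing; these statements are read-only and safe to run on a live server:

```sql
-- Is the MySQL query cache on, and is it earning its keep?
SHOW VARIABLES LIKE 'query_cache%';  -- query_cache_size = 0 means it is disabled
SHOW STATUS LIKE 'Qcache%';          -- compare Qcache_hits against Qcache_inserts
-- A possible starting point on a shared database server (illustrative, not advice
-- from our IT staff): SET GLOBAL query_cache_size = 64 * 1024 * 1024;
```

A high ratio of hits to inserts suggests the cache is helping; constant inserts with few hits would mean our write-heavy curatorial work is invalidating it faster than readers can benefit.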
One thing is clear, though: we need a site that responds faster, both for browsing users and for the curatorial staff who are trying to make changes to the data.