When things break – Yet another look inside the workshop for a self-hosted blog
You can classify blogs into categories using many different criteria. From a blog administrator’s point of view, the main distinction is whether a blog is self-hosted on your own server (such as WordPress.org) or hosted by the blogging service itself (such as Blogger, Tumblr, Posterous, WordPress.com, etc.).
Blogs hosted by the blogging service make you dependent on their uptime and functionality. Self-hosted blogs let you take the tiller in your own hands, giving you much more freedom to sculpt the blog as you see fit. But they also dump all the work of backing up, upgrading and maintaining your blog into your lap. “With freedom also comes responsibility”: did your parents not tell you that when you were allowed to go to a party for the first time?
I just wrapped up yet another adventure with one of my self-hosted blogs. Let me share it with you, as it is yet another reminder for all of you thinking about self-hosting your blog.
When things go wrong…
News On Green is my main aggregator for environmental news. Like most of my other blogs, it is hosted on my Hostgator VPS server, and it has about 130,000 blog posts. I monitor this server’s performance continuously, and three days ago I saw resource usage go haywire. CPU consumption went through the roof, free memory was exhausted and everything slowed down to the point where I could not even log into the server anymore.
I had seen incidents like that before, when caching broke on some of the blogs. As my sites get a lot of traffic (currently around 70,000 visits per month, excluding RSS feed polls, search engine crawlers, etc.), proper caching is critical. If caching breaks on even one of the sites, the server has to query its MySQL database for every single page visit, and it goes belly up under the CPU load.
This is apparently what happened on News On Green, where I use the WP Super Cache plugin for caching. For each individual page or post on your blog, Supercache creates a subdirectory in the /cache/supercache directory. That’s where it stores a “pre-cooked” .html file, so it does not have to run MySQL queries every time the page is requested. Serving a static HTML file is much faster than building the page from MySQL queries.
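To make the one-directory-per-page layout concrete, here is a minimal Python sketch of how a page cache could map a URL to the static file it serves. The layout <cache root>/<host>/<slug>/index.html, the CACHE_ROOT path and the helper cache_file_for are assumptions for illustration, not WP Super Cache’s actual code. With flat permalinks such as /some-post/, every post’s directory ends up side by side in the same parent directory.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical cache root; the real location depends on the WordPress install.
CACHE_ROOT = PurePosixPath("/var/www/wp-content/cache/supercache")


def cache_file_for(url: str) -> PurePosixPath:
    """Map a page URL to the static file a page cache might serve for it.

    Assumed layout: <root>/<host>/<path segments>/index.html, so every
    distinct page or post gets its own subdirectory.
    """
    parts = urlsplit(url)
    # Drop empty segments produced by leading/trailing slashes.
    segments = [s for s in parts.path.split("/") if s]
    return CACHE_ROOT.joinpath(parts.hostname or "", *segments, "index.html")
```

Two posts with flat permalinks, such as https://example.com/post-a/ and https://example.com/post-b/, map to two sibling directories under the same host directory, which is how one parent can accumulate tens of thousands of subdirectories.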
Debugging, once more…
To help with debugging, Supercache also puts a comment line at the bottom of the HTML of each cached page, containing some statistics or error messages. This time, I saw the error “can not create /cache/supercache/subdirectory/xxx.tmp”.
The server’s error.log showed all sorts of errors, including:
PHP Notice: Undefined index: HTTP_ACCEPT in xxx/wp-content/plugins/wp-super-cache/wp-cache-phase1.php on line 415
So much was clear: Supercache was no longer caching the pages. But why?
I checked the /supercache directory for the blog and found 31,998 cached pages, and thus 31,998 subdirectories: one for each cached page or post. When I see a number anywhere close to a “magic” power of two, I get suspicious: 32, 64, 16,384, 32,768… And as 31,998 was suspiciously close to 32,000, and thus to 2^15 (32,768), I suspected a system or account limitation on my server.
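Checking how close a cache directory is to such a limit only takes a count of its immediate subdirectories. A minimal Python sketch; count_subdirs is a hypothetical helper name, and you would point it at your own blog’s cache directory:

```python
import os


def count_subdirs(path: str) -> int:
    """Count the immediate subdirectories of path (not recursive)."""
    with os.scandir(path) as entries:
        # follow_symlinks=False: symlinks to directories are not counted,
        # since they do not add to the parent's hard-link count.
        return sum(1 for e in entries if e.is_dir(follow_symlinks=False))
```

Assuming one subdirectory per cached page, the result is the number of cached pages directly under that directory. Scanning the directory once with os.scandir avoids a separate stat call per entry on most Linux filesystems.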
I contacted my hosting company, Hostgator, via their online chat. The technician was really patient. Where other hosting services would have answered “This is a WordPress problem”, their support took the trouble to check several system parameters. But we could not find anything limiting the number of files or subdirectories per directory…
I was still convinced I had hit some hard threshold. As an experiment, I deleted about 100 cached pages and, lo and behold, the cache started working again. Eureka, workaround found! But what was causing the problem?
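While still in the chat, I searched the internet for “maximum Linux subdirectories” and found a post with the answer. On the ext2 and ext3 filesystems, which are used on many Linux hosts, the maximum number of subdirectories in one single directory is… 31,998. The cause is the hard-link count: each subdirectory’s “..” entry is a link to its parent, the parent’s own “.” entry and its name in its own parent add two more, and ext2/ext3 cap an inode’s link count at 32,000. That leaves 32,000 − 2 = 31,998 subdirectories. The newer ext4 filesystem raises the cap to 65,000 and, with its dir_nlink feature, removes it altogether.
Bingo! That was exactly the number of subdirectories in my cache directory.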
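The 31,998 figure is simple link-count arithmetic on ext2/ext3, which cap an inode’s hard-link count at 32,000. A minimal Python sketch; the names max_subdirs and headroom are hypothetical:

```python
# ext2/ext3 cap an inode's hard-link count at 32,000 (EXT2_LINK_MAX / EXT3_LINK_MAX).
EXT3_LINK_MAX = 32000


def max_subdirs(link_max: int = EXT3_LINK_MAX) -> int:
    # A directory starts with 2 links: its name in its parent and its own ".".
    # Each subdirectory's ".." adds one more, so link_max - 2 remain for children.
    return link_max - 2


def headroom(current_subdirs: int, link_max: int = EXT3_LINK_MAX) -> int:
    # How many more subdirectories fit before mkdir starts failing.
    return max_subdirs(link_max) - current_subdirs
```

With the default cap, max_subdirs() gives 31,998, and a directory that already holds 31,998 subdirectories has a headroom of zero: the next mkdir fails, which matches the “can not create” error Supercache reported.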
So what was the solution? I deleted the 5,000 oldest cached pages. As if by a miracle, the crippled server rose from the ashes like a phoenix.
The permanent solution will, of course, be to automate this check: a cron job should automatically trim the oldest cached pages whenever the count comes close to the limit of 32,000 (31,998, to be exact). Work for the coming days.
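Here is a minimal Python sketch of what such a trimming job might look like. The thresholds (30,000 and 25,000), the function name trim_oldest and the use of each directory’s modification time as its age are all assumptions, not a tested production script.

```python
import os
import shutil

# Hypothetical thresholds: stay well below the 31,998-subdirectory ceiling.
HIGH_WATER = 30000  # start trimming above this many cached pages
TRIM_TO = 25000     # then delete the oldest until this many remain


def trim_oldest(cache_dir: str, high_water: int = HIGH_WATER, keep: int = TRIM_TO) -> int:
    """Delete the oldest cached-page subdirectories once there are too many.

    Returns the number of subdirectories removed (0 if at or below high_water).
    """
    with os.scandir(cache_dir) as entries:
        dirs = [e for e in entries if e.is_dir(follow_symlinks=False)]
    if len(dirs) <= high_water:
        return 0
    # Oldest first, judged by each directory's modification time (an assumption:
    # rewriting a cached file inside a directory updates that directory's mtime).
    dirs.sort(key=lambda e: e.stat(follow_symlinks=False).st_mtime)
    victims = dirs[: max(0, len(dirs) - keep)]
    for entry in victims:
        # Remove the whole cached-page directory; ignore races with the plugin.
        shutil.rmtree(entry.path, ignore_errors=True)
    return len(victims)
```

Trimming down to a lower mark, rather than just below the limit, keeps the job from firing on every run. A small wrapper script that calls trim_oldest on the blog’s cache directory could then be scheduled hourly from cron (for example 0 * * * * python3 /path/to/trim_cache.py, with a hypothetical path).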
Now how is that for a nerdy story, hey?
Picture courtesy Kaboodle