TimThumb, Nginx and Cache – Cookbook for Minimal Resource Consumption

TimThumb is one of the most popular PHP scripts for simple image manipulation (cropping, zooming and resizing), used mainly on WordPress to generate cached thumbnails. Yet TimThumb’s caching mechanism is far from perfect: it requires executing the timthumb PHP script on every image request. In this practical post we show how we increased our server’s daily request capacity by 26%, and guide you step by step through modifying TimThumb and setting up Nginx’s routing to serve TimThumb images with minimal resources.

Background:

We recently migrated from Apache to Nginx to reduce our servers’ memory consumption. Just a few days after the migration, we noticed that during rush hours Nginx was constantly returning HTTP error 502 Bad Gateway. A quick look at the error logs revealed the following recurring error:

2014/02/24 19:52:30 [crit] 21081#0:
*1106 connect() to unix:/tmp/php5-fpm.sock failed (2: No such file or directory)
while connecting to upstream, client: 141.101.99.124,
server: tmb.rating-widget.com,
request: "GET /?src=http%3A%2F%2Fwww.pravda-tv.ru%2Fwp-content%2Fuploads%2F2013%2F12%2FJdWgm9t_DHU.jpg&w=50&h=40&zc=1 HTTP/1.1",
upstream: "fastcgi://unix:/tmp/php5-fpm.sock:",
host: "tmb.rating-widget.com",
referrer: "http://www.pravda-tv.ru/2014/02/24/38448"

Googling the error turns up a huge collection of forum discussions about this issue. None of them provides a solid solution, just speculation about socket I/O and system resource limits. Many of the threads recommend binding php-fpm to a TCP port rather than using a Unix socket. After testing this approach, however, we hit other errors (mainly timeouts) at a much lower load. So we switched Nginx’s PHP serving back to sockets and started thinking about how to reduce the number of PHP requests served by the machine.

An in-depth analysis of the logs revealed that about 30% of the requests were being handled by TimThumb’s PHP script. In addition, after some quick research we found several posts discussing TimThumb’s poor caching, with ideas for how it could be fixed. In particular, the short post “Nginx, TimThumb and Cache” by Francisco Aranda had the best guidelines.

RatingWidget’s TimThumb Implementation

Before we dive into the solution details, it’s important to understand how we use TimThumb at RatingWidget. One of our products, the Top-Rated Widget, is a really cool sidebar widget that shows the top-rated pages on your website. To make it visually compelling, we also show featured thumbnails next to the links. Here’s how it looks:

Top-Rated Widget (Compact Style)

The thumbnail generation is handled by TimThumb, which is set up on a separate subdomain. Here’s an example thumbnail URL:
http://tmb.rating-widget.com/?src=http%3A%2F%2Fwww.pravda-tv.ru%2Fwp-content%2Fuploads%2F2013%2F12%2FJdWgm9t_DHU.jpg&w=50&h=40&zc=1
w – thumbnail’s width
h – thumbnail’s height
zc – zoom/crop mode
src – featured image URL

Solution:

To reduce the load on the php-fpm socket we need to decrease the number of TimThumb executions. To do that, we have to save the generated thumbnails on disk as real image files, not as txt files. Then, on every thumbnail request, Nginx serves the local file directly if it already exists on disk, which is much faster and doesn’t require any resources from php-fpm. Only if the thumbnail hasn’t been generated yet does Nginx execute the TimThumb script, which generates and caches the thumbnail.
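The serve-or-generate flow described above can be sketched in a few lines of Python. Python is used here only for illustration; in the real setup the file check is done by Nginx and the generation by TimThumb. The function name `serve_thumbnail` and the `generate` callback are hypothetical stand-ins.

```python
import tempfile
from pathlib import Path
from typing import Callable, Tuple


def serve_thumbnail(cache_dir: Path, name: str,
                    generate: Callable[[], bytes]) -> Tuple[bytes, bool]:
    """Return (image bytes, served_from_cache).

    `generate` stands in for the expensive TimThumb run and is only
    called when the thumbnail is not on disk yet.
    """
    path = cache_dir / name
    if path.is_file():
        # Cache hit: a plain file read, no PHP involved.
        return path.read_bytes(), True
    # Cache miss: generate once and store the result as a real image file.
    data = generate()
    path.write_bytes(data)
    return data, False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        for _ in range(2):
            _, cached = serve_thumbnail(cache, "thumb-1-40-50.jpg",
                                        lambda: b"fake image bytes")
            print("served from cache:", cached)
```

Only the first request for a given name pays the generation cost; every later request is a file read.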

Challenge #1 – Picking the right filename

As described before, we serve images from external domains owned by our clients. Thus, saving each file under its original name could cause collisions between different clients. After reading about the Rules for Naming Files on Linux, here’s what we learned:

  • In short, file names may contain any character except / (root directory), which is reserved as the separator between files and directories in a path name.
  • Most modern Linux and UNIX file systems limit file names to 255 characters (255 bytes).

This means that the url-encoded src parameter that holds the image url can be a valid filename as long as it’s no longer than 255 characters. Believe it or not, a quick test shows that
http%3A%2F%2Fwww.pravda-tv.ru%2Fwp-content%2Fuploads%2F2013%2F12%2FJdWgm9t_DHU.jpg
is a legit file name! Hence, we decided that our cached filenames would be the encoded URL. If the encoded name is longer than 230 characters, we keep the first 115 characters (starting from the subdomain) and the last 115 characters of the image URL, which leaves room within the 255-character limit for the thumbnail parameters and extension that are appended to the name. That way the filename always includes the subdomain and the image’s original name. This prevents cross-domain collisions and greatly reduces the chance of filename collisions within the same site. In fact, most image URLs are much shorter than that, so the chance of collisions is practically zero.
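The quick test mentioned above, together with the 115 + 115 truncation, can be reproduced with a small Python sketch. Python is used only for illustration, and `encode` and `shorten` are hypothetical helpers, not part of TimThumb. Following the limits used in the Nginx and TimThumb code later in this post, the sketch truncates once a name exceeds 230 characters. The `example.com` URL is made up to produce a long name.

```python
import tempfile
from pathlib import Path
from urllib.parse import quote

MAX_NAME = 255   # typical Linux file name limit, in bytes
PART = 115       # characters kept from each end of a long name


def encode(url: str) -> str:
    """Percent-encode every reserved character, including '/' and ':'."""
    return quote(url, safe="")


def shorten(name: str, part: int = PART) -> str:
    """Keep names of up to 2 * part characters intact; otherwise keep
    the first `part` and the last `part` characters."""
    if len(name) <= 2 * part:
        return name
    return name[:part] + name[-part:]


if __name__ == "__main__":
    url = "http://www.pravda-tv.ru/wp-content/uploads/2013/12/JdWgm9t_DHU.jpg"
    name = encode(url)
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / name
        path.write_bytes(b"")   # would raise if the name were not allowed
        print(path.name, path.is_file())

    # Hypothetical, deliberately long URL.
    long_name = shorten(encode("http://example.com/" + "a/" * 200 + "photo.jpg"))
    print(len(long_name), len(long_name) <= MAX_NAME)
```

Because the encoded name contains no `/`, the whole URL collapses into a single flat file name, and a truncated name still starts with the domain and ends with the original image name.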

Challenge #2 – Nginx routing

Even though the Nginx wiki’s IfIsEvil page warns against if and recommends try_files as the better way to write routing rules, we had to use if to test and extract data from the request’s query string. After hours of trial and error with the routing settings, and with some help from Michael Gelfand (Senexx’s awesome CTO), here’s the complete Nginx configuration file that handles the TimThumb routing and caching:

server {
    listen       80;
    server_name  tmb.rating-widget.com;
    root         /full/path/to/your/app/thumb;

    location = / {
        set $src_filename_l _;
        set $src_filename_r _;
        set $src_extension _;

        #
        # Parse a src param that has a valid image extension (jpg, jpeg, png or gif).
        #
        if ($args ~ "src=http(s?)%3A%2F%2F(?<src_filename_l>.{3,115})(.*?)(?<src_filename_r>.{0,115})\.(?<src_extension>(jpg|gif|png|jpeg))$"){ }

        #
        # Parse src param with a general address (image or any other given url).
        #
        if ($args ~ "src=http(s?)%3A%2F%2F(?<src_address_l>.{3,115})(.*?)(?<src_address_r>.{0,115})$"){ }

        #
        # If the src param was NOT a valid image url - use general address and set the file extension to jpg by default.
        #
        if ($src_extension = "_") {
            set $src_filename_l $src_address_l;
            set $src_filename_r $src_address_r;
            set $src_extension jpg;
        }

        #
        # Set some default values for timthumb width, height and zoom crop.
        #
        set $zc 1;
        set $w 152;
        set $h 110;
        if ($arg_zc) {
            set $zc $arg_zc;
        }
        if ($arg_w) {
            set $w $arg_w;
        }
        if ($arg_h) {
            set $h $arg_h;
        }
        #
        # Finally, combine the cache filename.
        #
        set $os_filename "$src_filename_l$src_filename_r-$zc-$h-$w.$src_extension";

        #
        # If the file is found on the disk, simply serve the cached file without starting php-fpm.
        #
        if (-f $document_root/img-cache/$os_filename) {
            rewrite ^ /img-cache/$os_filename last;
            return 200;
        }

        include        fastcgi_params;
        fastcgi_read_timeout 120s;
        fastcgi_index  timthumb.php;
        # Optional addition of the filename for DEBUG.
        fastcgi_param  QUERY_STRING     $query_string&p=$os_filename;
        fastcgi_param  SCRIPT_FILENAME $document_root/timthumb.php;

        #
        # If the file is not found on the disk route the request to timthumb php script.
        #
        if (!-f $document_root/img-cache/$os_filename) {
             fastcgi_pass   unix:/tmp/php5-fpm.sock;
        }
    }

    #
    # Cache all requests
    #
    add_header        Cache-Control public;
    add_header        Cache-Control must-revalidate;
    expires           max;

    include /etc/nginx/conf.global.d/restrictions.conf;
    include /etc/nginx/conf.global.d/common.conf;
}
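To see which cache filename the location block computes for a given query string, the `set` and `if` steps can be mirrored in a short Python sketch. This is an approximation for experimenting, not part of the configuration. It assumes Python’s `re` behaves like Nginx’s PCRE for these two patterns, and `raw_arg` and `nginx_cache_name` are hypothetical helper names. The patterns are copied from the config, with named groups spelled `(?P<name>...)` as Python requires.

```python
import re

# Same patterns as in the config; only the named-group syntax differs.
IMAGE_RE = re.compile(
    r"src=http(s?)%3A%2F%2F(?P<l>.{3,115})(.*?)(?P<r>.{0,115})"
    r"\.(?P<ext>(jpg|gif|png|jpeg))$"
)
ADDRESS_RE = re.compile(
    r"src=http(s?)%3A%2F%2F(?P<l>.{3,115})(.*?)(?P<r>.{0,115})$"
)


def raw_arg(args: str, name: str) -> str:
    """Simplified stand-in for nginx's $arg_<name>: first raw value or ''."""
    for pair in args.split("&"):
        key, _, value = pair.partition("=")
        if key == name:
            return value
    return ""


def nginx_cache_name(args: str) -> str:
    """Mirror the set/if steps of the location block for one query string."""
    image = IMAGE_RE.search(args)
    address = ADDRESS_RE.search(args)
    if image:
        left, right, ext = image["l"], image["r"], image["ext"]
    else:
        # Fallback branch: general address, extension forced to jpg.
        left = address["l"] if address else ""
        right = address["r"] if address else ""
        ext = "jpg"

    def pick(name: str, default: str) -> str:
        value = raw_arg(args, name)
        # nginx treats an empty string and "0" as false inside `if`.
        return value if value not in ("", "0") else default

    zc, w, h = pick("zc", "1"), pick("w", "152"), pick("h", "110")
    return f"{left}{right}-{zc}-{h}-{w}.{ext}"


if __name__ == "__main__":
    encoded = ("http%3A%2F%2Fwww.pravda-tv.ru%2Fwp-content%2Fuploads"
               "%2F2013%2F12%2FJdWgm9t_DHU.jpg")
    print(nginx_cache_name("w=50&h=40&zc=1&src=" + encoded))
    print(nginx_cache_name("src=" + encoded + "&w=50&h=40&zc=1"))
```

In this sketch, a query string with src as the last parameter yields a name ending in `JdWgm9t_DHU-1-40-50.jpg`. When src comes first, the trailing `$` keeps the image pattern from matching, so the fallback branch runs and the remaining parameters become part of the name. It is worth checking which parameter order your own thumbnail URLs use, and that the result matches the name generated on the PHP side.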

Challenge #3 – TimThumb script modification

There are two main things we had to modify in the code.
1. The cache filename generation which should be synced with Nginx’s settings:

a) We’ve added two protected members just below $src:

    protected $src = "";
    protected $ext = '';
    protected $src_img = false;
    protected $is404 = false;

b) Then we’ve created the cached filename generation method:

/**
* Source: http://stackoverflow.com/questions/1734250/what-is-the-equivalent-of-javascripts-encodeuricomponent-in-php
*/
public static function encodeURIComponent($str)
{
    $revert = array('%21'=>'!', '%2A'=>'*', '%27'=>"'", '%28'=>'(', '%29'=>')');
    return strtr(rawurlencode($str), $revert);
}

public function getCacheFilename()
{
    // Remove protocol prefix.
    // We cache and serve the same image for both Http and Https.
    $filename = ('https' ===  strtolower(substr($this->src, 0, 5))) ? substr($this->src, 8) : substr($this->src, 7);

    // Encode the URI component because PHP automatically decodes URL params.
    //
    // ref: http://stackoverflow.com/questions/1734250/what-is-the-equivalent-of-javascripts-encodeuricomponent-in-php
    $filename = self::encodeURIComponent($filename);

    // Get source file extension position.
    $extension_pos = strrpos($filename, '.');

    // Get requested thumbnail extension.
    $this->ext = strtolower(substr($filename, $extension_pos + 1));

    // Check if requested src is an image.
    $this->src_img = in_array($this->ext, array('jpg', 'png', 'gif', 'jpeg'));

    if (!$this->src_img)
    {
        // Source doesn't have a valid image extension.
        // We set it to jpg and serve a random jpg image from local images bank.
        $this->ext = 'jpg';

        // Set 404.
        $this->set404();
    }
    else
    {
        // Cut extension from filename.
        $filename = substr($filename, 0, $extension_pos);
    }

    // Get timthumb width, height and zoom params.
    // Also, limit the length of params in filename to 4 characters each.
    $w = min(9999, (int)abs($this->param('w', 0)));
    $h = min(9999, (int)abs($this->param('h', 0)));
    $zoom_crop = min(9999, (int)$this->param('zc', DEFAULT_ZC));

    $filename_len = strlen($filename);

    // Get left 115 characters starting from domain.
    $filename_left = substr($filename, 0, 115);

    if ($filename_len <= 115)
        $filename_right = '';
    else if ($filename_len >= 230)
        // Right 115 characters.
        $filename_right = substr($filename, max(0, $filename_len - 115));
    else
        // All characters left from the right.
        $filename_right = substr($filename, 115);

    $os_filename = $filename_left . $filename_right . '-' . $zoom_crop . '-' . $h . '-' . $w . '.' . $this->ext;

    return $os_filename;
}

c) Finally, we’ve modified TimThumb’s constructor to work with the new filename structure:

        …
        $cachePrefix = ($this->isURL ? '_ext_' : '_int_');
        if($this->isURL){
            $this->cachefile = $this->cacheDirectory . '/' . $this->getCacheFilename();
        } else {
            $this->localImage = $this->getLocalImagePath($this->src);
        …

2. Remove TimThumb’s cached file security block and image type data prefix:

a) In processImageAndWriteToCache:

        …
        $fp = fopen($tempfile,'r',0,$context);
        if (empty($this->ext))
            file_put_contents($tempfile4, $this->filePrependSecurityBlock . $imgType . ' ?' . '>'); //6 extra bytes, first 3 being image type
        file_put_contents($tempfile4, $fp, FILE_APPEND);
        …

b) In serveCacheFile:

        …
        $fp = fopen($this->cachefile, 'rb');
        if(! $fp){ return $this->error("Could not open cachefile."); }

        if (empty($this->ext))
        {
            fseek($fp, strlen($this->filePrependSecurityBlock), SEEK_SET);
            $imgType = fread($fp, 3);
            fseek($fp, 3, SEEK_CUR);
            if(ftell($fp) != strlen($this->filePrependSecurityBlock) + 6){
                @unlink($this->cachefile);
                return $this->error("The cached image file seems to be corrupt.");
            }
            $imageDataSize = filesize($this->cachefile) - (strlen($this->filePrependSecurityBlock) + 6);
        }
        else
        {
            $imageDataSize = filesize($this->cachefile);
            $imgType = $this->ext;
        }

        $this->sendImageHeaders($imgType, $imageDataSize);
        $bytesSent = @fpassthru($fp);
        fclose($fp);
        if($bytesSent > 0){
            return true;
        }
        $content = file_get_contents ($this->cachefile);
        if ($content != FALSE) {
            if (empty($this->ext))
                $content = substr($content, strlen($this->filePrependSecurityBlock) + 6);

            echo $content;
        …

Challenge #4 – Cleaning TimThumb’s cache

TimThumb has its own cache expiration mechanism. But because each thumbnail request now invokes the TimThumb script only once, the cached thumbnails never expire and the disk keeps filling up. The solution is pretty simple: set up a daily cron job that deletes cached files older than X days (we cache for 7 days):

0 0 */1 * * find "/full/path/to/your/app/img-cache" -name "*.jpg" -mtime +7 -exec rm -f {} \;
0 0 */1 * * find "/full/path/to/your/app/img-cache" -name "*.jpeg" -mtime +7 -exec rm -f {} \;
0 0 */1 * * find "/full/path/to/your/app/img-cache" -name "*.png" -mtime +7 -exec rm -f {} \;
0 0 */1 * * find "/full/path/to/your/app/img-cache" -name "*.gif" -mtime +7 -exec rm -f {} \;
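If you prefer a single script over four cron entries, the same cleanup can be sketched in Python. This is an illustrative alternative, not what we run; `purge_old_thumbnails` is a hypothetical name, and the age check is only approximately equivalent to `-mtime +7`, because find rounds file ages down to whole days.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Optional

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif"}


def purge_old_thumbnails(cache_dir: Path, max_age_days: float = 7.0,
                         now: Optional[float] = None) -> int:
    """Delete cached images older than max_age_days; return how many."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = 0
    # rglob walks subdirectories too, as find does by default.
    for path in cache_dir.rglob("*"):
        if (path.is_file() and path.suffix in IMAGE_SUFFIXES
                and path.stat().st_mtime < cutoff):
            path.unlink()
            removed += 1
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        stale = Path(tmp) / "stale-1-40-50.jpg"
        stale.write_bytes(b"x")
        ten_days_ago = time.time() - 10 * 86400
        os.utime(stale, (ten_days_ago, ten_days_ago))
        print("removed:", purge_old_thumbnails(Path(tmp)))
```

Files with other extensions are left alone, so anything else stored in the cache directory survives the cleanup.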

Done! If you have any comments, feedback or ideas for how this could be improved, please share your thoughts in the comments below.

You can download all the files from GitHub:

One Comment

 

  1. Ran Rubinstein

    Hi Vova, great post! Check out Cloudinary, the cloud service that will take the heavy lifting of image manipulations off your server’s hands and let it focus on your actual service.