A New Breed of Cron in Magento EE 1.13

By now the overhaul of the indexer system used in Magento Enterprise Edition 1.13 is fairly common knowledge, especially amongst those that have the privilege to work with it on some large builds. I’ve had the chance to work with it a fair bit on a rather large project currently still in the oven over at Classy Llama. This project has over 40 million product pages and 5 million parts in inventory!! I don’t think it would have been possible before EE 1.13 was released…

Along with all these leaps and bounds forward came some more behind-the-scenes changes: the kind of under-the-hood work that supports the development of new features, but that can also be a pain to figure out given just the right circumstances. One of these was the introduction of a new breed of cron task along with a unique dispatch mechanism. If you don’t understand it (and have shell_exec disabled on your servers), the cron will fail 100% silently and without remorse.

If you’ve configured a cron job in the past, you’ll be somewhat familiar with this bit of XML, which incidentally uses the same type of expression as crontab on *nix systems to define how often the job runs:

    <crontab>
        <jobs>
            <task_name>
                <schedule>
                    <cron_expr>0 1 * * *</cron_expr>
                </schedule>
                <run>
                    <model>mymodule/observer::dailyUpdateTask</model>
                </run>
            </task_name>
        </jobs>
    </crontab>
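The five fields are minute, hour, day of month, month and day of week. The Python sketch below shows how an expression like `0 1 * * *` ("daily at 01:00") can be matched against a point in time.

It is a simplification, not Magento’s own parser:

- The function names are hypothetical.
- Only `*`, single numbers and comma lists are handled.
- All five fields must match. Real cron treats a restricted day-of-month and day-of-week as either/or.

```python
from datetime import datetime


def field_matches(spec: str, value: int) -> bool:
    """Match one cron field against a value.

    Simplified: supports only '*', plain integers and comma lists like '0,30'.
    Ranges ('1-5') and steps ('*/15') are deliberately left out.
    """
    if spec == "*":
        return True
    return value in {int(part) for part in spec.split(",")}


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if a five-field cron expression fires at `when`.

    Raises ValueError if `expr` does not have exactly five fields.
    """
    minute, hour, day, month, weekday = expr.split()
    # Cron numbers weekdays from Sunday=0; Python's weekday() uses Monday=0.
    cron_weekday = (when.weekday() + 1) % 7
    # Simplification: every field must match (real cron ORs day and weekday
    # when both are restricted).
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        and field_matches(weekday, cron_weekday)
    )


if __name__ == "__main__":
    expr = "0 1 * * *"  # the schedule from the XML above
    print(cron_matches(expr, datetime(2013, 10, 1, 1, 0)))   # True
    print(cron_matches(expr, datetime(2013, 10, 1, 13, 0)))  # False
```

A single word such as `always` is not a five-field expression at all, so this sketch raises a `ValueError` for it.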

What you may not have noticed before is the existence of crontab schedules with a cron_expr value of simply ‘always’. That’s because this is indeed a new breed of cron job. As it stands right now, EE 1.13 is the only version of Magento using this type of cron… but the under-the-hood changes supporting it are present in Magento CE 1.8 as well.

Here is an example (actually the only real case at this point) of this type of cron task being used:

    <crontab>
        <jobs>
            <enterprise_refresh_index>
                <schedule>
                    <cron_expr>always</cron_expr>
                </schedule>
                <run>
                    <model>enterprise_index/observer::refreshIndex</model>
                </run>
            </enterprise_refresh_index>
        </jobs>
    </crontab>

The new indexers were probably a good candidate for its inaugural use, but a few other pesky EE cron tasks also come to mind as perfect candidates for this type of handling. Yes, I’m looking directly at the enterprise_staging_automates job, which schedules itself every single minute and leaves you with a very bloated cron_schedule table. The same goes for the cron task behind the new queue system in EE 1.13, which has the same every-minute schedule and the same potential to build up.
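For a rough sense of scale, a job scheduled every minute accounts for one cron_schedule row per run. The figures below count only the two every-minute jobs named above. They also assume nothing prunes the table in the meantime, so they are an upper bound rather than a measurement:

```latex
60\,\tfrac{\text{runs}}{\text{hour}} \times 24\,\tfrac{\text{hours}}{\text{day}} = 1440\,\tfrac{\text{rows}}{\text{day}}\ \text{per job},
\qquad 2 \times 1440 = 2880\,\tfrac{\text{rows}}{\text{day}}
```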

I mentioned that there is a new dispatch mechanism for this new breed (as I’ve decided to affectionately call it, since ‘always’ obviously breaks with traditional cron expression syntax). So how exactly does this new type of cron differ functionally? There are a few differences:

  1. They do not fill up the `cron_schedule` table with entries for future runs. You won’t see them in there until they have run.
  2. They are executed from the new `Mage_Cron_Model_Observer::dispatchAlways` method called via `Mage::dispatchEvent('always');` in the cron.php entry point.
  3. The frequency of such tasks is defined by the frequency at which the cron.sh file is executed by crontab on the server. If you call cron.sh every 15 minutes, they’ll run every 15, etc.
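The three differences above can be illustrated with a toy model in Python. It uses an integer-minute clock and a fixed look-ahead window, and it compares the two paths:

- **Expression-based jobs** get "pending" rows written ahead of time.
- **‘always’ jobs** run once per cron invocation and only get a row after they have run.

The class, method and job names (other than `enterprise_refresh_index`) are hypothetical, as are the clock and the look-ahead. Magento’s real logic lives in `Mage_Cron_Model_Observer` and differs in its details.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class CronModel:
    """Toy model of the two dispatch paths; all names here are hypothetical.

    schedule_table stands in for cron_schedule as (job, minute, status) rows.
    Expression-based jobs get 'pending' rows written ahead of time, while
    'always' jobs only get a row once they have actually run.
    """
    scheduled_jobs: Dict[str, int] = field(default_factory=dict)  # job -> every N minutes
    always_jobs: Dict[str, Callable[[], None]] = field(default_factory=dict)
    schedule_table: List[Tuple[str, int, str]] = field(default_factory=list)

    def generate_ahead(self, now: int, lookahead: int) -> None:
        """Default path: pre-fill pending rows for the next `lookahead` minutes."""
        for job, every in self.scheduled_jobs.items():
            for minute in range(now, now + lookahead):
                if minute % every == 0:
                    self.schedule_table.append((job, minute, "pending"))

    def dispatch_always(self, now: int) -> None:
        """'always' path: run each such job once per invocation, then record it."""
        for job, task in self.always_jobs.items():
            task()
            self.schedule_table.append((job, now, "success"))


if __name__ == "__main__":
    runs: List[str] = []
    model = CronModel(
        scheduled_jobs={"every_minute_job": 1},
        always_jobs={"enterprise_refresh_index": lambda: runs.append("reindex")},
    )
    model.generate_ahead(now=0, lookahead=20)
    print(len(model.schedule_table))  # 20 pending rows, none for the 'always' job
    model.dispatch_always(now=0)      # cron invoked at minute 0...
    model.dispatch_always(now=15)     # ...and again 15 minutes later
    print(runs)                       # ['reindex', 'reindex']
```

The ‘always’ job runs exactly as often as `dispatch_always` is called. This mirrors point 3: its frequency is whatever frequency crontab invokes the cron at.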

It appears one of the goals of this new cron type was to let potentially long-running jobs (such as a reindex triggered by admin actions) run without affecting the frequency of other scheduled cron tasks. This is accomplished by the cron.sh script now accepting a mode flag that determines whether it runs the default or the always cron tasks. One of the functions of cron.sh is to prevent multiple cron “threads” from running on the server simultaneously. Splitting things up like this allows a maximum of two: one for the regular scheduled tasks, and one dedicated to work like indexing, where a given run might have nothing at all to do or a great deal.
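The “at most two at a time” effect can be modelled with one non-blocking lock per mode. This is a hypothetical Python stand-in: cron.sh performs its own check in shell, and the `ModeLocks` class below only models the result. A second run of a mode that is still busy is skipped, while the other mode is unaffected.

```python
import threading
from typing import Dict


class ModeLocks:
    """Hypothetical model: one non-blocking lock per cron mode.

    At most one 'default' run and one 'always' run can be active at once,
    so a long reindex in 'always' mode never delays the 'default' tasks.
    """

    def __init__(self) -> None:
        self._locks: Dict[str, threading.Lock] = {
            "default": threading.Lock(),
            "always": threading.Lock(),
        }

    def try_start(self, mode: str) -> bool:
        """Return True if a run for `mode` may start now, False if one is active."""
        return self._locks[mode].acquire(blocking=False)

    def finish(self, mode: str) -> None:
        """Mark the active run for `mode` as finished."""
        self._locks[mode].release()


if __name__ == "__main__":
    locks = ModeLocks()
    print(locks.try_start("always"))   # True: a long reindex begins
    print(locks.try_start("always"))   # False: the next minute's 'always' run is skipped
    print(locks.try_start("default"))  # True: regular jobs still run alongside it
```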

The most substantial (or at least most relevant) changes to the cron entry point, and to how cron should be set up, are in the cron.php file. In a nutshell, it checks for an option specifying which “mode” to run in. If none is found, it attempts to spawn two cron.sh processes in the background, one per mode… unless your server happens to have shell_exec disabled. In that case, the only real workaround is to put two entries in the crontab to accomplish the same thing: one job for the default cron tasks and one for the new ‘always’ tasks.
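The entry-point decision described above can be sketched as a small Python function that returns what one invocation would do rather than doing it. It is modelled on this description, not on Magento’s source. Three things in it are assumptions:

- The function names are hypothetical.
- The accepted argument forms (`-m=always`, `-malways`) are assumed.
- The silent “nothing happens” branch reflects the failure mode described earlier in this post.

```python
from typing import List, Optional

VALID_MODES = ("default", "always")


def parse_mode(argv: List[str]) -> Optional[str]:
    """Return the cron mode from args like '-m=always' or '-malways' (assumed forms)."""
    for arg in argv:
        if arg.startswith("-m"):
            mode = arg[2:].lstrip("=")
            if mode not in VALID_MODES:
                raise ValueError(f"unrecognized cron mode: {mode!r}")
            return mode
    return None


def plan_cron_run(argv: List[str], shell_exec_enabled: bool) -> List[str]:
    """Describe what a single cron.php invocation does, per the description above."""
    mode = parse_mode(argv)
    if mode is not None:
        # A mode was given: dispatch only that event, in this process.
        return [f"dispatch {mode}"]
    if shell_exec_enabled:
        # No mode: hand off to two background cron.sh processes, one per mode.
        return [f"spawn cron.sh with mode {m}" for m in VALID_MODES]
    # No mode and no shell_exec: nothing is dispatched, and nothing complains.
    return []


if __name__ == "__main__":
    print(plan_cron_run(["-m=always"], shell_exec_enabled=False))
    print(plan_cron_run([], shell_exec_enabled=True))
    print(plan_cron_run([], shell_exec_enabled=False))
```

The last call is the trap: it returns an empty plan without raising anything. This is why passing an explicit mode from crontab is the safer setup.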

Another thing to note: although CE 1.8 doesn’t yet make use of the new type of cron task, its cron entry point has the same functional changes as EE 1.13, so the same ramifications apply.

Pretty much from here on out, my crontabs will look something like this:

    * * * * * /home/mypretty/myprettysite.com/html/cron.sh cron.php -m=default
    * * * * * /home/mypretty/myprettysite.com/html/cron.sh cron.php -m=always

As long as the servers you work with do not have shell_exec disabled as part of their security hardening, you should be able to keep right on working with a single old-fashioned crontab entry (calling cron.sh with no mode) to keep both modes active.