Feeds, Cron, Nginx and bad gateways
Recently I noticed that although I have many Feeds nodes set up to import on the hour, only 4 or 5 were firing off. And they tended to be the same ones every time.
Running cron from the console, I realised that the web server was timing out after a minute and sometimes throwing an error. You can see this by curling the cron URL or opening it in a browser.
http://example.com/admin/config/system/cron
So the timeout needs to go up, I think. But having worked through all the timeouts I could find, I still had the error. My final solution (see the second to last edit) was a patch to Feeds to increase how long Drupal is allowed to run a queue.
Testing with curl
time curl --max-time 600 "http://example.com/cron.php?cron_key=xxxxxxxxxxxxxxxx"
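When testing like this, the status code says more than the response body about which layer gave up. Here is a minimal sketch of a helper that explains the code; describe_status is a made-up name, not part of curl or Drupal, and the explanations only reflect what these codes usually mean behind an nginx and PHP-FPM setup.

```shell
# Hypothetical helper: explain the HTTP status that a cron request returned.
describe_status() {
    case "$1" in
        200|204) echo "ok: cron finished within the timeouts" ;;
        403)     echo "forbidden: check the cron_key in the URL" ;;
        502)     echo "bad gateway: PHP-FPM went away or dropped the connection" ;;
        504)     echo "gateway timeout: nginx gave up waiting for PHP-FPM" ;;
        000)     echo "no response: curl itself failed or timed out" ;;
        *)       echo "unexpected status $1" ;;
    esac
}

# Usage (needs network access, so it is left commented out):
# code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 600 \
#     "http://example.com/cron.php?cron_key=xxxxxxxxxxxxxxxx")
# describe_status "$code"
```

curl's -w '%{http_code}' prints 000 when no HTTP response arrived at all, which is why that case is handled separately.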
Configuring NGINX
Set fastcgi_read_timeout for the cron path to 10 minutes. We need some headroom here or we'll collide with the 504 Gateway Timeout error.
# vi /etc/nginx/sites-enabled/default
location ~ ^/cron\.php {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_intercept_errors on;
    fastcgi_pass 127.0.0.1:9000;
    # Wait up to 10 minutes for PHP-FPM to answer before giving up.
    fastcgi_read_timeout 600s;
}
Reload NGINX
# service nginx reload
Configuring PHP-FPM
Set max_execution_time in PHP to 6 minutes (360 seconds) and check with phpinfo();
# vi /etc/php5/fpm/php.ini
max_execution_time = 360
# service php5-fpm reload
Configuring the hidden Feeds settings in sites/default/settings.php. The first (http_request_timeout) sets the timeout for Curl attempting to connect to a Feeds source. The default is 30 seconds. There is a patch to move this into the UI. The second (feeds_process_limit) limits how many nodes will be imported each time. This defaults to 50. For large datasets this should go way up if you're expecting them to come in every run.
$ vi sites/default/settings.php
$conf['http_request_timeout'] = 120;
$conf['feeds_process_limit'] = 150; # how many nodes to import per feed http://drupalcode.org/project/feeds.git/blob/HEAD:/README.txt
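To get a feel for what feeds_process_limit means for a large feed, here is a rough back-of-the-envelope sketch. It assumes each cron run imports at most feeds_process_limit items per feed, as described above; runs_needed is a made-up helper name and the feed size of 10000 is only an illustration.

```shell
# Hypothetical helper: cron runs needed to import a feed of a given size.
# $1 = items in the feed, $2 = feeds_process_limit
runs_needed() {
    items=$1
    limit=$2
    # Integer ceiling division: a partial last batch still costs a full run.
    echo $(( (items + limit - 1) / limit ))
}

runs_needed 10000 50    # prints 200
runs_needed 10000 150   # prints 67
```

If cron fires every 10 minutes (again an assumption for the sake of the arithmetic), 67 runs is a bit over 11 hours, so a feed that size will never arrive in a single run at these limits.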
Overriding the hardcoded time limit that Feeds sets for the Queue API call (module hack, this should really be in a patch or a conf variable).
$ vi feeds/feeds.module
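function feeds_cron_queue_info() {
  $queues = array();
  $queues['feeds_source_import'] = array(
    'worker callback' => 'feeds_source_import',
    // Seconds the queue worker may run per cron run; raised to 360.
    'time' => 360,
  );
  // ... the rest of the function is unchanged.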
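With three timeouts now in play, it is easy to set them in the wrong order. The sketch below checks the rule of thumb used in this post: each outer layer should outlast the one inside it. That ordering is my working assumption, not something nginx, PHP or Drupal enforces, and check_timeouts is a made-up helper name.

```shell
# Hypothetical helper: check that each outer timeout outlasts the inner one.
# $1 = nginx fastcgi_read_timeout, $2 = PHP max_execution_time,
# $3 = Feeds queue 'time' (all in seconds)
check_timeouts() {
    nginx=$1
    php=$2
    queue=$3
    if [ "$nginx" -le "$php" ]; then
        echo "nginx ($nginx s) should exceed PHP ($php s)"
        return 1
    fi
    if [ "$php" -lt "$queue" ]; then
        echo "PHP ($php s) should be at least the queue time ($queue s)"
        return 1
    fi
    echo "ok: nginx $nginx s > PHP $php s >= queue $queue s"
}

check_timeouts 600 360 360   # prints: ok: nginx 600 s > PHP 360 s >= queue 360 s
```

The values passed in are the ones configured in this post; swap in your own if they differ.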
And if you haven't already, configure cron itself to run every 10 minutes. If Drupal's cron collides with another cron run, it can refuse to execute.
$ crontab -e
*/10 * * * * curl -s "http://example.com/cron.php?cron_key=xxxxxxxx" > /dev/null
In the end, though, all of this wasn't enough. I've had to resort to drush and have added some work to the Feeds drush integration effort. This means I can call feeds-import-all on a cron job directly and avoid all the queueing and web server timeouts altogether.
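Since a long import started from cron can still overlap with the next one, a small lock wrapper may help. This is only a sketch: run_exclusive is a made-up name, the lock uses a plain directory, and the Drupal path and lock location in the usage line are placeholders.

```shell
# Hypothetical wrapper: run a command only if no other copy is running.
# $1 = lock directory, remaining arguments = the command to run
run_exclusive() {
    lock_dir=$1
    shift
    # mkdir either creates the directory or fails, so it works as a lock.
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "skipped: $lock_dir is held by another run" >&2
        return 75
    fi
    status=0
    "$@" || status=$?
    rmdir "$lock_dir"
    return "$status"
}

# Example use from a cron script (path and lock location are placeholders):
# run_exclusive /tmp/feeds-import.lock drush --root=/var/www/example feeds-import-all
```

One caveat with this approach: if the job is killed before it reaches rmdir, the lock directory is left behind and has to be removed by hand.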