Recently I ran into an annoying problem when setting up a crontab entry to run a Ruby script. I had tested the command many times before putting it in cron; however, when cron tried to execute the script, it failed immediately.
My script reads a JSON file and imports it into the DB line by line, a simple task! Because the JSON file contains Japanese characters, I put
# encoding: utf-8
at the beginning of the file, and it works perfectly when I run
$ ruby myscript.rb
However, when I put it in crontab and checked the log, I got this:
"\xE4" on US-ASCII
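That was when I freaked out. After reading some interesting articles, I found out that cron runs jobs in its own environment, which lacks many of the settings from a user's login environment, such as LANG and PATH. That is where the problem came from. The encoding comment at the top of the script only declares the encoding of the script's own source code; the encoding Ruby assumes for data read from files comes from the locale, and with LANG unset it falls back to US-ASCII. Since the JSON file contains Japanese characters and I didn't specify UTF-8 when reading the file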
File.open("#{dir}/#{file_name}", 'r').each do |line|
master_collection.insert JSON::parse(line)
end
the result was the encoding error.
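To see the failure without waiting for cron, here is a minimal sketch that reproduces it in an ordinary shell. It assumes that reading the file as US-ASCII is a fair stand-in for what happens when LANG is unset; the temp file and its contents are hypothetical sample data, not my real JSON file.

```ruby
require 'tempfile'

# Hypothetical sample data: one JSON line holding a Japanese character
# (U+4E00, whose UTF-8 bytes start with 0xE4).
sample = Tempfile.new('sample_json')
sample.binmode
sample.write("{\"name\":\"\u4E00\"}\n")
sample.close

# Assumption: tagging the data as US-ASCII mimics a job running without LANG.
line = File.readlines(sample.path, :encoding => 'US-ASCII').first

# Converting the mis-tagged line to UTF-8 fails on the first non-ASCII byte.
error_message =
  begin
    line.encode('UTF-8')
    nil
  rescue Encoding::InvalidByteSequenceError => e
    e.message
  end

puts error_message
# prints: "\xE4" on US-ASCII
```

The bytes in the file are perfectly valid UTF-8; only the encoding label Ruby attached to them is wrong, which is why both workarounds below are about telling Ruby the right encoding.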
There are a few ways to work around this error.
1. Specify the encoding when reading the file.
Here is the manual: http://ruby-doc.org/core-1.9.3/IO.html#method-c-read. The file-handling block can be rewritten as
File.readlines("#{dir}/#{file_name}", :encoding => 'UTF-8').each do |line|
master_collection.insert JSON::parse(line)
end
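Here is a self-contained sketch of the same idea that can be run without a database. The temp file and the imported array are hypothetical stand-ins for my JSON file and master_collection. Note that the :encoding option takes a Ruby encoding name such as 'UTF-8', not a locale name such as ja_JP.UTF-8.

```ruby
require 'json'
require 'tempfile'

# Hypothetical stand-in for the real JSON-lines file.
sample = Tempfile.new('sample_json')
sample.binmode
sample.write("{\"name\":\"\u4E00\"}\n{\"name\":\"\u4E8C\"}\n")
sample.close

# Stand-in for master_collection; a real run would insert into the DB.
imported = []

# The explicit encoding makes the read independent of LANG, so it behaves
# the same from a login shell and from cron.
File.foreach(sample.path, :encoding => 'UTF-8') do |line|
  imported << JSON.parse(line)
end

puts imported.length
# prints: 2
puts imported.first['name'].encoding
# prints: UTF-8
```

I used File.foreach here because it reads one line at a time and closes the file when the block finishes; File.readlines works just as well for small files.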
2. Set crontab environment settings.
On many Unix-derived systems, you can set environment variables for cron by adding them at the top of the crontab, before the job declarations. For example:
$ sudo crontab -e
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
LANG=ja_JP.UTF-8
0 21 * * * /usr/local/rvm/rubies/ruby-1.9.3-p392/bin/ruby /web/project/myscript.rb
This tells cron to set these environment variables for every job it runs.
SHELL: ensures that all jobs are run with bash.
PATH: the directories searched for executable files.
LANG: the locale (the key setting for our situation).
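To check what a job actually sees, a small diagnostic script can be scheduled once and its output redirected to a log. This is only a sketch: the env_report helper is a name I made up, and the list of variables is just the ones discussed here.

```ruby
# Hypothetical helper: summarize the variables discussed above, plus the
# encoding Ruby will assume for file contents in this environment.
def env_report(env = ENV)
  lines = %w[SHELL PATH LANG HOME].map do |key|
    "#{key}=#{env[key] || '(unset)'}"
  end
  lines << "default_external=#{Encoding.default_external.name}"
  lines.join("\n")
end

puts env_report
```

Running it once from a login shell and once from cron, then comparing the two outputs, shows which settings are missing. When LANG names a UTF-8 locale that is installed on the machine, the last line should report UTF-8.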
Besides those, there are a few more useful settings, such as
BASH_ENV=/path/to/environment : bash sources this file whenever it starts non-interactively, so you can put all of your environment settings in one config file (typically .profile, .bash_profile, or .bashrc) and have every job load it. This only works when SHELL is bash
HOME=/path/to/home : This is derived automatically from /etc/passwd, but you can override it
If your OS's cron does not support environment settings in the crontab, you can always export the variables or load an environment file with source (the . command) inside the job itself. For example:
0 21 * * * . $HOME/.profile && /usr/local/rvm/rubies/ruby-1.9.3-p392/bin/ruby /web/project/myscript.rb
or
0 21 * * * export LANG=ja_JP.UTF-8 && /usr/local/rvm/rubies/ruby-1.9.3-p392/bin/ruby /web/project/myscript.rb
For more details on cron settings, see the full references at linux.about.com and pantz.org.