The quest for the Holy object serialization format
One of my favorite features of delayed_job is that you can delay execution of any method on any object. In order to get this to work, you have to be able to serialize the object into a database field, and then load it in a separate process.
For certain objects, like ActiveRecord or MongoMapper models, you don’t actually want to “serialize” this object, but instead just reload the record from the database. To accomplish this, delayed_job previously would call #to_yaml on the job, and do a nasty hack to store any objects that were ActiveRecord objects. This has always bothered me, so yesterday I set out to find a proper solution to serializing jobs.
Scene 1: enter YAML
YAML had 2 major problems in the context of how delayed_job was using it:
- It would serialize the attributes of the ActiveRecord class and reload them in the same state that they were in when the job was created. In most instances, I want to load the class in its current state from the database.
- You can’t call
#to_yamlon a class, which delayed_job required to delay execution of class methods
String.to_yaml # TypeError: can't dump anonymous class Class
Scene 2: who needs documentation?
It turns out that YAML has an undocumented feature (and YAML was originally written by _why, so you have to be mad genius to understand the code) where you can define how objects should be serialized and unserialized.
Here is how it works for ActiveRecord:
class ActiveRecord::Base yaml_as "tag:ruby.yaml.org,2002:ActiveRecord" def self.yaml_new(klass, tag, val) klass.find(val['attributes']['id']) end def to_yaml_properties ['@attributes'] end end
Problem 1: solved.
Scene 3: a Class act
As luck would have it, someone submitted a patch to YAML back in 2006 to add #to_yaml to Class and Module. _why was reluctant to accept the patch because “reloading these objects causes trouble if you haven’t required the right libraries”. This doesn’t worry me with delayed_job because the worker will be running in the same environment.
Here’s the monkey patch in all its glory:
class Module yaml_as "tag:ruby.yaml.org,2002:module" def Module.yaml_new( klass, tag, val ) if String === val val.split(/::/).inject(Object) {|m, n| m.const_get(n)} else raise YAML::TypeError, "Invalid Module: " + val.inspect end end def to_yaml( opts = {} ) YAML::quick_emit( nil, opts ) { |out| out.scalar( "tag:ruby.yaml.org,2002:module", self.name, :plain ) } end end class Class yaml_as "tag:ruby.yaml.org,2002:class" def Class.yaml_new( klass, tag, val ) if String === val val.split(/::/).inject(Object) {|m, n| m.const_get(n)} else raise YAML::TypeError, "Invalid Class: " + val.inspect end end def to_yaml( opts = {} ) YAML::quick_emit( nil, opts ) { |out| out.scalar( "tag:ruby.yaml.org,2002:class", self.name, :plain ) } end end
Problem 2: Solved
Scene 4: Finalé
This all seem seems to work wonderfully, but I’m left wondering if there’s something I’m missing. Anyone see any problems with using YAML in this way, or have I found the Holy Grail?
Post a Comment