Introducing Python Pickling
The motivation for writing this article came when I was working on my first major Python project and wanted a way to write my class data to an on-disk file, just as I had done on numerous occasions in C, where I wrote structure data to a file. So if you want to learn the Pythonic way of storing your class data persistently, this is for you. Let us start!
A. Pickle, Unpickle
A pickle is a Python object represented as a string of bytes. Sounds utterly simple? Oh well, it is that simple! The process of converting an object into such bytes is called pickling. So we have successfully converted our object into some bytes; now how do we get it back? To unpickle means to reconstruct the Python object from the pickled string of bytes. Strictly speaking, unpickling does not give us back the very same object: it builds a new object with the same contents. If we pickle a list L and later unpickle it, we get a new list that compares equal to L but is a distinct object in memory.
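To make the "same contents, different object" point concrete, here is a minimal sketch. It assumes Python 3, where pickled data is a bytes object rather than the str you get in Python 2; the list contents are arbitrary.

```python
import pickle

L = [1, 2, ['nested', 'list']]

data = pickle.dumps(L)    # pickling: object -> bytes
L2 = pickle.loads(data)   # unpickling: bytes -> a brand-new object

print(type(data))         # <class 'bytes'>
print(L2 == L)            # True: same contents
print(L2 is L)            # False: not the same object
print(L2[2] is L[2])      # False: nested objects are rebuilt too
```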
The terms 'pickle' and 'unpickle' correspond to object serialization and deserialization respectively: language-neutral terms for the process of turning arbitrarily complex objects into textual or binary representations of those objects and back.
A.1 The 'pickle' Module
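The pickle module provides functions to dump a class instance's data to a file (dump) and to load the pickled data back into a usable object (load). Their counterparts dumps and loads do the same job with a string of bytes in memory instead of a file.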
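As a sketch of how dump and load behave on a file-like object, the following assumes Python 3 and uses io.BytesIO as a stand-in for a real file opened in binary mode. It also shows that several objects can be written to one stream and read back one load() call at a time, in the order they were written.

```python
import io
import pickle

# io.BytesIO stands in for a real file opened with 'wb' / 'rb'.
stream = io.BytesIO()
pickle.dump({'a': 6}, stream)            # first object
pickle.dump(('hello', 'world'), stream)  # second object, written after the first

stream.seek(0)                           # rewind before reading back
first = pickle.load(stream)              # each load() reads exactly one object
second = pickle.load(stream)
print(first, second)                     # {'a': 6} ('hello', 'world')

# Reading past the last object raises EOFError.
try:
    pickle.load(stream)
except EOFError:
    print('no more objects in the stream')

# dumps()/loads() are the in-memory counterparts working on bytes directly.
assert pickle.loads(pickle.dumps(first)) == first
```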
Consider the Demo class below:
import pickle

class Demo:
    def __init__(self):
        self.a = 6
        self.l = ('hello','world')
        print self.a, self.l
Now, we will create an instance of Demo and pickle it.
>>> f = Demo()
6 ('hello', 'world')
>>> pickle.dumps(f)
"(i__main__\nDemo\np0\n(dp1\nS'a'\np2\nI6\nsS'l'\np3\n(S'hello'\np4\nS'world'\np5\ntp6\nsb."
The dumps function pickles the object and returns the pickled data as a string, which the interactive interpreter then displays. I am sure this is not very comprehensible and doesn't look very useful, but if we dump the pickled object to an on-disk file, its utility increases manyfold. This is what we'll do next. Let's modify our code slightly to include the pickling code:
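import pickle

class Demo:
    def __init__(self):
        self.a = 6
        self.l = ('hello','world')
        print self.a, self.l

if __name__ == "__main__":
    f = Demo()
    out = open('Demo.pickle', 'wb')   # binary mode is the safe choice for pickles
    pickle.dump(f, out)               # write the pickled instance to the file
    out.close()                       # make sure the data is flushed to disk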
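Now, let us unpickle. A pickle stores only a reference to the class (here __main__.Demo), not the class's code, so the Demo class must be defined in the interpreter that does the loading, for example by running the script above with python -i: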
>>> f3 = pickle.load(open("Demo.pickle", "rb"))
>>> f3.a
6
>>> f3.l
('hello', 'world')
So far, so good.
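For readers on Python 3, here is a minimal sketch of the same save-and-restore round trip. It assumes Python 3 (print is a function and there is no file() built-in) and writes to a temporary directory rather than the current one. The init_calls counter is a hypothetical addition, used only to show that pickle.load() restores the attributes without running __init__ again, which is consistent with the session above printing nothing when f3 was loaded.

```python
import os
import pickle
import tempfile


class Demo:
    # Hypothetical counter: how many times __init__ has run.
    init_calls = 0

    def __init__(self):
        Demo.init_calls += 1
        self.a = 6
        self.l = ('hello', 'world')


# Assumed location: a throwaway directory instead of ./Demo.pickle
path = os.path.join(tempfile.mkdtemp(), 'Demo.pickle')

original = Demo()
with open(path, 'wb') as out:      # binary mode; the with block closes the file
    pickle.dump(original, out)

with open(path, 'rb') as src:
    restored = pickle.load(src)    # rebuilds the instance from its saved attributes

print(restored.a, restored.l)      # 6 ('hello', 'world')
print(Demo.init_calls)             # 1: load() did not call __init__ again
```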
A.2 The 'cPickle' Module
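cPickle is an extension module, written in C, that provides the same pickling facilities and, according to the Python documentation, can be up to 1000 times faster than the pickle module. Its usage is the same as that of pickle, and the pickles produced by the two modules are compatible, so data written by one can be read by the other. (In Python 3, cPickle no longer exists as a separate module: the standard pickle module uses its C implementation automatically when it is available.)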
>>> import cPickle
>>> f3 = cPickle.load(open("Demo.pickle", "rb"))
>>> f3.l
('hello', 'world')
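A common idiom, sketched below, imports cPickle under the name pickle when it is available and falls back to the pure pickle module otherwise, so the rest of the code does not care which one it got. On Python 3 the fallback branch is always taken, which is harmless because pickle already uses its C implementation there. The record is an arbitrary example.

```python
# Prefer cPickle where it exists (Python 2); otherwise use pickle.
try:
    import cPickle as pickle  # Python 2 only
except ImportError:
    import pickle             # Python 3: already backed by a C implementation

record = {'a': 6, 'l': ('hello', 'world')}
data = pickle.dumps(record)
assert pickle.loads(data) == record   # same API whichever module was imported

print(pickle.__name__)  # 'cPickle' on Python 2, 'pickle' on Python 3
```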
B. A Glimpse Behind the Scenes
The data format used by pickle is Python-specific, which rules out pickling as an option for persistent storage if you need a language-neutral solution. By default, Python 2 writes pickled objects in a human-readable, and thus easily debuggable, ASCII format. Python 2 supports three protocols for pickling (Python 3 adds newer binary protocols and uses one of them by default):
- Protocol version 0 is the original ASCII protocol and is backward compatible with earlier versions of Python.
- Protocol version 1 is the old binary format which is also compatible with earlier versions of Python.
- Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes.
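The protocols can be compared directly. This sketch assumes Python 3, which still supports protocols 0 to 2 alongside its newer ones. It pickles the same arbitrary record with each protocol, checks that each one round-trips, prints the sizes, and uses pickletools.dis to list the opcodes one per line, which is an easier way to read a pickle than the raw string returned by dumps.

```python
import pickle
import pickletools

record = {'a': 6, 'l': ('hello', 'world')}

for proto in (0, 1, 2):
    data = pickle.dumps(record, protocol=proto)
    assert pickle.loads(data) == record   # every protocol restores an equal object
    print(proto, len(data), 'bytes')

# Protocol 0 is the text-based format: for ASCII-only data like this,
# the pickle decodes cleanly as ASCII.
text = pickle.dumps(record, protocol=0)
print(text.decode('ascii'))

# Disassemble a binary (protocol 2) pickle into readable opcodes.
pickletools.dis(pickle.dumps(record, protocol=2))

print('highest protocol available here:', pickle.HIGHEST_PROTOCOL)
```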
C. Conclusion
The basic goal of this short tutorial was to give a hands-on introduction to pickling in Python as a method of writing class data to persistent storage, especially for new Python programmers. I have intentionally left out issues related to working with bigger and more complex classes, for which some good resources are listed below. I have also omitted more basic topics, such as pickling simple lists and dictionaries, but finding the answers to those will not require much looking around.
I hope that you are ready to use pickling in your projects. Happy coding!
References:
The author is a freelance technical writer. He mainly writes on the Linux kernel, Network Security and XML.