Thursday, October 08, 2015

Deconstructing .pyc files

I've recently been trying to work out why python was recompiling a bunch of .pyc files. I haven't solved that, but I learnt a little along the way, enough to be worth writing down.

Python will recompile a .py file onto a .pyc file if it thinks something's changed. But how does it decide something has changed? It encodes some of the pertinent details in the header of the .pyc file.

Consider a file. There's a and a foo.pyc. I open up the .pyc file in emacs and view it in hex. (ESC-x hexl-mode for those unfamiliar.)

The file starts off like this:

 03f3 0d0a c368 7955 6300 0000 ....

The first 4 bytes 03f30d0a are the magic number, and encode the version of python. There's a list of magic numbers in the source, here.

To check this, take the 03f3, reverse it to f303, which is 62211 decimal. That corresponds to 2.7a0 - this is python 2.7, so that matches. (The 0d0a is also part of the encoding of the magic number.) This check is just to see if the .pyc file is compatible with the version of python you're using. If it's not, it will ignore the .pyc file and may regenerate it.

The next bit is c3687955. Reverse this again to get the endianness right, and it's 557968c3. In decimal, that's 1434020035.

That's a timestamp, standard unix time. What does that correspond to?

perl -e '$f=localtime(1434020035); print $f'
Thu Jun 11 11:53:55 2015

And I can look at the file (on Solaris and illumos, there's a -e flag to ls to give us the time in the right format rather than the default "simplified" version).

/bin/ls -eo
-rw-r--r-- 1 root  7917 Jun 11 11:53:55 2015

As you can see, that matches the timestamp on the source file exactly. If the timestamp doesn't match, then again python will ignore it.

This has consequences for packaging. SVR4 packaging automatically preserves timestamps, with IPS you need to use pkgsend -T to do so as it's not done by default.

No comments: