I did a bit of googling, and it seems the easiest way out is to simply check the first few bytes of the file for magic numbers. So here's a bit of Python code for checking whether the file data belongs to a JPEG, PNG or TIFF image:
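    def is_jpg(data):
        # JPEG files start with the SOI marker FF D8
        return data[:2] == b'\xff\xd8'

    def is_png(data):
        # Fixed 8-byte PNG signature
        return data[:8] == b'\x89PNG\x0d\x0a\x1a\x0a'

    def is_tiff(data):
        # Big-endian ("MM") or little-endian ("II") TIFF header
        return data[:4] == b'MM\x00\x2a' or data[:4] == b'II\x2a\x00'

If the file is already on disk, you can grab the first few bytes with

    # Open in binary mode so the bytes are not altered on read
    with open("somefile.jpg", 'rb') as f:
        data = f.read(11)

    if is_jpg(data):
        ext = ".jpg"

Of course this won't test that the whole file is valid. But it's easier to do that afterwards with an image library once the extension is correct.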
The magic numbers are documented in the specifications for the formats. You can also find some help for other formats in the source code of the file command on Unix systems.
Update: I'm liking this so much that I ended up putting it in a separate file and making a convenience function for getting an extension like '.jpg'. Grab the Python file here. I also added support for GIF. Here's another easy reference for magic file numbers.
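The linked file isn't reproduced in this post, so here is only a rough sketch of what such a convenience function might look like. The names `MAGIC_NUMBERS` and `guess_extension` and the table layout are my own assumptions for illustration, not necessarily what the downloadable file uses. The GIF signatures are the two standard ones, `GIF87a` and `GIF89a`.

```python
# Hypothetical lookup table: (leading magic bytes, extension).
MAGIC_NUMBERS = [
    (b'\xff\xd8', '.jpg'),
    (b'\x89PNG\x0d\x0a\x1a\x0a', '.png'),
    (b'MM\x00\x2a', '.tif'),   # big-endian TIFF
    (b'II\x2a\x00', '.tif'),   # little-endian TIFF
    (b'GIF87a', '.gif'),
    (b'GIF89a', '.gif'),
]

def guess_extension(data):
    """Return an extension like '.jpg' for recognised image data, else None."""
    for magic, ext in MAGIC_NUMBERS:
        if data.startswith(magic):
            return ext
    return None
```

Keeping the signatures in a table means adding another format is a one-line change rather than another function.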
Second update: I've updated the code. There was a bug detecting JPEGs from certain digital cameras that put Exif data in the first segment. It suffices to check the first two bytes of the JPEG; then the problem does not occur.
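To illustrate the kind of bug this is, here is a small sketch. It assumes the stricter check also looked for the `JFIF` identifier in the first segment; `is_jpg_strict` is a hypothetical stand-in for such a check, and the segment length bytes in the sample headers are made up.

```python
# Sample headers built by hand; the two length bytes are arbitrary.
SOI = b'\xff\xd8'                                   # start-of-image marker
jfif_header = SOI + b'\xff\xe0' + b'\x00\x10' + b'JFIF\x00'
exif_header = SOI + b'\xff\xe1' + b'\x12\x34' + b'Exif\x00\x00'

def is_jpg_strict(data):
    # Hypothetical over-strict check: also requires a JFIF first segment.
    return data[:2] == SOI and data[6:11] == b'JFIF\x00'

def is_jpg(data):
    # Only the two SOI bytes, which every JPEG starts with.
    return data[:2] == SOI

for name, header in (('JFIF', jfif_header), ('Exif', exif_header)):
    print(name, is_jpg_strict(header), is_jpg(header))
```

The strict version rejects the Exif-first header, while the two-byte check accepts both.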