Posts Tagged ‘binary’

More binary data parsing

Way back when I was trying to read binary data, I came across ‘struct’ in an initial search. I ended up going with ‘bitstring’ instead, because I thought I would have to if I wanted to read individual bits of data. But David from USGS told me last week that he uses struct to read at the byte level, then parses out what he needs afterwards. He showed me some of his code, and it really looks a lot more efficient than what I was doing with bitstring: something that would have taken me like 20 lines of code only took him 1!  :-)
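To make the "read bytes first, pick out bits afterwards" idea concrete, here is a minimal sketch in Python 3. It is not David's code: the three-byte layout and the names flags and count are made up for illustration and have nothing to do with the S7K format.

```python
import struct

# Hypothetical 3-byte record: one flags byte, then a little-endian uint16.
data = bytes([0b10100101, 0x34, 0x12])

# Read whole bytes in one call ...
flags, count = struct.unpack('<BH', data)

# ... then pull individual bits and bit fields out with shifts and masks.
bit0 = flags & 0x01                  # lowest bit of the flags byte
bit7 = (flags >> 7) & 0x01           # highest bit of the flags byte
high_nibble = (flags >> 4) & 0x0F    # top four bits as a small integer

print(count, bit0, bit7, high_nibble)
```

The bit twiddling is a couple of extra lines per field, but the byte-level read itself stays a single unpack call.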

David gave me a sample file, and some tips to help me get started.  Here are the highlights:

Michelle, the classes you are looking for are in the recordframe.py
file. The basic pattern is this:

class RecordType:

def __init__(self, data):

This method reads the data record portion of the data frame

All the binary data is stored in a string of bytes called data. If the
record is pre-defined, I just use the struct module to read the bytes
in all at once. Frequently, though, I had to read a “size” field
before I knew how large the record to be read was; you can
see it gets a little more messy in that case. It also might have been
cleaner to pass in the file handle directly rather than the data bytes
themselves; then I wouldn’t need all the code for indexing into the
data array. This is what Eric’s Matlab code does. Might try that way
later and see if it is cleaner.

def __str__(self):

This method lets you print a RecordType directly: print shows whatever
__str__ returns. It just overrides Python’s default printing for an
object. You could just as easily have written something like a printString()
method, but this way makes the code cleaner later on.
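Here is a minimal sketch of that pattern in Python 3, to show how the pieces might fit together. It is not the contents of recordframe.py: the record layout (a uint32 count followed by that many uint16 samples) and the attribute names count, samples and end are assumptions made up for the example.

```python
import struct


class RecordType:
    """Hypothetical record: a uint32 count followed by `count` uint16 samples."""

    def __init__(self, data, offset=0):
        # Read the fixed-size "size" field first ...
        (self.count,) = struct.unpack_from('<I', data, offset)
        offset += struct.calcsize('<I')
        # ... then build the format for the variable-length part.
        fmt = '<%dH' % self.count
        self.samples = struct.unpack_from(fmt, data, offset)
        # Remember where this record ends so a caller can find the next one.
        self.end = offset + struct.calcsize(fmt)

    def __str__(self):
        return 'RecordType(count=%d, samples=%s)' % (self.count, list(self.samples))


# Synthetic bytes standing in for one record.
data = struct.pack('<I3H', 3, 10, 20, 30)
record = RecordType(data)
print(record)
```

Using struct.unpack_from with an offset keeps the indexing into the byte string in one place, which is the messy part mentioned above.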

And later….

One thing about struct: there are two ways to use it. If the format
isn’t going to change, you should compile it once into a struct.Struct
object and use it like this:

s = struct.Struct("format string")

for i in range(100000000):
    fields = s.unpack(data)

This pre-compiles the format and all the work is done in efficient C
code. Unfortunately, this won’t work for Reson data because the
structs have variable sizes that you need to figure out before you
read the data. In that case you do it like this:

for i in range(100000000):
    fmt = ''  # placeholder: the format string worked out for this record
    fields = struct.unpack(fmt, data)

Notice that the struct module now takes the format string along with
the data on each and every iteration through the loop. This is much
slower. But if you need to change the fmt string on each trip through
the loop you don’t have a choice. The re module for regular expression
parsing has the same options: a pre-compiled and fast version and a
flexible slow version.
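The two styles can be put side by side in a small, runnable sketch (Python 3). Both record layouts here are invented for the example: the fixed one is two uint16 values and a float, and the variable one is a one-byte count followed by that many uint16 values.

```python
import struct

# Fixed layout: compile the format once and reuse it for every record.
fixed = struct.Struct('<HHf')  # hypothetical 8-byte record
blob = b''.join(fixed.pack(i, i * 2, i / 2) for i in range(4))
fixed_records = [fixed.unpack_from(blob, i * fixed.size) for i in range(4)]

# Variable layout: the format depends on a count stored in the data itself.
stream = struct.pack('<B2H', 2, 7, 8) + struct.pack('<B3H', 3, 1, 2, 3)
variable_records = []
offset = 0
while offset < len(stream):
    (n,) = struct.unpack_from('<B', stream, offset)
    fmt = '<%dH' % n  # format string rebuilt on every pass
    variable_records.append(struct.unpack_from(fmt, stream, offset + 1))
    offset += 1 + struct.calcsize(fmt)

print(fixed_records)
print(variable_records)
```

The speed gap may be smaller in practice than it sounds, since the struct documentation notes that the module-level functions cache recently used format strings. Timing both versions on real data would settle it.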

Read my first binary file!

Wow.  I am shocked that this worked on the first try.  I created a script to read some of the header information from an S7K file.  One thing that got me stuck for a bit was the fact that the byte order used is little-endian.  Turns out this is no big deal.  I just had to end things with ‘le’, for example s.uintle instead of s.uint.
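For anyone else who gets stuck on byte order, here is a small sketch (Python 3, using the standard struct module rather than bitstring) of the same four bytes read both ways. The bytes are made up for the demonstration, chosen so that the little-endian reading is 396.

```python
import struct

raw = bytes([0x8C, 0x01, 0x00, 0x00])  # four bytes as they might sit in a file

little = struct.unpack('<I', raw)[0]   # '<' means least significant byte first
big = struct.unpack('>I', raw)[0]      # '>' means most significant byte first

print(little)  # 396
print(big)     # 2348875776
```

Reading with the wrong byte order does not raise an error. It just quietly gives a very different number, which is why it is easy to get stuck on.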

So here’s my first ever script to read binary data:

#!/usr/bin/env python

# Purpose: to read the header of an s7k file
# Creator: Michelle

from bitstring import *

s = BitString(filename='sample.s7k') # Read in the file sample.s7k, and create a BitString object from it
print s.pos #Print the position in the file.  For fun.

protocolversion = s.readbytes(2).uintle
offset = s.readbytes(2).uintle
syncpat = s.readbytes(4).uintle
size = s.readbytes(4).uintle
print size

odof = s.readbytes(4).uintle
odid = s.readbytes(4).uintle

year = s.readbytes(2).uintle
day = s.readbytes(2).uintle
seconds = s.readbytes(4).floatle
hours = s.readbytes(1).uintle
minutes = s.readbytes(1).uintle

print year,day,seconds,hours,minutes

And the output of my first ever binary-data-reading script:

0
396
2009 353 55.1300010681 17 53
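For comparison, here is a rough sketch of how the same header fields might be read in one call with the standard struct module (Python 3). The field names are copied from the script above. The type codes are inferred from the byte widths and the uintle/floatle reads in that script, not taken from the format definition, so treat them as assumptions. The bytes being parsed are synthetic, not from a real S7K file.

```python
import struct
from collections import namedtuple

# Same fields, in the same order, as the bitstring script:
# 2+2+4+4+4+4 bytes of unsigned ints, then uint16, uint16, float32, uint8, uint8.
HEADER = struct.Struct('<HHIIIIHHfBB')
Header = namedtuple('Header', 'protocolversion offset syncpat size odof odid '
                              'year day seconds hours minutes')


def read_header(data):
    """Unpack the first HEADER.size bytes of data into a Header tuple."""
    return Header(*HEADER.unpack_from(data, 0))


# Synthetic bytes for demonstration; the first three values and odof/odid
# are arbitrary placeholders.
sample = HEADER.pack(1, 2, 3, 396, 0, 0, 2009, 353, 55.13, 17, 53)
header = read_header(sample)
print(header.size, header.year, header.day, header.hours, header.minutes)
```

With a real file, the bytes would come from something like open('sample.s7k', 'rb').read(HEADER.size) instead of the packed sample.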

Installing BitString

I’m installing BitString right now. It seems like what I need. The first sentence on the BitString Google Code website is: “bitstring is a pure Python module designed to help make the creation, manipulation and analysis of binary data as simple and natural as possible.” Perfect. The first thing I did was download the manual (pdf) and the zipped installation file for my version of Python (2.6).

Unzipping zipped files in Ubuntu is really easy – just right click on the file in the File Browser, and choose to Unzip using Archive Manager.  But I wanted to know how to do it using the command line.  Here is an example:

unzip bitstring-1.1.3.zip

I installed the contents of the file into the appropriate Python Module locations on my computer by going to the directory where the unzipped files were, and typing the following into the command line:

sudo python setup.py install

Now I’m ready to dig in!  Luckily, a friend of mine at work helped me out with some pointers on how to read in the s7k binary file.  He explained the commands he’d used when creating his Matlab scripts, and how the structure was defined.  He’s not a Python user, so couldn’t give me any specific Python tips, but it was enough for me to get started.  So now I have to figure out the BitString part.  I’m not going with struct or array because they seem to be meant for working with whole bytes and are clunky when it comes to parsing out individual bits. BitString is designed with more flexibility.

Fortunately BitString seems pretty straightforward.  I had a very quick look through the manual, and if I understand correctly, I will start by converting the s7k file to a BitString object, then just read through it bit by bit (or byte by byte).

New project: use Python to read multibeam data

Am I getting in over my head before I’ve learned the basics?  Most likely.  But I find that if I set my goals high, I learn lots of unexpected things along the way.  This latest goal is probably not going to be completed from start to finish in any kind of linear fashion, and I will probably drop it and come back to it several times before its completion.

I’m hoping to figure out how to read S7K multibeam data format.  Not a simple challenge for someone like me who can barely piece together a print statement.  This isn’t like reading an ASCII text file.  I started reading the Data Format Definition document (DFD), and came upon some pretty daunting tasks right away, including reading headers, different number formats, and lots of bits and bytes stuff.  Scary, but sort of exciting.  When I took a computer science class way back in undergrad, I remember learning all the really basic stuff, but since I didn’t have a real application for it, it was sort of meaningless to me, and therefore did not stick in my brain.  But now it’s fun!  (I’m a nerd).  Hopefully it’ll stick this time :-)

These links might be helpful:

Reading and Writing data using Python’s input and output functionality

Understanding Big and Little Endian

The Learning Python book says (p. 901): “If you are processing image files, packed data created by other programs whose content you must extract, or some device data streams, chances are good that you will want to deal with it using bytes and binary-mode files.”