CIS 111 Binary File I/O

Objectives

  • Write data to a binary file
  • Read data from a binary file
  • Use the struct library in Python to work with binary data

Opening and closing a binary file

  • Opening a binary file for read access: file = open("filename", "rb")
  • Opening a binary file for write access: file = open("filename", "wb")
  • Opening a binary file for append write access: file = open("filename", "ab")
  • Opening a binary file for read and write access: file = open("filename", "rb+")
  • Closing a binary file referred to by variable file: file.close()
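The modes above can be tried out with a short sketch. The file name demo.bin and the temporary directory are assumptions made for illustration only. The last step uses a with statement, an alternative that closes the file automatically when the block ends.

```python
import os
import tempfile

# Hypothetical file name, placed in a temporary directory for illustration
path = os.path.join(tempfile.gettempdir(), "demo.bin")

# "wb" creates the file (or truncates an existing one) for binary writing
file = open(path, "wb")
file.write(b"\x01\x02\x03")
file.close()

# "ab" adds new bytes to the end of the existing file
file = open(path, "ab")
file.write(b"\x04")
file.close()

# A with statement closes the file automatically, even if an error occurs
with open(path, "rb") as file:
    data = file.read()

print(data)  # b'\x01\x02\x03\x04'
```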

Reading and writing binary data

  • Binary data is often best written as a chunk of packed bytes, which are fields packed into a binary data structure.
  • Binary data is often best read as a chunk of packed bytes, which are then unpacked into individual fields/variables.
  • It helps if all the binary records (groupings of fields) are the same length. Then you can quickly find out how many records there are and seek to specific records.
  • Read in an entire file referred to by variable file: data = file.read()
  • Read in twenty bytes from file referred to by variable file: data = file.read(20)
  • Seek to the tenth byte of the file referred to by variable file (offsets start at 0, so the tenth byte is at offset 9): file.seek(9)
  • Write binary data to a file referred to by variable file: file.write(data)
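  • Creating a packed binary data structure (the values are passed as separate arguments, not as a list): packedData = struct.pack(formatString, value1, value2, ...)
  • Unpacking a binary data structure into a tuple of fields: fields = struct.unpack(formatString, packedData)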
  • The format string structure: first character represents byte order:
    • @ specifies native byte order with native sizes and alignment (default if no order specified)
    • = specifies native byte order with standard sizes and no alignment
    • < specifies little-endian
    • > specifies big-endian
    • ! specifies network order (big-endian)
  • The rest of the string specifies the data fields:
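    • Note: There are many more types than this, but this list contains the ones we will use. The sizes shown are the standard sizes, which apply when the format string starts with =, <, >, or !.
    • i specifies an integer (4 bytes)
    • h specifies a short integer (2 bytes)
    • f specifies a single-precision floating point number (4 bytes)
    • d specifies a double-precision floating point number (8 bytes)
    • c specifies a single character, passed as a bytes object of length 1 (1 byte)
    • ns specifies a series of n bytes, where n is a number (for example, 30s). Shorter values are padded with null bytes, and longer values are truncated.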
  • Strings should be encoded to UTF-8 before packing: s = s.encode('utf-8')
  • Strings should be decoded after unpacking: s = s.decode()
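The packing and unpacking steps in this outline can be sketched as a short round trip. The sample values are made up for illustration. struct.calcsize is a standard library function not mentioned in the outline; it is used here only to confirm the record length.

```python
import struct

# Hypothetical sample values for illustration
num = 27
rate = 12.5
name = "Dan Rowan"

# ">if30s": big-endian, a 4-byte integer, a 4-byte float, and a 30-byte string
formatString = ">if30s"

# calcsize reports how many bytes the format occupies
recordSize = struct.calcsize(formatString)
print(recordSize)  # 38

# Values are passed to pack as separate arguments, in format order
packedData = struct.pack(formatString, num, rate, name.encode("utf-8"))
print(len(packedData))  # 38

# unpack always returns a tuple with one member per field
fields = struct.unpack(formatString, packedData)

# The string field comes back as bytes padded with nulls on the right
unpackedName = fields[2].decode("utf-8").rstrip("\0")
print(fields[0], fields[1], unpackedName)  # 27 12.5 Dan Rowan

# Byte order changes the layout of the same value
littleEndian = struct.pack("<i", 1)
bigEndian = struct.pack(">i", 1)
print(littleEndian)  # b'\x01\x00\x00\x00'
print(bigEndian)     # b'\x00\x00\x00\x01'
```

The rate 12.5 was chosen because a 4-byte float stores it exactly. A value such as 12.34 comes back slightly different after a round trip through the f format, which is why formatted output usually rounds it.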

Binary file read and write example

The outline in the previous section is brief and does not cover every detail. The following example fills in the gaps by writing a set of records to a binary file and then reading them back one record at a time. This example should be sufficient for the binary file work you need to do on your assignments.

# Import library needed to use binary records
import struct

# Create some data to be written to file
names = ["Dan Rowan", "Dick Martin", "Goldie Hawn", "Teresa Graves", "Larry Hovis", "Gary Owens"]
nums = [27, 68, 12, -4, 99, 22]
rates = [12.34, 56.78, 90.12, 34.56, 78.90, 55.55]

# Set filename and display a blank line
filename = "bintest.bin"
print()

# Open file for binary writing
file = open(filename, "wb")

# For each name in the names list, write out a name, num, and rate
for i in range(len(names)):

    # First, pack the data into a binary data structure (a record)
    # > specifies the byte order is big-endian
    # if30s specifies an integer, a floating point number, and a 30 character string
    # The fields that follow must match in data type and order
    # Note that the string is encoded as UTF-8
    packedData = struct.pack(">if30s", nums[i], rates[i], names[i].encode("utf-8"))

    # Write the binary record to the file
    file.write(packedData)

# Done writing, so close the file
file.close()

# Open file for binary reading
file = open(filename, "rb")

# recNum is used to keep track of the number of the record just read
recNum = 0

# Read in one binary record (priming read)
# 4 + 4 + 30 is the length in bytes of the data in the record
packedData = file.read(4 + 4 + 30)

# If some bytes were read, then there was a record available
while len(packedData) > 0:

    # Read was successful, so increase record count
    recNum += 1

    # Unpack the binary data into a tuple
    # > specifies the byte order is big-endian
    # if30s specifies an integer, a floating point number, and a 30 character string
    fields = struct.unpack(">if30s", packedData)

    # Move the members of the tuple into variables with better names
    num = fields[0]
    rate = fields[1]

    # The string should be decoded
    # The string was padded with null characters to get to 30 bytes
    # .strip('\0') will get rid of the null characters used for padding
    name = fields[2].decode().strip('\0')

    # Display the fields on the screen
    # :3d specifies an integer in a column three wide
    # :>20s specifies a right justified string in a column twenty wide
    # :4d specifies an integer in a column four wide
    # :6.2f specifies a floating point in a column six wide with two digits after the decimal
    # The arguments to format must match the data types and order specified
    print("{:3d}. {:>20s} {:4d} {:6.2f} ".format(recNum, name, num, rate))

    # Read in the next record (end of priming read)
    packedData = file.read(4 + 4 + 30)

# Close the file. The program is done.
file.close()
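Because every record in the example is the same length, the number of records can be computed from the file size, and any record can be reached directly with seek. The following is a minimal sketch of that idea, not part of the original example. The file name seekdemo.bin, the temporary directory, and the shortened data lists are assumptions made for illustration.

```python
import os
import struct
import tempfile

formatString = ">if30s"
recordSize = struct.calcsize(formatString)  # 4 + 4 + 30 = 38 bytes per record

# Hypothetical sample file, written to a temporary directory
path = os.path.join(tempfile.gettempdir(), "seekdemo.bin")
names = ["Dan Rowan", "Dick Martin", "Goldie Hawn"]
nums = [27, 68, 12]
rates = [12.5, 56.75, 90.25]

# Write three fixed-length records
with open(path, "wb") as file:
    for i in range(len(names)):
        record = struct.pack(formatString, nums[i], rates[i], names[i].encode("utf-8"))
        file.write(record)

# Number of records = file size divided by record size
recordCount = os.path.getsize(path) // recordSize
print(recordCount)  # 3

# Jump straight to the third record (index 2) without reading the first two
with open(path, "rb") as file:
    file.seek(2 * recordSize)
    num, rate, rawName = struct.unpack(formatString, file.read(recordSize))

# Remove the null padding from the right side of the name
name = rawName.decode("utf-8").rstrip("\0")
print(num, rate, name)  # 12 90.25 Goldie Hawn
```

Record numbers are counted from 0 here, so record k starts at byte offset k * recordSize.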