How to Extract Data from an Image File in C#

And store it in a CSV

Single hexagon with colored triangles surrounding it, making a rectangle

This article shows how to extract data from a BMP file and generate a CSV file from it. The data was needed for Empire tutorial nine, where a hexagonal graphic was created to help identify the hex over which the mouse cursor was located.

The hexagon was colored in one color and each of the four corner triangles surrounding it was given another color. Thus the file had five unique colors and, as it's a paletted file format, the data in the file consists of the values 0-4. It's this data that the extract utility is there to get at.

It often helps to state what the program will do before creating it. In this case, it's a command line utility that is passed the name of a .BMP file. It reads the file, verifies that it is a .BMP file, extracts the image dimensions (width and height) and the pixel data, and generates a CSV file consisting of multiple lines, each with one number per pixel.

Although this is a quick and dirty utility that I may never use again, I have created this at least once before so I think it's worth putting in a little bit of effort to error check inputs.

BMP File Format

Wikipedia has a useful definition of a .BMP file with a couple of examples. As with many image file formats, it consists of a header which has the requisite dimensions and BMP id plus an offset to where the file data is stored.

Important BMP Header Data

Offsets are in hex (base 16); other values are in the base stated.

Offset, Purpose (Size)

0 ---- Should be 'B', 66 in decimal (byte)
1 ---- Should be 'M', 77 in decimal (byte)
...
0A --- Offset to Start of Data (Int)
12 --- Width in Pixels (Int)
16 --- Height in Pixels (Int)
...
offset ... data (lots of bytes)

This header data, along with the real data, has to be extracted, and .NET provides at least two ways to do it. I think it could be done with file streams, but I chose the BinaryReader class.

I've structured the program as a call to ExtractParameters(), which checks that the input file names are valid. It can take either one input file name or an input file name followed by an output file name. If no extension is supplied then it appends .bmp to the input file name and .csv to the output file name. Invalid extensions are rejected: the ShowUsage() method displays the usage message and the program returns immediately.

ExtractParameters() uses the Path class methods to extract the extension and the rest of the filename. If no output file is specified the input file has its extension removed and .csv appended.
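
As a rough illustration of that filename handling, here is a minimal sketch using the Path class. The helper name ResolveNames and its out-parameter shape are made up for this example; it is not the article's actual ExtractParameters() code.

```csharp
using System;
using System.IO;

static class FileNameSketch
{
    // Hypothetical helper: returns false when the argument count or an extension is wrong.
    public static bool ResolveNames(string[] args, out string input, out string output)
    {
        input = output = null;
        if (args.Length < 1 || args.Length > 2) return false;

        input = args[0];
        string inExt = Path.GetExtension(input);             // "" when there is no extension
        if (inExt.Length == 0) input += ".bmp";
        else if (!inExt.Equals(".bmp", StringComparison.OrdinalIgnoreCase)) return false;

        // No output name given: reuse the input name with .csv in place of .bmp.
        output = args.Length == 2 ? args[1] : Path.ChangeExtension(input, ".csv");
        string outExt = Path.GetExtension(output);
        if (outExt.Length == 0) output += ".csv";
        else if (!outExt.Equals(".csv", StringComparison.OrdinalIgnoreCase)) return false;
        return true;
    }

    static void Main()
    {
        string input, output;
        if (ResolveNames(new[] { "hexagon" }, out input, out output))
            Console.WriteLine(input + " -> " + output);       // hexagon.bmp -> hexagon.csv
    }
}
```

A caller would show the usage message and return whenever this helper reports false.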

Using BinaryReader

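Reading binary files is easy enough with the BinaryReader class, but we do want to check bytes at certain places. This program treats it as a one-way read: BinaryReader has no Seek method of its own (although its BaseStream can be repositioned when the underlying stream supports seeking), so bytes that aren't needed are simply read and discarded. It's fast and works well.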

It actually makes use of another class, File, to open the file for reading. An alternative would be to use File.ReadAllBytes() to read the entire file into a byte array and then process it via indexes. That would need slightly more memory.
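
For comparison, the File.ReadAllBytes() alternative might look something like the sketch below. The ReadHeader helper is hypothetical, and the offsets are the ones from the header table above.

```csharp
using System;
using System.IO;

static class ReadAllBytesSketch
{
    // Hypothetical helper: pulls the three header fields out of an in-memory BMP.
    public static void ReadHeader(byte[] bmp, out int offset, out int width, out int height)
    {
        if (bmp.Length < 0x1A || bmp[0] != (byte)'B' || bmp[1] != (byte)'M')
            throw new InvalidDataException("Not a BMP file");

        // BitConverter uses the machine's byte order. BMP fields are little-endian,
        // which matches the usual x86/x64 targets.
        offset = BitConverter.ToInt32(bmp, 0x0A);
        width  = BitConverter.ToInt32(bmp, 0x12);
        height = BitConverter.ToInt32(bmp, 0x16);
    }

    static void Main(string[] args)
    {
        if (args.Length != 1) { Console.WriteLine("usage: sketch file.bmp"); return; }

        byte[] bmp = File.ReadAllBytes(args[0]);   // whole file in memory
        int offset, width, height;
        ReadHeader(bmp, out offset, out width, out height);
        Console.WriteLine("data at " + offset + ", " + width + " x " + height);
    }
}
```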

The File class has some very useful methods for file processing, especially if you are working with text files. You don't always need to use StreamReader.

For this program there are various BinaryReader Read... methods to use: ReadByte reads in one byte, ReadInt32 reads in four, and so on.
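
To see how far each Read... call advances, this small sketch reads from a dummy in-memory buffer and records the stream position after each call. The class and method names are made up for the example.

```csharp
using System;
using System.IO;

static class ReadSizesSketch
{
    // Returns the stream position after each read so the sizes can be compared.
    public static long[] PositionsAfterReads()
    {
        var positions = new long[4];
        // 16 dummy bytes standing in for the start of a file.
        using (var reader = new BinaryReader(new MemoryStream(new byte[16])))
        {
            reader.ReadByte();   positions[0] = reader.BaseStream.Position;  // 1 byte
            reader.ReadInt16();  positions[1] = reader.BaseStream.Position;  // 2 bytes
            reader.ReadInt32();  positions[2] = reader.BaseStream.Position;  // 4 bytes
            reader.ReadDouble(); positions[3] = reader.BaseStream.Position;  // 8 bytes
        }
        return positions;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", PositionsAfterReads()));   // 1, 3, 7, 15
    }
}
```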

Processing a File

If ExtractParameters() succeeds and the specified input file exists then the program calls ProcessFile(). After opening the file inside a using block, this reads the first two bytes and compares them against "BM" to ensure it really is a .BMP file.

Using the information from the Wikipedia article, the int at location 0x0A is the offset to where the pixel data starts. To get there we first skip the eight bytes that follow the "BM" signature; a single ReadDouble does this, as it reads eight bytes. Once the offset has been read, a ReadInt32 skips the next four bytes. By skipping I mean reading the data in but not storing it.

At location 0x12 the next int has the width of the image in pixels, followed by the height at 0x16. The file pointer is now at 0x1A and we need to get to the offset, so we just read in (offset - 0x1A) bytes.

Finally we're positioned at the start of the pixel data. The property BaseStream.Length has the size of the whole file, so by subtracting the offset from that we know how many bytes must be read to get all of the data. This data is then passed to the OutputCsv() method, along with the width and height, and the values are written out with commas separating them.
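
Put together, the reading steps described above could be sketched roughly as follows. ReadPixels is a hypothetical stand-in for ProcessFile(), and the tiny buffer built in Main is a fake used only to exercise the offsets; it is not a valid BMP.

```csharp
using System;
using System.IO;

static class ProcessFileSketch
{
    // Hypothetical stand-in for ProcessFile(): returns the pixel bytes and dimensions.
    public static byte[] ReadPixels(Stream stream, out int width, out int height)
    {
        using (var reader = new BinaryReader(stream))
        {
            if (reader.ReadByte() != (byte)'B' || reader.ReadByte() != (byte)'M')
                throw new InvalidDataException("Not a BMP file");

            reader.ReadDouble();                 // skip 8 bytes (0x02-0x09)
            int offset = reader.ReadInt32();     // 0x0A: where the pixel data starts
            reader.ReadInt32();                  // skip 4 bytes (0x0E-0x11)
            width = reader.ReadInt32();          // 0x12
            height = reader.ReadInt32();         // 0x16, position is now 0x1A
            reader.ReadBytes(offset - 0x1A);     // skip everything up to the pixel data
            return reader.ReadBytes((int)(reader.BaseStream.Length - offset));
        }
    }

    static void Main()
    {
        // Fake 2 x 2 image: 0x1A header bytes followed directly by four pixel bytes.
        var fake = new byte[0x1A + 4];
        fake[0] = (byte)'B'; fake[1] = (byte)'M';
        fake[0x0A] = 0x1A;                       // offset (little-endian int)
        fake[0x12] = 2;                          // width
        fake[0x16] = 2;                          // height
        fake[0x1A] = 4; fake[0x1B] = 1; fake[0x1C] = 3; fake[0x1D] = 0;

        int width, height;
        byte[] pixels = ReadPixels(new MemoryStream(fake), out width, out height);
        Console.WriteLine(width + " x " + height + ": " + string.Join(",", pixels));
    }
}
```

The sketch does no checking beyond the "BM" signature, so an offset smaller than 0x1A would make ReadBytes throw.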

OutputCsv() uses a StreamWriter to output the CSV file. It loops through the rows, building each line with a StringBuilder object by appending every byte value in the row along with a comma. Note this could have been done with a String, but a StringBuilder is more efficient.
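
A minimal sketch of that loop is shown here. WriteCsv is a hypothetical version of OutputCsv(); it assumes one byte per pixel with no padding at the end of each row, and it puts commas between values rather than after each one.

```csharp
using System;
using System.IO;
using System.Text;

static class OutputCsvSketch
{
    // Hypothetical version of OutputCsv(): one CSV line per row of pixels.
    // Assumes one byte per pixel and no row padding.
    public static void WriteCsv(TextWriter writer, byte[] data, int width, int height)
    {
        for (int row = 0; row < height; row++)
        {
            var line = new StringBuilder();
            for (int col = 0; col < width; col++)
            {
                if (col > 0) line.Append(',');
                line.Append(data[row * width + col]);   // appends the byte as a decimal number
            }
            writer.WriteLine(line.ToString());
        }
    }

    static void Main()
    {
        byte[] data = { 4, 4, 1, 1, 3, 3, 0, 0 };
        using (var writer = new StreamWriter("sketch.csv"))
        {
            WriteCsv(writer, data, 4, 2);   // writes "4,4,1,1" then "3,3,0,0"
        }
        Console.Write(File.ReadAllText("sketch.csv"));
    }
}
```

Taking a TextWriter rather than a file name keeps the loop easy to try out against a StringWriter as well as a StreamWriter.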

Why using?

using (var ...) is not just a neat way to create a variable; it also ensures that the variable is disposed of when the block ends. When it comes to reading or writing files, this is a very good practice.
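
The sketch below shows a using block next to the try/finally form it is roughly shorthand for. The WriteThenRead helper and the temporary file are only there for illustration.

```csharp
using System;
using System.IO;

static class UsingSketch
{
    // Hypothetical helper: writes one line, then reads it back.
    public static string WriteThenRead(string path)
    {
        using (var writer = new StreamWriter(path))
        {
            writer.WriteLine("0,1,2,3,4");
        }   // writer.Dispose() runs here, even after an exception, flushing and closing the file

        // Roughly what the compiler generates for a using block:
        StreamReader reader = new StreamReader(path);
        try
        {
            return reader.ReadLine();
        }
        finally
        {
            reader.Dispose();
        }
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        Console.WriteLine(WriteThenRead(path));   // 0,1,2,3,4
        File.Delete(path);
    }
}
```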

Error Trapping

I've gone possibly a bit over the top with error checking in this, but it's no bad thing. Ideally there would be a suite of tests to cover all the different input permutations. I use try/catch on file opening: "Bolton's Rule 45. The number one reason for writing to csv failures is because someone has it open in Excel".
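
One way to trap a failed open is sketched below. TryOpenForWrite is a hypothetical helper, not the article's code, and the file name is arbitrary.

```csharp
using System;
using System.IO;

static class OpenOutputSketch
{
    // Hypothetical helper: returns null instead of throwing when the file cannot be opened.
    public static StreamWriter TryOpenForWrite(string path)
    {
        try
        {
            return new StreamWriter(path);
        }
        catch (IOException ex)                    // e.g. another program has the file open
        {
            Console.WriteLine("Cannot write " + path + ": " + ex.Message);
        }
        catch (UnauthorizedAccessException ex)    // e.g. read-only file or no permission
        {
            Console.WriteLine("No permission to write " + path + ": " + ex.Message);
        }
        return null;
    }

    static void Main()
    {
        string path = Path.Combine(Path.GetTempPath(), "sketch-output.csv");
        using (StreamWriter writer = TryOpenForWrite(path))
        {
            if (writer == null) return;           // message already shown
            writer.WriteLine("0,1,2,3,4");
        }
    }
}
```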

Finally the Output

Here's the image above converted to CSV; I then manually edited the commas out.

44444444444444222211111111111111
44444444444422222222111111111111
44444444442222222222221111111111
44444444222222222222222211111111
44444422222222222222222222111111
44444222222222222222222222221111
44422222222222222222222222222211
42222222222222222222222222222221
22222222222222222222222222222222
22222222222222222222222222222222
22222222222222222222222222222222
22222222222222222222222222222222
22222222222222222222222222222222
22222222222222222222222222222222
22222222222222222222222222222222
22222222222222222222222222222222
22222222222222222222222222222222
22222222222222222222222222222222
22222222222222222222222222222222
22222222222222222222222222222222
22222222222222222222222222222222
22222222222222222222222222222222
22222222222222222222222222222222
22222222222222222222222222222222
22222222222222222222222222222222
22222222222222222222222222222222
33222222222222222222222222222200
33332222222222222222222222220000
33333322222222222222222222200000
33333333222222222222222220000000
33333333322222222222222000000000
33333333333222222222200000000000
33333333333332222220000000000000
33333333333333322000000000000000
©2014 About.com. All rights reserved.