

From my experience, interactive scripting languages (I use Python) can be a great help. You can write a simple framework to deal with binary streams and some simple algorithms, and then write scripts that take your binary and check various things. For example:

- Do some statistical analysis on various parts. Random-looking data, for example, tells you that this part is probably compressed or encrypted. Scattered zeros may mean integer values or Unicode (UTF-16) strings, and so on.
- Try to convert parts of the binary into 2- or 4-byte integers or into floats, print them, and see if they make sense.
- Write some functions that search for repeated or very similar parts of the data; this way you can easily spot headers.
- Try to find as many strings as possible, trying different encodings (C strings, Pascal strings, UTF-8/16, etc.). There are some good tools for that (I think Hex Workshop has such a tool).

Tupni is, to my knowledge, not directly available outside Microsoft Research, but there is a paper about the tool that may interest someone wanting to write a similar program (perhaps open source):

Tupni: Automatic Reverse Engineering of Input Formats (ACM Digital Library)

"Recent work has established the importance of automatic reverse engineering of protocol or file format specifications. However, the formats reverse engineered by previous tools have missed important information that is critical for security applications."
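The statistical-analysis tip could look something like the following minimal Python sketch, assuming Shannon entropy per fixed-size block is a good enough "randomness" measure. The function names, the 4096-byte default block size and the 7.5 bits/byte threshold are illustrative choices, not taken from any particular tool. Small blocks bias the entropy estimate downward (a 256-byte block of truly random data rarely scores above about 7.3), which is why the default block is fairly large.

```python
import math
import os
from collections import Counter


def shannon_entropy(chunk: bytes) -> float:
    """Shannon entropy of `chunk` in bits per byte (0.0 = constant, 8.0 = uniform)."""
    if not chunk:
        return 0.0
    total = len(chunk)
    return -sum((n / total) * math.log2(n / total) for n in Counter(chunk).values())


def block_stats(data: bytes, block_size: int = 4096):
    """Yield (offset, entropy, zero_ratio) for each block_size-byte slice of data."""
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        yield offset, shannon_entropy(chunk), chunk.count(0) / len(chunk)


if __name__ == "__main__":
    # Hypothetical sample: a zero-filled region followed by random bytes,
    # standing in for a padded header plus a compressed payload.
    sample = bytes(4096) + os.urandom(4096)
    for offset, entropy, zeros in block_stats(sample):
        # The 7.5 cut-off is an arbitrary heuristic, not a standard value.
        hint = "compressed/encrypted?" if entropy > 7.5 else ""
        print(f"{offset:08x}  entropy={entropy:4.2f}  zeros={zeros:5.1%}  {hint}")
```

Blocks close to 0 bits/byte are usually padding or constant fields, while a noticeable share of zero bytes at moderate entropy is the "scattered zeros" pattern that often points to integer fields or UTF-16 text.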
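For the tip about reading parts of the binary as 2- or 4-byte integers or floats, Python's standard `struct` module does the decoding. The sketch below is a hypothetical helper (`interpret` and its layout table are illustrative names, not an existing API) that decodes the same bytes under several candidate layouts so you can eyeball which one produces sensible numbers. Byte order is the usual unknown, so both little- and big-endian variants are tried.

```python
import struct

# Candidate layouts to try; labels are arbitrary, codes are struct format codes.
LAYOUTS = {
    "u16 LE": "<H", "u16 BE": ">H",
    "u32 LE": "<I", "u32 BE": ">I",
    "i32 LE": "<i",
    "f32 LE": "<f", "f32 BE": ">f",
}


def interpret(data: bytes, offset: int = 0, count: int = 4) -> dict:
    """Decode `count` consecutive values at `offset` under each layout.

    Layouts that would read past the end of `data` are skipped.
    """
    results = {}
    for label, fmt in LAYOUTS.items():
        size = struct.calcsize(fmt)
        end = offset + size * count
        if end <= len(data):
            # Turn e.g. "<I" into "<4I" to unpack `count` values at once.
            full_fmt = fmt[0] + str(count) + fmt[1]
            results[label] = struct.unpack(full_fmt, data[offset:end])
    return results


if __name__ == "__main__":
    # Hypothetical header bytes: two little-endian uint32s and a float32.
    sample = struct.pack("<IIf", 1, 640, 1.5)
    for label, values in interpret(sample, count=3).items():
        print(f"{label:7s} {values}")
```

Small round integers under one layout and absurd values under the others are usually a good sign that you have guessed both the field width and the byte order.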
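The suggestion to search for repeated parts in order to spot headers can be approximated by indexing every fixed-length byte sequence and keeping those that occur several times. This is a minimal sketch under a few assumptions: it finds exact repeats only (near-duplicates would need a fuzzier comparison), it skips runs made of a single byte value such as zero padding, and it keeps every offset in memory, so it suits small files. The names `repeated_chunks` and `strides` are hypothetical.

```python
from collections import defaultdict


def repeated_chunks(data: bytes, length: int = 8, min_count: int = 3) -> dict:
    """Map each `length`-byte sequence occurring at least `min_count` times
    to the list of offsets where it starts (overlapping matches allowed)."""
    positions = defaultdict(list)
    for offset in range(len(data) - length + 1):
        positions[data[offset:offset + length]].append(offset)
    return {
        chunk: offsets
        for chunk, offsets in positions.items()
        # Ignore runs like b"\x00\x00\x00\x00" that are usually just padding.
        if len(offsets) >= min_count and len(set(chunk)) > 1
    }


def strides(offsets: list) -> set:
    """Distances between consecutive occurrences. A single constant stride
    hints at fixed-size records that all start with the same header."""
    return {b - a for a, b in zip(offsets, offsets[1:])}


if __name__ == "__main__":
    # Hypothetical file: four 16-byte records, each starting with b"HDR!".
    sample = b"".join(b"HDR!" + bytes([i]) * 12 for i in range(1, 5))
    for chunk, offsets in repeated_chunks(sample, length=4).items():
        print(chunk, offsets, "stride(s):", sorted(strides(offsets)))
```

Looking at the stride rather than just the repeat count is what turns "this byte pattern appears a lot" into a guess about record size.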
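String hunting across encodings can be sketched with the standard `re` module on bytes. The three finders below are hypothetical helpers covering printable ASCII runs (C-string candidates, and also valid UTF-8), ASCII text stored as UTF-16LE, and Pascal strings with a 1-byte length prefix. Non-ASCII UTF-8, big-endian UTF-16 and wider length prefixes are left out to keep the sketch short, and the minimum length of 4 is an arbitrary noise filter.

```python
import re


def c_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for runs of printable ASCII (C-string candidates)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for m in pattern.finditer(data):
        yield m.start(), m.group().decode("ascii")


def utf16le_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for printable ASCII characters encoded as UTF-16LE,
    i.e. each character byte followed by a zero byte."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    for m in pattern.finditer(data):
        yield m.start(), m.group().decode("utf-16-le")


def pascal_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) where a length byte n is followed by exactly
    n printable ASCII bytes (1-byte length-prefixed Pascal strings)."""
    for offset in range(len(data) - 1):
        n = data[offset]
        body = data[offset + 1:offset + 1 + n]
        if n >= min_len and len(body) == n and all(0x20 <= b <= 0x7E for b in body):
            yield offset, body.decode("ascii")


if __name__ == "__main__":
    # Hypothetical sample mixing the three encodings.
    sample = (b"\x00\x01MAGIC1\x00" + "Title".encode("utf-16-le")
              + b"\x00\x00\x07Version\xff")
    for finder in (c_strings, utf16le_strings, pascal_strings):
        for offset, text in finder(sample):
            print(f"{finder.__name__:16s} {offset:#06x} {text!r}")
```

The same text often shows up under more than one finder (a Pascal string's body is also a printable run), and that overlap is itself a useful hint about how the format stores its strings.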
