Binary Markup Toolkit website

Welcome to the Binary Markup Toolkit website.

Binary Markup Toolkit is a suite of software tools for use in Digital Forensics. It was developed by James Clark whilst studying for a Master’s degree in Forensic Computing at the Cyber Technology Institute of De Montfort University, Leicester. The software is available free-of-charge to bona fide forensic practitioners and researchers. Extracts from the Master’s degree dissertation describing applications for the technology are also available.

James also maintains several related tools:

If you would like to use the software or find out more please use the contact form below.

What is Binary Markup Toolkit?

Binary Markup Toolkit (BMTK) processes data into Binary Markup Language (BML). It includes tools to:

The currently supported BMTK workflows are shown below:
BMTK workflow

What is Binary Markup Language?

BML is an XML-based language for describing the provenance of arbitrary binary data. It is human readable and can be authored by hand or generated automatically by software. It describes the location and size of fields within the underlying data. It is data agnostic and can represent a filesystem like FAT or NTFS or an application file format like JPEG. Optionally, it can also describe hierarchical data relationships, field names, interpreted data values/types and descriptions.
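As a rough illustration of the idea (fields with a location and size, optionally nested and named), the Python sketch below parses a small XML document and walks its hierarchy. The element and attribute names (description, field, offset, size, type, value) are invented for this sketch and are not the real BML schema; only the 76-byte header, 4-byte header size and 16-byte class identifier reflect the published Windows Shell Link layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these element and attribute names are invented for
# illustration and are NOT the real BML schema.
SAMPLE = """
<description source="example.lnk">
  <field name="header" offset="0" size="76">
    <field name="header_size" offset="0" size="4" type="uint32" value="76"/>
    <field name="class_id" offset="4" size="16" type="guid"/>
  </field>
</description>
"""


def walk(element, depth=0):
    """Yield (depth, name, offset, size) for every nested field element."""
    for child in element.findall("field"):
        yield depth, child.get("name"), int(child.get("offset")), int(child.get("size"))
        yield from walk(child, depth + 1)


def flatten(xml_text):
    """Parse the XML text and return all fields in document order."""
    return list(walk(ET.fromstring(xml_text)))


if __name__ == "__main__":
    for depth, name, offset, size in flatten(SAMPLE):
        print(f"{'  ' * depth}{name}: offset={offset} size={size}")
```

Because each field carries its own offset and size, a reader can go from any described value straight back to the bytes it came from in the underlying data.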

BML is designed for forensic computing, protocol debugging, reverse engineering and similar scenarios. Typically, a single BML file is associated with a single binary file, forensic "image" or protocol "dump".

BML can be used like a "microscope" to dig deep into binary data and put artefacts into accurate context. Practitioners can use BML to aid investigations and share findings.

An example of BML, from a Windows shortcut file, is shown below:

Example BML syntax

Is BML the same as DFXML?

DFXML is another XML-based language for use in digital forensics. It was created by Simson Garfinkel and uses specific XML elements to describe certain file system metadata, file locations and Windows Registry values. With some exceptions, DFXML does not describe the actual location of binary data such as metadata.

BML takes a different, lower-level, approach and is designed to record the detailed content of arbitrary binary data. BML is therefore typically more verbose than DFXML. In some cases BML and DFXML may be complementary.

BMTK Technical Requirements

BMTK has the following technical requirements:

There is no functional difference between the 32-bit and 64-bit software and the 32-bit build may be used on most 64-bit systems. The 64-bit build may provide enhanced performance on 64-bit systems and is recommended where available. Please contact the author if you need to use BMTK on an operating system prior to Windows XP.

* BML can be verbose and a BML description of binary data can be several times the size of the original data. BML files >10GB are quite common and we suggest a typical system should have at least 100GB free storage for practical use with BMTK.

Annotated hexadecimal dump

BMTK can generate annotated hexadecimal dumps from binary data. The format of these may be very familiar to practitioners who have attended a certain popular UK forensic course. An example for a Windows shortcut (.LNK) file is below:
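The general technique can be sketched in Python as follows. This is not BMTK's output format: the dump layout, the FIELDS table and the function name are assumptions made for illustration. The two fields shown (a 4-byte header size of 0x4C followed by a 16-byte link class identifier) follow the published Windows Shell Link header layout.

```python
import struct

# Hypothetical field table: (offset, size, label). The two entries follow the
# published Shell Link header layout; the dump layout itself is invented.
FIELDS = [
    (0, 4, "HeaderSize"),
    (4, 16, "LinkCLSID"),
]


def annotated_dump(data, fields):
    """Return one line per field: offset, hex bytes, and the field label."""
    lines = []
    for offset, size, label in fields:
        chunk = data[offset:offset + size]
        hex_bytes = " ".join(f"{b:02X}" for b in chunk)
        lines.append(f"{offset:08X}  {hex_bytes}  ; {label}")
    return lines


# First 20 bytes of a Shell Link header: HeaderSize 0x4C, then the class ID
# 00021401-0000-0000-C000-000000000046 in its on-disk byte order.
SAMPLE = struct.pack("<I", 0x4C) + bytes.fromhex("0114020000000000C000000000000046")

if __name__ == "__main__":
    print("\n".join(annotated_dump(SAMPLE, FIELDS)))
```

Grouping the bytes by field rather than by fixed-width rows is one possible design choice; it keeps each annotation next to exactly the bytes it describes.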

Binary Markup Toolkit Components

BMTK currently includes the following tools:

BMCONSOLE Process a "raw" image file, storage device or file system into a BML document
BML2CSV Convert a BML document to industry standard CSV format suitable for further processing using other tools
BML2DB Convert a BML document to a SQLite3 database suitable for direct querying or further processing using other tools
BML2DUMP Convert a BML database file (produced by BML2DB) to an annotated hexadecimal dump. This may be used to visualise the data descriptions provided by BML.
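A database produced by BML2DB can be opened with any SQLite3 client. The schema is not described on this page, so a reasonable first step is to list the tables and columns the file actually contains. The Python sketch below does that with the standard sqlite3 module. The in-memory database, the fields table and its column names are stand-ins invented for the demonstration and are not the BML2DB schema.

```python
import sqlite3


def list_tables(conn):
    """Return {table_name: [column names]} for every table in the database."""
    tables = {}
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (name,) in rows:
        # PRAGMA table_info returns one row per column; index 1 is the name.
        cols = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
        tables[name] = [col[1] for col in cols]
    return tables


def demo_connection():
    """Build a stand-in database; the table and columns are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fields (field_offset INTEGER, field_size INTEGER, field_name TEXT)"
    )
    conn.execute("INSERT INTO fields VALUES (0, 4, 'HeaderSize')")
    return conn


if __name__ == "__main__":
    for table, columns in list_tables(demo_connection()).items():
        print(table, columns)
```

To inspect a real file, replace demo_connection() with sqlite3.connect() on the path of the database written by BML2DB.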

BMTK Agents (plug-ins)

BMTK is extensible and uses agents (or “plug-ins”) to provide format specific processing. For example, an agent may target the JPEG file format or a complete file system such as FAT. BMTK agents are recursive and data identified by one agent (for instance in a filesystem) can be automatically routed to the most appropriate format specific agent. The following agents are currently available:

MBRDiskAgent Partition table agent supporting MBR-style partitions (i.e. not GPT) and extended partitions. This agent allows complete disk images to be conveniently processed in a single session.
FATFSAgent FAT file system agent supporting the Microsoft implementation of FAT12/16/32 and long file name extensions.
NTFSAgent NTFS file system agent supporting Microsoft NTFS v1.2 (Windows NT v3.51) and later.
MFTAgent NTFS Master File Table agent for detailed processing of NTFS MFT records.
INDXAgent NTFS INDX stream agent for detailed processing of non-resident NTFS indexes (typically directories).
UsnJrnlAgent NTFS agent for detailed processing of the filesystem USN Change Journal ($UsnJrnl:$J). The complete source code for this agent is in the SDK.
WinShellLinkAgent Windows Shell Link (shortcut) agent.
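The recursive routing described above can be sketched in Python: one step parses the MBR partition table, and each partition it finds is then passed to a format-specific check. This is an illustrative sketch only, not how BMTK agents are implemented (real agents are Windows DLLs). The function names, the 512-byte sector size and the detection rule are assumptions. The MBR layout used (four 16-byte entries at offset 446, boot signature 55 AA at offset 510) and the "NTFS    " OEM identifier at offset 3 of an NTFS boot sector are standard.

```python
import struct

SECTOR = 512  # assumed sector size


def parse_mbr(sector0):
    """Return (type, first_lba, sector_count) for each non-empty primary entry."""
    if len(sector0) < SECTOR or sector0[510:512] != b"\x55\xAA":
        raise ValueError("no MBR boot signature")
    entries = []
    for i in range(4):
        raw = sector0[446 + 16 * i:446 + 16 * (i + 1)]
        ptype = raw[4]  # partition type byte
        first_lba, count = struct.unpack_from("<II", raw, 8)
        if ptype != 0:
            entries.append((ptype, first_lba, count))
    return entries


def sniff(boot_sector):
    """Pick an agent name from a volume's first sector (illustrative rule)."""
    if boot_sector[3:11] == b"NTFS    ":
        return "NTFSAgent"
    return "unknown"


def route(image):
    """Parse the MBR, then hand each partition to a format-specific check."""
    results = []
    for ptype, first_lba, _count in parse_mbr(image[:SECTOR]):
        start = first_lba * SECTOR
        results.append((ptype, first_lba, sniff(image[start:start + SECTOR])))
    return results


def build_demo_image():
    """Build a two-sector image: an MBR with one entry pointing at LBA 1."""
    mbr = bytearray(SECTOR)
    # status, CHS start, type 0x07, CHS end, first LBA = 1, sector count = 1
    mbr[446:462] = struct.pack("<B3sB3sII", 0, b"\0\0\0", 0x07, b"\0\0\0", 1, 1)
    mbr[510:512] = b"\x55\xAA"
    boot = bytearray(SECTOR)
    boot[3:11] = b"NTFS    "
    return bytes(mbr + boot)


if __name__ == "__main__":
    print(route(build_demo_image()))
```

The sketch checks the content of each partition rather than trusting the partition type byte alone, since type 0x07 is shared by more than one file system.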

BMTK Software Development Kit

The BMTK Software Development Kit (SDK) describes how additional BMTK agents can be created to process new data formats. Agents are implemented using standard Windows DLLs. The interface is primarily designed for the C/C++ languages but is compatible with any language that can produce standard DLLs. If you would like to develop a new BMTK agent, or need help developing an agent, please use the contact form below.

BMTK Documentation

The BMTK Quick Start Guide explains how to install BMTK and perform basic BML operations. It includes a number of walkthroughs introducing the software. These include:

Binary Markup Toolkit (BMTK) Quick Start Guide

BMTK Research Applications

Two chapters in the final Master’s degree report describe practical research applications for BMTK. The chapters and associated appendices are available from the links below:

Chapter 7 This chapter discusses how BML and the BMTK software can be used to investigate and explain the behaviour of timestamps in the Microsoft NTFS filesystem.

This work clarifies a long-standing uncertainty in the digital forensic literature and also illustrates how a similar approach could be used to investigate other filesystem and file format behaviour using human-readable BML data description.
Chapter 8 This chapter takes this concept further and explores how BML and BMTK can be used to investigate other forensic artefacts generated by typical user activity on a computer running Microsoft Windows.

An experiment was conducted to simulate a realistic forensic case scenario and then BML and BMTK were used to develop a timeline of past user actions. The scenario incorporated several distinct elements and illustrated how BML data descriptions can be used to establish links between them. The detailed analysis procedure and findings are presented in a form similar to a forensic practitioner log.

This experiment identified several features of the NTFS USN Change Journal and Windows Shell Link (shortcut) files that can be used to investigate historic activity using filesystem metadata alone. The discussion highlights these and also describes suggestions for further work that could be carried out.

Who can use BMTK? What's the catch?

BMTK is available free-of-charge to bona fide forensic practitioners working in law enforcement, academia or similar. You are free to use the software for any purpose. I would welcome bug reports or suggestions for new features. Please tell me if you think the software is useful or useless. I won’t be offended!

Download BMTK / Ask Question

If you would like to use BMTK, report a bug, make a suggestion or just ask a question please use the form below:
