QED Image File Format Specification

The file format looks like this:

+----------+----------+----------+-----+
| cluster0 | cluster1 | cluster2 | ... |
+----------+----------+----------+-----+

The first cluster begins with the header. The header contains information about where regular clusters start; this allows the header to be extensible and store extra information about the image file. A regular cluster may be a data cluster, an L2, or an L1 table. L1 and L2 tables are composed of one or more contiguous clusters.

Normally the file size will be a multiple of the cluster size. If the file size is not a multiple, extra information after the last cluster may not be preserved if data is written. Legitimate extra information should use space between the header and the first regular cluster.

All fields are little-endian.

Tables

Tables provide the translation from logical offsets in the block device to cluster offsets in the file.

#define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))

Table {
    uint64_t offsets[TABLE_NOFFSETS];
}

The tables are organized as follows:

                   +----------+
                   | L1 table |
                   +----------+
              ,------'  |  '------.
         +----------+   |    +----------+
         | L2 table |  ...   | L2 table |
         +----------+        +----------+
     ,------'  |  '------.
+----------+   |    +----------+
|   Data   |  ...   |   Data   |
+----------+        +----------+

A table is made up of one or more contiguous clusters. The table_size header field determines table size for an image file. For example, cluster_size=64 KB and table_size=4 results in 256 KB tables.

The logical image size must be less than or equal to the maximum possible size of clusters rooted by the L1 table:

header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size

L1, L2, and data cluster offsets must be aligned to header.cluster_size. The following offsets have special meanings:

L2 table offsets

  • 0 - unallocated. The L2 table is not yet allocated.

Data cluster offsets

  • 0 - unallocated. The data cluster is not yet allocated.

  • 1 - zero. The data cluster contents are all zeroes and no cluster is allocated.

Future format extensions may wish to store per-offset information. The least significant 12 bits of an offset are reserved for this purpose and must be set to zero. Image files with cluster_size > 2^12 will have more unused bits which should also be zeroed.

Unallocated L2 tables and data clusters

Reads to an unallocated area of the image file access the backing file. If there is no backing file, then zeroes are produced. The backing file may be smaller than the image file and reads of unallocated areas beyond the end of the backing file produce zeroes.

Writes to an unallocated area cause a new data clusters to be allocated, and a new L2 table if that is also unallocated. The new data cluster is populated with data from the backing file (or zeroes if no backing file) and the data being written.

Zero data clusters

Zero data clusters are a space-efficient way of storing zeroed regions of the image.

Reads to a zero data cluster produce zeroes.

Note

The difference between an unallocated and a zero data cluster is that zero data clusters stop the reading of contents from the backing file.

Writes to a zero data cluster cause a new data cluster to be allocated. The new data cluster is populated with zeroes and the data being written.

Logical offset translation

Logical offsets are translated into cluster offsets as follows:

 table_bits table_bits    cluster_bits
 <--------> <--------> <--------------->
+----------+----------+-----------------+
| L1 index | L2 index |     byte offset |
+----------+----------+-----------------+

      Structure of a logical offset

offset_mask = ~(cluster_size - 1) # mask for the image file byte offset

def logical_to_cluster_offset(l1_index, l2_index, byte_offset):
  l2_offset = l1_table[l1_index]
  l2_table = load_table(l2_offset)
  cluster_offset = l2_table[l2_index] & offset_mask
  return cluster_offset + byte_offset

Consistency checking

This section is informational and included to provide background on the use of the QED_F_NEED_CHECK features bit.

The QED_F_NEED_CHECK bit is used to mark an image as dirty before starting an operation that could leave the image in an inconsistent state if interrupted by a crash or power failure. A dirty image must be checked on open because its metadata may not be consistent.

Consistency check includes the following invariants:

  • Each cluster is referenced once and only once. It is an inconsistency to have a cluster referenced more than once by L1 or L2 tables. A cluster has been leaked if it has no references.

  • Offsets must be within the image file size and must be cluster_size aligned.

  • Table offsets must at least table_size * cluster_size bytes from the end of the image file so that there is space for the entire table.

The consistency check process starts from l1_table_offset and scans all L2 tables. After the check completes with no other errors besides leaks, the QED_F_NEED_CHECK bit can be cleared and the image can be accessed.