123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103 |
- The PDB Serialized Hash Table Format
- ====================================
- .. contents::
- :local:
- .. _hash_intro:
- Introduction
- ============
- One of the design goals of the PDB format is to provide accelerated access to
- debug information, and for this reason there are several occasions where hash
- tables are serialized and embedded directly to the file, rather than requiring
- a consumer to read a list of values and reconstruct the hash table on the fly.
- The serialization format supports hash tables of arbitrarily large size and
- capacity, as well as value types and hash functions. The only supported key
- value type is a uint32. The only requirement is that the producer and consumer
- agree on the hash function. As such, the hash function can is not discussed
- further in this document, it is assumed that for a particular instance of a PDB
- file hash table, the appropriate hash function is being used.
- On-Disk Format
- ==============
- .. code-block:: none
- .--------------------.-- +0
- | Size |
- .--------------------.-- +4
- | Capacity |
- .--------------------.-- +8
- | Present Bit Vector |
- .--------------------.-- +N
- | Deleted Bit Vector |
- .--------------------.-- +M ─╮
- | Key | │
- .--------------------.-- +M+4 │
- | Value | │
- .--------------------.-- +M+4+sizeof(Value) │
- ... ├─ |Capacity| Bucket entries
- .--------------------. │
- | Key | │
- .--------------------. │
- | Value | │
- .--------------------. ─╯
- - **Size** - The number of values contained in the hash table.
- - **Capacity** - The number of buckets in the hash table. Producers should
- maintain a load factor of no greater than ``2/3*Capacity+1``.
- - **Present Bit Vector** - A serialized bit vector which contains information
- about which buckets have valid values. If the bucket has a value, the
- corresponding bit will be set, and if the bucket doesn't have a value (either
- because the bucket is empty or because the value is a tombstone value) the bit
- will be unset.
- - **Deleted Bit Vector** - A serialized bit vector which contains information
- about which buckets have tombstone values. If the entry in this bucket is
- deleted, the bit will be set, otherwise it will be unset.
- - **Keys and Values** - A list of ``Capacity`` hash buckets, where the first
- entry is the key (always a uint32), and the second entry is the value. The
- state of each bucket (valid, empty, deleted) can be determined by examining
- the present and deleted bit vectors.
- .. _hash_bit_vectors:
- Present and Deleted Bit Vectors
- ===============================
- The bit vectors indicating the status of each bucket are serialized as follows:
- .. code-block:: none
- .--------------------.-- +0
- | Word Count |
- .--------------------.-- +4
- | Word_0 | ─╮
- .--------------------.-- +8 │
- | Word_1 | │
- .--------------------.-- +12 ├─ |Word Count| values
- ... │
- .--------------------. │
- | Word_N | │
- .--------------------. ─╯
- The words, when viewed as a contiguous block of bytes, represent a bit vector
- with the following layout:
- .. code-block:: none
- .------------. .------------.------------.
- | Word_N | ... | Word_1 | Word_0 |
- .------------. .------------.------------.
- | | | | |
- +N*32 +(N-1)*32 +64 +32 +0
- where the k'th bit of this bit vector represents the status of the k'th bucket
- in the hash table.
|