The Log Structured Store
The log structured store (LSS) exports an abstraction of a record-oriented repository. Records can be variable sized and indexed by record number (remember VSAM anyone?). The LSS also supports compression of the record payload.
In this description of the LSS, we will often talk about the
application level. That refers to the
abstraction exported by the LSS module, and in this context the
application is usually the pstore. However,
because LSS is defined as it's own module,
, user programs can be built
directly on top of the LSS.
An LSS volume consists of a sequence of tagged LSS Records (not to be
confused with the record abstraction exported by LSS). On disk, each
record starts with a record header. The header corresponds to the
lssv3.h and is comprised of four words (native
byte order?) and describes the contents of the record.
See Figure 10.
The magic indicates what type of LSS Record is present, and is one of
[Link to subsequent sect2 on details]
The recnum is used in different ways depending on the record type. For DATA records, it is the record number that is exposed to the client application (i.e., the pstore).
An addressable offset in an LSS volume is a multiple of 16 bytes. A pointer to an LSS record is a 32-bit quantity. The top 4 bits denote the volume number (although an LSS repository may contain only 10 volumes, 0-9) and the low 28 bits specify the granule within the volume (each granule is 16 bytes). Hence, each volume may be up to 4GB in size and the total addressable disk space in LSS is 40GB.
A data segment record summarizes the records written since the previous data segment or EOF record. The length field is the total number of bytes in the data segment including the DSEG record.
record was originally
designed to support traversing the LSS in the backwards direction, in
case a commit record was not found at the EOF. This works because a
record is inserted just before a DSEG to ensure
appears in the last granule of a
512-byte IO block. Hence, in the backwards direction, you see a
records. However, this is not currently done because it was found to
be unreliable (probably because since a
multiple IO blocks, we could not be sure that the last IO block would
reliably contain a
. It could be the middle of
some payload that happens to look like a
course, that could be true for
A data record stores the payload of an application-level record (e.g., a heap page from the pstore or an indirect page descriptor). The low 4 bits of the space word identifies the compression algorithm used to compress the payload. The high 28 bits is the number of granules occupied by the data record, not counting the record header (which, note, is 1 granule in size).
record is used to provide padding so that
records can appear
in the last granule of an IO block.
record contains the path to
other volumes. Although a
to describe the volume in which it appears, it is not used.
record contains the string name
of a compression algorithm.
record represents a commit point
and stores all the information about the commit, including
a pointer to the master index record (
which is used to map application-level record numbers to
LSS pointers. See Section 12.3 for
record marks the end of the file. An
record always appears at the end of an IO
block that also contains a commit record (
The presence of an
record at the
end of the last IO block in a volume is how the LSS quickly finds the
most recent commit record in a volume. If the EOF record is not
present, then the fallback is to scan forward from the last known
commit record (as recorded in the volume header,
) until a new commit record is found.
record is a bucket in the hash
table which maps (application level) record numbers to
LSS record pointers. These records are loaded lazily into
structures in memory,
where it is called a cache line.
record contains the contents
of a hash table pointing to
record appears only at the beginning
of a volume file, and describes the volume as a whole.
The payload of the
record corresponds to the
structure, and is
mostly constant except for the
are updated occasionally to point to a recent commit record.
All the goodies... Metadata (generation #, version#, timestamp), bookmarks, diff list, midx pointer, prev CR ptr, VH fuel
How we find a commit record... how we find... (1) the last, if no gen# is specified, (2) a bookmarked one, (3) some other one (if gen# is specified), (4) one if EOF is messed up, (5) if there are multiple volumes
The LSS supports a concept of bookmarking particular checkpoints. The commit record maintains direct pointers to each of the bookmarked commit records in addition to the previous commit record.
Bookmarks are intended to support being able to easily roll back to particular significant points in history. For example, bookmark #1 is used by the pstore's online garbage collector to represent the snapshot against which the persistent traversal is taking place (since it employs a snapshot-at-beginning strategy, and multiple commits may happen between when the garbage collection cycle starts and when it completes).
The payload of data records are compressed using one of
16 defined compression algorithms. The commit record contains
a table of 15 pointers to the compression algorithm descriptors,
records. Algorithm 0 is always
the null algorith which denotes no compression.
The persistent store stores three kinds of records in the underlying LSS.
Root. The root record, which is always record number 0 (zero), stores information about the pstore as a whole. The contents correspond to the structure
Indirect Page Data. Certain kinds of indirect pages can contain information used to elaborate the actual objects at runtime. These record numbers are in the range
Heap Page Data. The contents of each page of the persistent heap are stored in records. Each record corresponds to one page, starting at record
Currently, for example, symbols that are referenced by the store use indirect page data to store the string repr of the symbol. When a indirect page is referenced by a page that is being loaded,
The contents of the root record (record number 0)
correspond to the structure
and it basically just stores a pointer to the default, or "main",
Each page of data in the persistent heap is stored in it's own
record in the underlying LSS. Record numbers for heap pages
(actually, in older
versions of RScheme they started at
[XXX which versions]).
Unlike in Texas, heap page data is explicitly serialized into the record, instead of being written directly to the underlying storage (with only the pointer rewriting transform). The serialized data is in turn compressed by LSS before being written to the disk. (See Figure 13)
The serialized data is a stream of words (sent to the LRU high-level compression model), and looks like: