lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 4 Dec 2019 11:03:18 -0700
From:   Andreas Dilger <>
To:     Daniel Phillips <>
Cc:     "Theodore Y. Ts'o" <>,,,,
        OGAWA Hirofumi <>
Subject: Re: [RFC] Thing 1: Shardmap fox Ext4

On Dec 1, 2019, at 6:45 PM, Daniel Phillips <> wrote:
> On 2019-11-27 6:25 a.m., Theodore Y. Ts'o wrote:
>> (3) It's not particularly well documented...
> We regard that as an issue needing attention. Here is a pretty picture
> to get started:

The shardmap diagram is good conceptually, but it would be useful
to add a legend on the empty side of the diagram that shows the on-disk

> This needs some explaining. The bottom part of the directory file is
> a simple linear range of directory blocks, with a freespace map block
> appearing once every 4K blocks or so. This freespace mapping needs a
> post of its own, it is somewhat subtle. This will be a couple of posts
> in the future.
> The Shardmap index appears at a higher logical address, sufficiently
> far above the directory base to accommodate a reasonable number of
> record entry blocks below it. We try not to place the index at so high
> an address that the radix tree gets extra levels, slowing everything
> down.
> When the index needs to be expanded, either because some shard exceeded
> a threshold number of entries, or the record entry blocks ran into the
> the bottom of the index, then a new index tier with more shards is
> created at a higher logical address. The lower index tier is not copied
> immediately to the upper tier, but rather, each shard is incrementally
> split when it hits the threshold because of an insert. This bounds the
> latency of any given insert to the time needed to split one shard, which
> we target nominally at less than one millisecond. Thus, Shardmap takes a
> modest step in the direction of real time response.
> Each index tier is just a simple array of shards, each of which fills
> up with 8 byte entries from bottom to top. The count of entries in each
> shard is stored separately in a table just below the shard array. So at
> shard load time, we can determine rapidly from the count table which
> tier a given shard belongs to. There are other advantages to breaking
> the shard counts out separately having to do with the persistent memory
> version of Shardmap, interesting details that I will leave for later.
> When all lower tier shards have been deleted, the lower tier may be
> overwritten by the expanding record entry block region. In practice,
> a Shardmap file normally has just one tier most of the time, the other
> tier existing only long enough to complete the incremental expansion
> of the shard table, insert by insert.
> There is a small header in the lowest record entry block, giving the
> positions of the one or two index tiers, count of entry blocks, and
> various tuning parameters such as maximum shard size and average depth
> of cache hash collision lists.
> That is it for media format. Very simple, is it not? My next post
> will explain the Shardmap directory block format, with a focus on
> deficiencies of the traditional Ext2 format that were addressed.
> Regards,
> Daniel

Cheers, Andreas

Download attachment "signature.asc" of type "application/pgp-signature" (874 bytes)

Powered by blists - more mailing lists