[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZcLB3pCSFRXRodzN@casper.infradead.org>
Date: Tue, 6 Feb 2024 23:33:50 +0000
From: Matthew Wilcox <willy@...radead.org>
To: Dave Chinner <david@...morbit.com>
Cc: JonasZhou-oc <JonasZhou-oc@...oxin.com>, viro@...iv.linux.org.uk,
brauner@...nel.org, jack@...e.cz, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, CobeChen@...oxin.com,
LouisQi@...oxin.com, JonasZhou@...oxin.com
Subject: Re: [PATCH] fs/address_space: move i_mmap_rwsem to mitigate a false
sharing with i_mmap.
On Wed, Feb 07, 2024 at 08:35:59AM +1100, Dave Chinner wrote:
> > The solution to this problem is to change the interval tree data structure
> > from an Red-Black tree to a B-tree, or something similar where we use
> > an array of pointers instead of a single pointer.
>
> .... B-trees are not immune to pointer chasing problems, either.
> Most use binary searches within the nodes, and so we have the
> problem of unpredictable cacheline misses within the nodes as well
> as being unable to do depth based prefetch similar to rbtrees.
>
> Perhaps we should be looking at something like this:
>
> https://lore.kernel.org/linux-fsdevel/20240126220655.395093-2-kent.overstreet@linux.dev/
I need more data (and maybe Kent has it!)
Without any data, I believe that Eytzinger layout is an idea that was
a really good one in the 1990s/2000s and has now passed its expiration
date because of the changed nature of hardware.
In the mid-90s, we could do O(10) instructions in the time it took to
service a LLC miss. Today, it's more like O(2500). That means it is
far more important to be predictable than it is to execute the minimum
number of instructions. If your B-tree nodes are 256kB in size (I believe
that's what bacachefs is using?), then Eytzinger layout may make some
sense, but if you're using smaller nodes (which I further believe is
appropriate for an in-memory B-tree), then a straight 'load each index
and compare it' may outperform a search that jumps around inside a node.
I'm pontificating and will happily yield to someone who has data.
I've tried to mark my assumptions clearly above.
Something else I'm not aware of (and filesystem B-trees will not have
any experience of) is what research exists on efficiently adding new
entries to a balanced tree so as to minimise rebalances. Filesystems
are like the Maple Tree in that for every logical offset inside a file,
there is precisely one answer to "what physical block does this map to".
For the i_mmap tree, we want instead to answer the question "Which VMAs
have an intersection with this range of the file", and for the benchmark
in question, we will have a large number of entries that compare equal to
each other (they have the same start, and same end, but different values).
So they could be inserted at many different points in the tree. We'd like
to choose the point which causes the least amount of rebalancing.
Powered by blists - more mailing lists