Message-ID: <20241108161118.GA42603@mit.edu>
Date: Fri, 8 Nov 2024 11:11:18 -0500
From: "Theodore Ts'o" <tytso@....edu>
To: Jan Kara <jack@...e.cz>
Cc: Li Dongyang <dongyangli@....com>, linux-ext4@...r.kernel.org,
Andreas Dilger <adilger@...ger.ca>,
Alex Zhuravlev <bzzz@...mcloud.com>
Subject: Re: [PATCH V2] jbd2: use rhashtable for revoke records during replay
On Fri, Nov 08, 2024 at 11:33:58AM +0100, Jan Kara wrote:
> > 1048576 records - 95 seconds
> > 2097152 records - 580 seconds
>
> These are really high numbers of revoke records. Deleting couple GB of
> metadata doesn't happen so easily. Are they from a real workload or just
> a stress test?
For context, the background is that this has been an out-of-tree patch
that's been around for a very long time, for use with Lustre servers,
where apparently this very large number of revoke records is a real
thing.
> If my interpretation is correct, then rhashtable is an unnecessarily
> huge hammer for this. Firstly, as the big hash is needed only during
> replay, there's no concurrent access to the data
> structure. Secondly, we just fill the data structure in the
> PASS_REVOKE scan and then use it. Thirdly, we know the number of
> elements we need to store in the table in advance (well, currently
> we don't but it's trivial to modify PASS_SCAN to get that number).
>
> So rather than playing with rhashtable, I'd modify PASS_SCAN to sum
> up the number of revoke records we're going to process and then
> prepare a static hash of appropriate size for replay (we can just use
> standard hashing fs/jbd2/revoke.c uses, just with differently sized
> hash table allocated for replay and point journal->j_revoke to
> it). And once recovery completes jbd2_journal_clear_revoke() can
> free the table and point journal->j_revoke back to the original
> table. What do you think?
Hmm, that's a really nice idea; Andreas, what do you think?
- Ted