Message-ID: <20241108161118.GA42603@mit.edu>
Date: Fri, 8 Nov 2024 11:11:18 -0500
From: "Theodore Ts'o" <tytso@....edu>
To: Jan Kara <jack@...e.cz>
Cc: Li Dongyang <dongyangli@....com>, linux-ext4@...r.kernel.org,
Andreas Dilger <adilger@...ger.ca>,
Alex Zhuravlev <bzzz@...mcloud.com>
Subject: Re: [PATCH V2] jbd2: use rhashtable for revoke records during replay
On Fri, Nov 08, 2024 at 11:33:58AM +0100, Jan Kara wrote:
> > 1048576 records - 95 seconds
> > 2097152 records - 580 seconds
>
> These are really high numbers of revoke records. Deleting couple GB of
> metadata doesn't happen so easily. Are they from a real workload or just
> a stress test?
For context, the background is that this has been an out-of-tree patch
that's been around for a very long time, for use with Lustre servers,
where apparently this very large number of revoke records is a real
thing.
> If my interpretation is correct, then rhashtable is an unnecessarily
> huge hammer for this. Firstly, as the big hash is needed only during
> replay, there's no concurrent access to the data
> structure. Secondly, we just fill the data structure in the
> PASS_REVOKE scan and then use it. Thirdly, we know the number of
> elements we need to store in the table in advance (well, currently
> we don't but it's trivial to modify PASS_SCAN to get that number).
>
> So rather than playing with rhashtable, I'd modify PASS_SCAN to sum
> up the number of revoke records we're going to process and then
> prepare a static hash of appropriate size for replay (we can just use
> standard hashing fs/jbd2/revoke.c uses, just with differently sized
> hash table allocated for replay and point journal->j_revoke to
> it). And once recovery completes jbd2_journal_clear_revoke() can
> free the table and point journal->j_revoke back to the original
> table. What do you think?
Hmm, that's a really nice idea; Andreas, what do you think?
- Ted