lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 25 May 2021 11:22:05 +0200
From:   Jan Kara <jack@...e.cz>
To:     "Theodore Y. Ts'o" <tytso@....edu>
Cc:     Jan Kara <jack@...e.cz>,
        Xing Zhengjun <zhengjun.xing@...ux.intel.com>,
        kernel test robot <oliver.sang@...el.com>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        lkp@...el.com
Subject: Re: [LKP] [ext4] 05c2c00f37: aim7.jobs-per-min -11.8% regression

On Fri 21-05-21 12:42:16, Theodore Y. Ts'o wrote:
> On Fri, May 21, 2021 at 11:27:30AM +0200, Jan Kara wrote:
> > 
> > OK, thanks for testing. So the orphan code is indeed the likely cause of
> > this regression but I probably did not guess correctly what is the
> > contention point there. Then I guess I need to reproduce and do more
> > digging why the contention happens...
> 
> Hmm... what if we only recalculate the superblock checksum when we do
> a commit, via the callback function from the jbd2 layer to file
> system?

I actually have to check whether the regression is there because of the
additional locking of the buffer_head (because that's the only thing that
was added to that code in fact, adding some atomic instructions, bouncing
another cacheline) or because of the checksum computation that moved from
ext4_handle_dirty_super() closer to actual superblock update under those
locks.

If the problem is indeed just the checksum computation under all those
locks, we can move that to transaction commit time (using the t_frozen
trigger - ocfs2 uses that for all metadata checksumming). But then we have
to be very careful with unjournaled sb updates that can be running in
parallel with the journaled ones because once you drop buffer lock, sb can
get clobbered and checksum invalidated. Also there's the question what to
do with nojournal mode - probably we would have to keep separate set of
places recomputing checksums just for nojournal mode which is quite error
prone but if it's just for sb, I guess it's manageable.

								Honza

-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ