lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20221208091523.t6ka6tqtclcxnsrp@quack3>
Date:   Thu, 8 Dec 2022 10:15:23 +0100
From:   Jan Kara <jack@...e.cz>
To:     Theodore Ts'o <tytso@....edu>
Cc:     Thorsten Leemhuis <regressions@...mhuis.info>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        Jan Kara <jack@...e.cz>, linux-ext4@...r.kernel.org,
        stable@...r.kernel.org, Thilo Fromm <t-lo@...ux.microsoft.com>,
        Jeremi Piotrowski <jpiotrowski@...ux.microsoft.com>,
        Andreas Gruenbacher <agruenba@...hat.com>
Subject: Re: [PATCH] ext4: Fix deadlock due to mbcache entry corruption

Hi Ted!

On Thu 08-12-22 00:55:55, Theodore Ts'o wrote:
> One thing which is completely unclear to me is how this relates to the
> claimed regression.  I understand that Jeremi and Thilo have asserted
> that the hang goes away if a backport commit 51ae846cff5 ("ext4: fix
> warning in ext4_iomap_begin as race between bmap and write") is not in
> their 5.15 product tree.

IMHO the bisection was flawed and somehow the test of a revert (which guys
claimed to have done) must have been lucky and didn't trip the race. This
is not that hard to imagine because firstly, the commit 51ae846cff5 got
included in the same stable kernel release as ext4 xattr changes
(65f8b80053 ("ext4: fix race when reusing xattr blocks") and related
mbcache changes) which likely made the race more likely. Secondly, the
mbcache state corruption is not that easy to hit because you need an
interaction of slab reclaim on mbcache entry with ext4 xattr code adding
reference to xattr block and just hitting the reference limit.

> However, the stack traces point to a problem in the extended attribute
> code, which has nothing to do with ext4_bmap(), and commit 51ae846cff5
> only changes the ext4's bmap function --- which these days gets used
> for the FIBMAP ioctl and very little else.
> 
> Furthermore, the fix which Jan provided, and which apparently fixes
> the user's problem, (a) doesn't touch the ext4_bmap function, and (b)
> has a fixes tag for the patch:
> 
>     Fixes: 6048c64b2609 ("mbcache: add reusable flag to cache entries")
> 
> ... which is a commit which dates back to 2016, and the v4.6 kernel.  ?!?

Yes. AFAICT the bitfield race in mbcache was introduced in this commit but
somehow ext4 was using mbcache in a way that wasn't tripping the race.
After 65f8b80053 ("ext4: fix race when reusing xattr blocks"), the race
became much more likely and users started to notice...

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ