lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20221208152011.GA12315@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
Date:   Thu, 8 Dec 2022 07:20:11 -0800
From:   Jeremi Piotrowski <jpiotrowski@...ux.microsoft.com>
To:     Jan Kara <jack@...e.cz>
Cc:     Theodore Ts'o <tytso@....edu>,
        Thorsten Leemhuis <regressions@...mhuis.info>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        linux-ext4@...r.kernel.org, stable@...r.kernel.org,
        Thilo Fromm <t-lo@...ux.microsoft.com>,
        Andreas Gruenbacher <agruenba@...hat.com>
Subject: Re: [PATCH] ext4: Fix deadlock due to mbcache entry corruption

On Thu, Dec 08, 2022 at 10:15:23AM +0100, Jan Kara wrote:
> Hi Ted!
> 
> On Thu 08-12-22 00:55:55, Theodore Ts'o wrote:
> > One thing which is completely unclear to me is how this relates to the
> > claimed regression.  I understand that Jeremi and Thilo have asserted
> > that the hang goes away if a backport commit 51ae846cff5 ("ext4: fix
> > warning in ext4_iomap_begin as race between bmap and write") is not in
> > their 5.15 product tree.
> 
> IMHO the bisection was flawed and somehow the test of a revert (which guys
> claimed to have done) must have been lucky and didn't trip the race. This
> is not that hard to imagine because firstly, the commit 51ae846cff5 got
> included in the same stable kernel release as ext4 xattr changes
> (65f8b80053 ("ext4: fix race when reusing xattr blocks") and related
> mbcache changes) which likely made the race more likely. Secondly, the
> mbcache state corruption is not that easy to hit because you need an
> interaction of slab reclaim on mbcache entry with ext4 xattr code adding
> reference to xattr block and just hitting the reference limit.
> 

Yeah, sorry about that, there was never a bisect that led to 51ae846cff5, it
was just a guess and at that point we were unable to reproduce it ourselves so
we just had information from a user stating that when they revert that commit
in their own test build the issue doesn't occur.

Once we were able to personally reproduce the actual bisect led to 65f8b80053,
which as Honza stated made sure that the corruption/inconsistency leads to a
busy loop which is harder to miss.

> > However, the stack traces point to a problem in the extended attribute
> > code, which has nothing to do with ext4_bmap(), and commit 51ae846cff5
> > only changes the ext4's bmap function --- which these days gets used
> > for the FIBMAP ioctl and very little else.
> > 
> > Furthermore, the fix which Jan provided, and which apparently fixes
> > the user's problem, (a) doesn't touch the ext4_bmap function, and (b)
> > has a fixes tag for the patch:
> > 
> >     Fixes: 6048c64b2609 ("mbcache: add reusable flag to cache entries")
> > 
> > ... which is a commit which dates back to 2016, and the v4.6 kernel.  ?!?
> 
> Yes. AFAICT the bitfield race in mbcache was introduced in this commit but
> somehow ext4 was using mbcache in a way that wasn't tripping the race.
> After 65f8b80053 ("ext4: fix race when reusing xattr blocks"), the race
> became much more likely and users started to notice...
> 
> 								Honza
> -- 
> Jan Kara <jack@...e.com>
> SUSE Labs, CR

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ