Message-ID: <20240913-ortsausgang-baustart-1dae9a18254d@brauner>
Date: Fri, 13 Sep 2024 14:11:22 +0200
From: Christian Brauner <brauner@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>, 
	Pankaj Raghav <p.raghav@...sung.com>, Luis Chamberlain <mcgrof@...nel.org>
Cc: Jens Axboe <axboe@...nel.dk>, Matthew Wilcox <willy@...radead.org>, 
	Christian Theune <ct@...ingcircus.io>, linux-mm@...ck.org, 
	"linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>, linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Daniel Dao <dqminh@...udflare.com>, Dave Chinner <david@...morbit.com>, clm@...a.com, 
	regressions@...ts.linux.dev, regressions@...mhuis.info
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large
 folios since Dec 2021 (any kernel from 6.1 upwards)

On Thu, Sep 12, 2024 at 03:25:50PM GMT, Linus Torvalds wrote:
> On Thu, 12 Sept 2024 at 15:12, Jens Axboe <axboe@...nel.dk> wrote:
> >
> > When I saw Christian's report, I seemed to recall that we ran into this
> > at Meta too. And we did, and hence have been reverting it since our 5.19
> > release (and hence 6.4, 6.9, and 6.11 next). We should not be shipping
> > things that are known broken.
> 
> I do think that if we have big sites just reverting it as known broken
> and can't figure out why, we should do so upstream too.
> 
> Yes, it's going to make it even harder to figure out what's wrong.
> Not great. But if this causes filesystem corruption, that sure isn't
> great either. And people end up going "I'll use ext4 which doesn't
> have the problem", that's not exactly helpful either.
> 
> And yeah, the reason ext4 doesn't have the problem is simply because
> ext4 doesn't enable large folios. So that doesn't pin anything down
> either (ie it does *not* say "this is an xfs bug" - it obviously might
> be, but it's probably more likely some large-folio issue).
> 
> Other filesystems do enable large folios (afs, bcachefs, erofs, nfs,
> smb), but maybe just not be used under the kind of load to show it.
> 
> Honestly, the fact that it hasn't been reverted after apparently
> people knowing about it for months is a bit shocking to me. Filesystem
> people tend to take unknown corruption issues as a big deal. What
> makes this so special? Is it because the XFS people don't consider it
> an XFS issue, so...

So this issue is new to me as well. One of the items this cycle is the
work to enable support for block sizes that are larger than page sizes
via the large block size (LBS) series that's been sitting in -next for a
long time. That work specifically targets xfs and builds on top of the
large folio support.

If the support for large folios is going to be reverted in xfs then I
see no point in merging the LBS work now. So I'm holding off on sending
that pull request until a decision is made (for xfs). As far as I
understand, supporting larger block sizes will not be meaningful without
large folio support.
