linux-kernel - Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)

Message-ID: <CAHk-=wiAQ23ongRsJTdYhpQRn2YP-2-Z4_NkWiSJRyv6wf_dxg@mail.gmail.com>
Date: Tue, 24 Sep 2024 12:24:14 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Chris Mason <clm@...a.com>
Cc: Matthew Wilcox <willy@...radead.org>, Jens Axboe <axboe@...nel.dk>, 
	Dave Chinner <david@...morbit.com>, Christian Theune <ct@...ingcircus.io>, linux-mm@...ck.org, 
	"linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>, linux-fsdevel@...r.kernel.org, 
	linux-kernel@...r.kernel.org, Daniel Dao <dqminh@...udflare.com>, 
	regressions@...ts.linux.dev, regressions@...mhuis.info
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large
 folios since Dec 2021 (any kernel from 6.1 upwards)

On Tue, 24 Sept 2024 at 12:18, Chris Mason <clm@...a.com> wrote:
>
> A few days of load later and some extra printks, it turns out that
> taking the writer lock in __filemap_add_folio() makes us dramatically
> more likely to just return EEXIST than go into the xas_split_alloc() dance.

.. and that sounds like a good thing, except for the test coverage, I guess.

Which you seem to have fixed:

> With the changes in 6.10, we only get into that xas_destroy() case above
> when the conflicting entry is a shadow entry, so I changed my repro to
> use memory pressure instead of fadvise.
>
> I also added a schedule_timeout(1) after the split alloc, and with all
> of that I'm able to consistently make the xas_destroy() case trigger
> without causing any system instability.  Kairui Song's patches do seem
> to have fixed things nicely.

<confused thumbs up / fingers crossed emoji>

              Linus