linux-kernel - Re: Known and unfixed active data loss bug in MM + XFS with large folios since Dec 2021 (any kernel from 6.1 upwards)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-Id: <48D7686C-6BD4-4439-B7FD-411530802161@flyingcircus.io>
Date: Mon, 2 Dec 2024 11:44:11 +0100
From: Christian Theune <ct@...ingcircus.io>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Chris Mason <clm@...a.com>,
 Dave Chinner <david@...morbit.com>,
 Matthew Wilcox <willy@...radead.org>,
 Jens Axboe <axboe@...nel.dk>,
 linux-mm@...ck.org,
 "linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
 linux-fsdevel@...r.kernel.org,
 linux-kernel@...r.kernel.org,
 Daniel Dao <dqminh@...udflare.com>,
 regressions@...ts.linux.dev,
 regressions@...mhuis.info
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large
 folios since Dec 2021 (any kernel from 6.1 upwards)

Hi,

waking this thread up again: we’ve been running the original fix on top of 6.11 for roughly 8 weeks now and have not had a single occurence of this. I’d be willing to call this as fixed. 

@Linus: we didn’t specify an actual deadline, but I guess 8 week without any hit is good enough?

My plan would be to migrate our fleet to 6.6 now. AFAICT the relevant patch series is the one in
https://lore.kernel.org/all/20240415171857.19244-4-ryncsn@gmail.com/T/#u and was released in 6.6.54.

I’d like to revive the discussion on the second issue, though, as it ended with Linus’ last post
and I couldn’t find whether this may have been followed up elsewhere or still needs to be worked on?

Christian

> On 12. Oct 2024, at 19:01, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> 
> On Fri, 11 Oct 2024 at 06:06, Chris Mason <clm@...a.com> wrote:
>> 
>> - Linus's starvation observation.  It doesn't feel like there's enough
>> load to cause this, especially given us sitting in truncate, where it
>> should be pretty unlikely to have multiple procs banging on the page in
>> question.
> 
> Yeah, I think the starvation can only possibly happen in
> fdatasync-like paths where it's waiting for existing writeback without
> holding the page lock. And while Christian has had those backtraces
> too, the truncate path is not one of them.
> 
> That said, just because I wanted to see how nasty it is, I looked into
> changing the rules for folio_wake_bit().
> 
> Christian, just to clarify, this is not for  you to test - this is
> very experimental - but maybe Willy has comments on it.
> 
> Because it *might* be possible to do something like the attached,
> where we do the page flags changes atomically but without any locks if
> there are no waiters, but if there is a waiter on the page, we always
> clear the page flag bit atomically under the waitqueue lock as we wake
> up the waiter.
> 
> I changed the name (and the return value) of the
> folio_xor_flags_has_waiters() function to just not have any
> possibility of semantic mixup, but basically instead of doing the xor
> atomically and unconditionally (and returning whether we had waiters),
> it now does it conditionally only if we do *not* have waiters, and
> returns true if successful.
> 
> And if there were waiters, it moves the flag clearing into the wakeup function.
> 
> That in turn means that the "while whiteback" loop can go back to be
> just a non-looping "if writeback", and folio_wait_writeback() can't
> get into any starvation with new writebacks always showing up.
> 
> The reason I say it *might* be possible to do something like this is
> that it changes __folio_end_writeback() to no longer necessarily clear
> the writeback bit under the XA lock. If there are waiters, we'll clear
> it later (after releasing the lock) in the caller.
> 
> Willy? What do you think? Clearly this now makes PG_writeback not
> synchronized with the PAGECACHE_TAG_WRITEBACK tag, but the reason I
> think it might be ok is that the code that *sets* the PG_writeback bit
> in __folio_start_writeback() only ever starts with a page that isn't
> under writeback, and has a
> 
>        VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio);
> 
> at the top of the function even outside the XA lock. So I don't think
> these *need* to be synchronized under the XA lock, and I think the
> folio flag wakeup atomicity might be more important than the XA
> writeback tag vs folio writeback bit.
> 
> But I'm not going to really argue for this patch at all - I wanted to
> look at how bad it was, I wrote it, I'm actually running it on my
> machine now and it didn't *immediately* blow up in my face, so it
> *may* work just fine.
> 
> The patch is fairly simple, and apart from the XA tagging issue is
> seems very straightforward. I'm just not sure it's worth synchronizing
> one part just to at the same time de-synchronize another..
> 
>                   Linus
> <0001-Test-atomic-folio-bit-waiting.patch>

Liebe Grüße,
Christian Theune

-- 
Christian Theune · ct@...ingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick