Message-Id: <E6728F3E-374A-4A86-A5F2-C67CCECD6F7D@flyingcircus.io>
Date: Thu, 19 Sep 2024 08:34:37 +0200
From: Christian Theune <ct@...ingcircus.io>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Dave Chinner <david@...morbit.com>,
 Matthew Wilcox <willy@...radead.org>,
 Chris Mason <clm@...a.com>,
 Jens Axboe <axboe@...nel.dk>,
 linux-mm@...ck.org,
 "linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
 linux-fsdevel@...r.kernel.org,
 linux-kernel@...r.kernel.org,
 Daniel Dao <dqminh@...udflare.com>,
 regressions@...ts.linux.dev,
 regressions@...mhuis.info
Subject: Re: Known and unfixed active data loss bug in MM + XFS with large
 folios since Dec 2021 (any kernel from 6.1 upwards)


> On 19. Sep 2024, at 05:12, Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> 
> On Thu, 19 Sept 2024 at 05:03, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
>> 
>> I think we should just do the simple one-liner of adding a
>> "xas_reset()" to after doing xas_split_alloc() (or do it inside the
>> xas_split_alloc()).
> 
> .. and obviously that should be actually *verified* to fix the issue
> not just with the test-case that Chris and Jens have been using, but
> on Christian's real PostgreSQL load.
> 
> Christian?

Happy to! I see there’s still some back and forth on the specific patches. Let me know which kernel version and which patches I should start trying out. I’m losing track while following the discussion.

In preparation: I’m wondering whether the known reproducer gives any insight into how I might force my load to trigger the bug more easily? Would it make sense to run the reproducer above alongside a PostgreSQL benchmark?

Otherwise we’d likely only be getting insight after weeks of not seeing crashes … 

Christian

-- 
Christian Theune · ct@...ingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick

