Date:   Fri, 26 Aug 2022 13:04:08 +0200
From:   Vlastimil Babka <vbabka@...e.cz>
To:     Jan Kara <jack@...e.cz>,
        matoro <matoro_mailinglist_kernel@...oro.tk>
Cc:     Meelis Roos <mroos@...ux.ee>, Matthew Wilcox <willy@...radead.org>,
        "Theodore Y. Ts'o" <tytso@....edu>, linux-alpha@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>, linux-block@...r.kernel.org,
        linux-mm@...ck.org, vbabka@...e.com
Subject: Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28

On 8/26/22 12:55, Jan Kara wrote:
> On Thu 25-08-22 11:05:48, matoro wrote:
>> Hello all, I know this is quite an old thread.  I recently acquired some
>> alpha hardware and have run into this exact same problem on the latest
>> stable kernel (5.18 and 5.19).  CONFIG_COMPACTION seems to be totally broken
>> and causes userspace to be extremely unstable - random segfaults, corruption
>> of glibc data structures, gcc ICEs etc etc - seems most noticeable during
>> tasks with heavy I/O load.
>> 
>> My hardware is a DS15 (Titan), so only slightly newer than the Tsunamis
>> mentioned earlier.  The problem is greatly exacerbated when using a
>> machine-optimized kernel (CONFIG_ALPHA_TITAN) over one with
>> CONFIG_ALPHA_GENERIC.  But it still doesn't go away on a generic kernel,
>> just pops up less often, usually very I/O heavy tasks like checking out a
>> tag in the kernel repo.
>> 
>> However all of this seems to be dependent on CONFIG_COMPACTION.  With this
>> toggled off all problems disappear, regardless of other options.  I tried
>> reverting the commit 88dbcbb3a4847f5e6dfeae952d3105497700c128 mentioned
>> earlier in the thread (the structure has moved to a different file but was
>> otherwise the same), but it unfortunately did not make a difference.
>> 
>> Since this doesn't seem to have a known cause or an easy fix, would it be
>> reasonable to just add a Kconfig dep to disable it automatically on alpha?
> 
> Thanks for the report. I guess this just confirms that migration of pagecache
> pages is somehow broken on Alpha. Maybe we are failing to flush some cache
> specific to Alpha? Or maybe the page migration code is not safe wrt the
> peculiar memory ordering Alpha has... I think this will need someone with
> Alpha HW and willingness to dive into MM internals to debug this. Added
> Vlasta to CC mostly for awareness and in case it rings some bells :).

Hi, doesn't ring any bells unfortunately. Does corruption also happen when
mmapping a file and applying mbind() with MPOL_MF_MOVE or migrate_pages()?
That should allow more controlled migration experiments than through
compaction. But that would also need a NUMA machine or fakenuma support,
dunno if alpha has that?
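
A minimal sketch of such an experiment, under stated assumptions: it maps a
file, writes a known pattern, asks the kernel to move the resident pages to
another node with mbind() and MPOL_MF_MOVE, then rechecks the pattern through
the same mapping. The helper names (fill_pattern, count_mismatches,
migrate_and_check) and the SK_ constants are made up for illustration; the
constant values are copied from <linux/mempolicy.h> so the sketch builds
without libnuma. It has not been run on alpha, and on a single-node machine
mbind() will likely succeed without moving anything.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Values from <linux/mempolicy.h>; the SK_ names are local to this sketch. */
#define SK_MPOL_BIND    2
#define SK_MPOL_MF_MOVE (1 << 1)

/* Each 64-bit word encodes its own index mixed with a seed, so a stale or
 * misplaced page shows up as a run of mismatching words. */
void fill_pattern(uint64_t *buf, size_t words, uint64_t seed)
{
    for (size_t i = 0; i < words; i++)
        buf[i] = seed ^ ((uint64_t)i * 0x9E3779B97F4A7C15ULL);
}

/* Count the words that no longer match what fill_pattern() wrote. */
size_t count_mismatches(const uint64_t *buf, size_t words, uint64_t seed)
{
    size_t bad = 0;
    for (size_t i = 0; i < words; i++)
        if (buf[i] != (seed ^ ((uint64_t)i * 0x9E3779B97F4A7C15ULL)))
            bad++;
    return bad;
}

/*
 * Create/truncate `path` to `len` bytes, map it shared, write the pattern
 * (which also faults the pages in), request migration to `node`, recheck.
 * Returns the number of mismatching words, or -1 on a setup error.
 * *mbind_err is set to 0 if mbind() succeeded, else to its errno.
 */
long migrate_and_check(const char *path, size_t len, int node,
                       uint64_t seed, int *mbind_err)
{
    assert(len > 0 && len % sizeof(uint64_t) == 0);
    assert(node >= 0 && node < 63);   /* single-word nodemask only */

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    uint64_t *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    size_t words = len / sizeof(uint64_t);
    fill_pattern(map, words, seed);

    /* Raw syscall so that <numaif.h> from libnuma is not required. */
    unsigned long nodemask = 1UL << node;
    long rc = syscall(SYS_mbind, map, (unsigned long)len,
                      (long)SK_MPOL_BIND, &nodemask,
                      (unsigned long)(8 * sizeof(nodemask)),
                      (unsigned long)SK_MPOL_MF_MOVE);
    *mbind_err = (rc == 0) ? 0 : errno;

    long bad = (long)count_mismatches(map, words, seed);
    munmap(map, len);
    return bad;
}
```

Calling migrate_and_check() in a loop, alternating the target node between 0
and 1 and using a file on the affected ext4 filesystem, would exercise page
migration without relying on compaction. A nonzero return value with
*mbind_err == 0 would point at the migration path rather than at compaction
itself.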
