Message-ID: <8810621b-6711-dca5-db34-3b12b73a2316@suse.cz>
Date: Wed, 22 Nov 2023 09:52:43 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Mark Brown <broonie@...nel.org>,
Chengming Zhou <chengming.zhou@...ux.dev>
Cc: cl@...ux.com, penberg@...nel.org, rientjes@...gle.com,
iamjoonsoo.kim@....com, akpm@...ux-foundation.org,
roman.gushchin@...ux.dev, 42.hyeyoo@...il.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Chengming Zhou <zhouchengming@...edance.com>
Subject: Re: [PATCH v5 6/9] slub: Delay freezing of partial slabs
On 11/21/23 19:21, Mark Brown wrote:
> On Tue, Nov 21, 2023 at 11:47:26PM +0800, Chengming Zhou wrote:
>
>> Ah yes, there is no NMI on ARM, so CPU 3 may be running somewhere with
>> interrupts disabled. I searched the full log but still don't have a clue,
>> and there is no WARNING or BUG related to SLUB in the log.
>
> Yeah, nor anything else particularly. I tried turning on some debug
> options:
>
> CONFIG_SOFTLOCKUP_DETECTOR=y
> CONFIG_DETECT_HUNG_TASK=y
> CONFIG_WQ_WATCHDOG=y
> CONFIG_DEBUG_PREEMPT=y
> CONFIG_DEBUG_LOCKING=y
> CONFIG_DEBUG_ATOMIC_SLEEP=y
>
> https://validation.linaro.org/scheduler/job/4017828
>
> which has some additional warnings related to clock changes but AFAICT
> those come from today's -next rather than the debug stuff:
>
> https://validation.linaro.org/scheduler/job/4017823
>
> so that's not super helpful.
For the record (and to help focus the debugging): on IRC we discussed that the
problem persists with CONFIG_SLUB_CPU_PARTIAL=n:
https://validation.linaro.org/scheduler/job/4017863
which limits the scope of where to look, so that's good :)
>> I wonder how to reproduce it locally with a Qemu VM since I don't have
>> the ARM machine.
>
> There's sample qemu jobs available from for example KernelCI:
>
> https://storage.kernelci.org/next/master/next-20231120/arm/multi_v7_defconfig/gcc-10/lab-baylibre/baseline-qemu_arm-virt-gicv3.html
>
> (includes the command line, though it's not using Debian testing like my
> test was). Note that I'm testing a bunch of platforms with the same
> kernel/rootfs combination and it was only the Raspberry Pi 3 which blew
> up. It is a bit tight for memory which might have some influence?
>
> I'm really suspecting this may have made some underlying platform bug
> more obvious :/
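For anyone wanting to try the local reproduction discussed above, a rough
qemu-system-arm invocation for a multi_v7_defconfig kernel might look like the
sketch below. This is an assumption loosely modeled on the KernelCI baseline
job linked earlier, not a verified command line; the zImage and rootfs paths
are placeholders, and memory is kept small to mimic the tight-memory Pi 3
conditions Mark mentions:

```shell
# Sketch only: zImage and rootfs.ext4 are placeholder paths you must
# supply from your own multi_v7_defconfig build and rootfs image.
qemu-system-arm \
    -M virt \
    -smp 2 \
    -m 512M \
    -nographic \
    -kernel zImage \
    -append "console=ttyAMA0 root=/dev/vda rw" \
    -drive file=rootfs.ext4,format=raw,if=virtio
```

Lowering -m further may help approximate the memory pressure suspected to
play a role on the Raspberry Pi 3.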