Message-ID: <8810621b-6711-dca5-db34-3b12b73a2316@suse.cz>
Date: Wed, 22 Nov 2023 09:52:43 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Mark Brown <broonie@...nel.org>,
Chengming Zhou <chengming.zhou@...ux.dev>
Cc: cl@...ux.com, penberg@...nel.org, rientjes@...gle.com,
iamjoonsoo.kim@....com, akpm@...ux-foundation.org,
roman.gushchin@...ux.dev, 42.hyeyoo@...il.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Chengming Zhou <zhouchengming@...edance.com>
Subject: Re: [PATCH v5 6/9] slub: Delay freezing of partial slabs
On 11/21/23 19:21, Mark Brown wrote:
> On Tue, Nov 21, 2023 at 11:47:26PM +0800, Chengming Zhou wrote:
>
>> Ah yes, there is no NMI on ARM, so CPU 3 may be running somewhere with
>> interrupts disabled. I searched the full log but still don't have a clue,
>> and there is no WARNING or BUG related to SLUB in the log.
>
> Yeah, nor anything else particularly. I tried turning on some debug
> options:
>
> CONFIG_SOFTLOCKUP_DETECTOR=y
> CONFIG_DETECT_HUNG_TASK=y
> CONFIG_WQ_WATCHDOG=y
> CONFIG_DEBUG_PREEMPT=y
> CONFIG_DEBUG_LOCKING=y
> CONFIG_DEBUG_ATOMIC_SLEEP=y
>
> https://validation.linaro.org/scheduler/job/4017828
>
> which has some additional warnings related to clock changes but AFAICT
> those come from today's -next rather than the debug stuff:
>
> https://validation.linaro.org/scheduler/job/4017823
>
> so that's not super helpful.
For the record (and to help focus the debugging): on IRC we discussed that the
problem persists with CONFIG_SLUB_CPU_PARTIAL=n:
https://validation.linaro.org/scheduler/job/4017863
which limits the scope of where to look, so that's good :)
>> I wonder how to reproduce it locally with a Qemu VM since I don't have
>> the ARM machine.
>
> There's sample qemu jobs available from for example KernelCI:
>
> https://storage.kernelci.org/next/master/next-20231120/arm/multi_v7_defconfig/gcc-10/lab-baylibre/baseline-qemu_arm-virt-gicv3.html
>
> (includes the command line, though it's not using Debian testing like my
> test was). Note that I'm testing a bunch of platforms with the same
> kernel/rootfs combination and it was only the Raspberry Pi 3 which blew
> up. It is a bit tight for memory which might have some influence?
>
> I'm really suspecting this may have made some underlying platform bug
> more obvious :/
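For anyone wanting to try the local reproduction discussed above, a rough
qemu-system-arm invocation for a multi_v7_defconfig kernel might look like the
sketch below. This is an assumption loosely modeled on the KernelCI baseline
job linked earlier, not a verified command line; the zImage and rootfs paths
are placeholders, and memory is kept small to mimic the tight-memory Pi 3
conditions Mark mentions:

```shell
# Sketch only: zImage and rootfs.ext4 are placeholder paths you must
# supply from your own multi_v7_defconfig build and rootfs image.
qemu-system-arm \
    -M virt \
    -smp 2 \
    -m 512M \
    -nographic \
    -kernel zImage \
    -append "console=ttyAMA0 root=/dev/vda rw" \
    -drive file=rootfs.ext4,format=raw,if=virtio
```

Lowering -m further may help approximate the memory pressure suspected to
play a role on the Raspberry Pi 3.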