Message-ID: <b3bc868f-bf83-4b86-bcf0-13e99d0b7c7e@linux.dev>
Date:   Tue, 21 Nov 2023 23:47:26 +0800
From:   Chengming Zhou <chengming.zhou@...ux.dev>
To:     Mark Brown <broonie@...nel.org>
Cc:     vbabka@...e.cz, cl@...ux.com, penberg@...nel.org,
        rientjes@...gle.com, iamjoonsoo.kim@....com,
        akpm@...ux-foundation.org, roman.gushchin@...ux.dev,
        42.hyeyoo@...il.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        Chengming Zhou <zhouchengming@...edance.com>
Subject: Re: [PATCH v5 6/9] slub: Delay freezing of partial slabs

On 2023/11/21 09:29, Mark Brown wrote:
> On Tue, Nov 21, 2023 at 08:58:40AM +0800, Chengming Zhou wrote:
>> On 2023/11/21 02:49, Mark Brown wrote:
>>> On Thu, Nov 02, 2023 at 03:23:27AM +0000, chengming.zhou@...ux.dev wrote:
> 
>>> When we see problems we see RCU stalls while logging in, for example:
> 
>>> [   46.453323] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
>>> [   46.459361] rcu: 	3-...0: (1 GPs behind) idle=def4/1/0x40000000 softirq=1304/1304 fqs=951
>>> [   46.467669] rcu: 	(detected by 0, t=2103 jiffies, g=1161, q=499 ncpus=4)
>>> [   46.474472] Sending NMI from CPU 0 to CPUs 3:
> 
>> IIUC, here should print the backtrace of CPU 3, right? It looks like CPU 3 is the cause,
>> but we couldn't see what it's doing from the log.
> 
> AIUI yes, but it looks like we've just completely lost the CPU - there's
> more attempts to talk to it visible in the log:
> 
>>> A full log for that run can be seen at:
>>>
>>>    https://validation.linaro.org/scheduler/job/4017095
> 
> but none of them appear to cause CPU 3 to respond.  Note that 32 bit ARM
> is just using a regular IPI rather than something that's actually a NMI
> so this isn't hugely out of the ordinary, I'd guess it's stuck with
> interrupts masked.

Ah yes, there is no NMI on ARM, so CPU 3 may be running somewhere with
interrupts disabled. I searched the full log but still don't have a clue,
and there is no WARNING or BUG related to SLUB in the log.

I wonder how to reproduce it locally with a QEMU VM, since I don't have
an ARM machine.

Thanks!
