Date:   Tue, 02 Aug 2022 10:30:02 +0100
From:   Valentin Schneider <vschneid@...hat.com>
To:     paulmck@...nel.org
Cc:     linux-kernel@...r.kernel.org, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com
Subject: Re: "Dying CPU not properly vacated" splat

On 05/07/22 10:23, Paul E. McKenney wrote:
> The second of these occurred near shutdown, but the first was quite some
> time before shutdown.  In case that makes a difference.
>
> I have not seen this since.
>
> Any other diagnostics I should add?
>

Sorry, I let this get buried to the bottom of my inbox :(

I've had another look at rcutorture.c but, just as for
rcu_torture_reader(), I don't see any obvious culprit (no
kthread_set_per_cpu() usage).

One thing I think would help is a scheduling trace: say, the sched_switch,
sched_wakeup and cpuhp* events, combined with ftrace_dump_on_oops +
panic_on_warn so the trace buffer gets dumped when the warning fires. That
should at least tell us whether the issue is in the wakeup placement (the
task gets placed on a dying CPU *after* CPUHP_AP_ACTIVE has been torn down)
or in the balance_push() mechanism (the task was *already* on the CPU when
it started dying and never moved away).

Neither makes sense to me, but it has to be somewhere in there...
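The following is a minimal Python sketch of how such a trace could be read to tell the two cases apart. It assumes the trace has already been reduced to a simplified event list. In that list, "cpu_deactivate" stands for the dying CPU leaving CPUHP_AP_ACTIVE, as a cpuhp* event would mark it. "wakeup" stands for a sched_wakeup with its target CPU, and "switch_in" stands for a sched_switch that starts the task running. The Event type, the event kind names and classify() are hypothetical and made up for this illustration. They are not a kernel or ftrace format.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    ts: float      # timestamp (seconds)
    kind: str      # hypothetical simplified kinds: "cpu_deactivate", "wakeup", "switch_in"
    cpu: int       # dying CPU for "cpu_deactivate"; target CPU for "wakeup"; running CPU for "switch_in"
    pid: int = -1  # task pid; left at -1 for "cpu_deactivate" so it never matches a real task


def classify(events: Iterable[Event], pid: int, dying_cpu: int) -> str:
    """Guess which mechanism left `pid` on `dying_cpu`, from a simplified trace."""
    evs = sorted(events, key=lambda e: e.ts)
    deact = next((e.ts for e in evs
                  if e.kind == "cpu_deactivate" and e.cpu == dying_cpu), None)
    if deact is None:
        return "no deactivation seen"

    task = [e for e in evs if e.pid == pid and e.kind in ("wakeup", "switch_in")]

    # Hypothesis 1: a wakeup targeted the dying CPU after it stopped being active.
    if any(e.kind == "wakeup" and e.cpu == dying_cpu and e.ts > deact for e in task):
        return "wakeup placement"

    # Hypothesis 2: the task was already on the CPU at deactivation time and every
    # later event still shows it there, i.e. it was never pushed away.
    before = [e for e in task if e.ts <= deact]
    after = [e for e in task if e.ts > deact]
    if before and before[-1].cpu == dying_cpu and all(e.cpu == dying_cpu for e in after):
        return "balance_push"

    return "inconclusive"


if __name__ == "__main__":
    trace = [Event(1.0, "wakeup", 3, 42),
             Event(2.0, "cpu_deactivate", 3),
             Event(3.0, "wakeup", 3, 42)]
    print(classify(trace, pid=42, dying_cpu=3))  # wakeup placement
```

The sketch checks the wakeup-placement hypothesis first. A single wakeup that targets the CPU after deactivation is already a bug on its own, regardless of where the task was before.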

>                                                       Thanx, Paul
