linux-kernel - Re: "Dying CPU not properly vacated" splat

Message-ID: <c06ea3f3-4f07-42fb-9ad1-a227e9534bb1@paulmck-laptop>
Date:   Wed, 6 Sep 2023 06:08:16 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Valentin Schneider <vschneid@...hat.com>
Cc:     linux-kernel@...r.kernel.org, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com
Subject: Re: "Dying CPU not properly vacated" splat

On Tue, Aug 02, 2022 at 10:30:02AM +0100, Valentin Schneider wrote:
> On 05/07/22 10:23, Paul E. McKenney wrote:
> > The second of these occurred near shutdown, but the first was quite some
> > time before shutdown.  In case that makes a difference.
> >
> > I have not seen this since.
> >
> > Any other diagnostics I should add?
> >
> 
> Sorry, I let this get buried at the bottom of my inbox :(
> 
> I've had another look at rcutorture.c but just like for
> rcu_torture_reader(), I don't see any obvious culprit (no
> kthread_set_per_cpu() usage).
> 
> One thing I think would help is a scheduling trace (say sched_switch,
> sched_wakeup and cpuhp*, combined with ftrace_dump_on_oops + panic_on_warn
> ?) - that should at least tell us if the issue is in the wakeup placement
> (if the task gets placed on a dying CPU *after* CPUHP_AP_ACTIVE), or in the
> balance_push() mechanism (the task was *already* on the CPU when it started
> dying and never moved away).
> 
> Neither makes sense to me, but it has to be somewhere in there...

And given that it has been more than a year since I have seen this,
I am considering it to be fixed, whether purposefully or accidentally.

							Thanx, Paul
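
For anyone who hits this splat again, the trace setup suggested in the
quoted text above could be sketched roughly as follows.  This is only an
illustration, not something from the thread: it assumes tracefs is mounted
at /sys/kernel/tracing (older setups use /sys/kernel/debug/tracing), that
the script runs as root, and the helper names trace_settings() and
apply_settings() are made up for this example.

```python
from pathlib import Path

# Assumed locations; adjust if tracefs is mounted elsewhere.
TRACEFS = Path("/sys/kernel/tracing")
PROC_KERNEL = Path("/proc/sys/kernel")


def trace_settings(tracefs=TRACEFS, proc_kernel=PROC_KERNEL):
    """Return (file, value) pairs that enable the suggested diagnostics.

    Hypothetical helper: it only builds the list, it writes nothing.
    """
    events = Path(tracefs) / "events"
    proc_kernel = Path(proc_kernel)
    return [
        # Scheduler events: who ran where, and where wakeups were placed.
        (events / "sched" / "sched_switch" / "enable", "1"),
        (events / "sched" / "sched_wakeup" / "enable", "1"),
        # All CPU-hotplug state machine events (cpuhp*).
        (events / "cpuhp" / "enable", "1"),
        # Dump the ftrace buffer on oops, and turn WARN() into a panic
        # so that the splat itself triggers the dump.
        (proc_kernel / "ftrace_dump_on_oops", "1"),
        (proc_kernel / "panic_on_warn", "1"),
    ]


def apply_settings(settings):
    """Write each value to its control file (needs root on a real system)."""
    for path, value in settings:
        Path(path).write_text(value)


if __name__ == "__main__":
    apply_settings(trace_settings())
```

With these enabled, the dumped trace should show whether the task was
woken onto the dying CPU or was already there when the CPU started going
down, which is the distinction the quoted text asks about.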