Message-ID: <877cn625tn.ffs@tglx>
Date: Sat, 28 Oct 2023 16:13:24 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: David Woodhouse <dwmw2@...radead.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
Waiman Long <longman@...hat.com>,
Boqun Feng <boqun.feng@...il.com>,
linux-kernel <linux-kernel@...r.kernel.org>
Cc: Juergen Gross <jgross@...e.com>
Subject: Re: [PATCH] lockdep: add lockdep_cleanup_dead_cpu()
On Sat, Oct 28 2023 at 12:14, David Woodhouse wrote:
> From: David Woodhouse <dwmw@...zon.co.uk>
>
> Add a function to check that an offline CPU left the tracing infrastructure
> in a sane state. The acpi_idle_play_dead() function was recently observed
> calling safe_halt() instead of raw_safe_halt(), which had the side-effect
> of setting the hardirqs_enabled flag for the offline CPU. On x86 this
> triggered lockdep warnings when the CPU came back online, but too early
> for the exception to be handled correctly, leading to a triple-fault.
>
> Add lockdep_cleanup_dead_cpu() to check for this kind of failure mode,
> print the events leading up to it, and correct it so that the CPU can
> come online again correctly.
>
> [ 61.556652] smpboot: CPU 1 is now offline
> [ 61.556769] CPU 1 left hardirqs enabled!
> [ 61.556915] irq event stamp: 128149
> [ 61.556965] hardirqs last enabled at (128149): [<ffffffff81720a36>] acpi_idle_play_dead+0x46/0x70
> [ 61.557055] hardirqs last disabled at (128148): [<ffffffff81124d50>] do_idle+0x90/0xe0
> [ 61.557117] softirqs last enabled at (128078): [<ffffffff81cec74c>] __do_softirq+0x31c/0x423
> [ 61.557199] softirqs last disabled at (128065): [<ffffffff810baae1>] __irq_exit_rcu+0x91/0x100
>
> Signed-off-by: David Woodhouse <dwmw@...zon.co.uk>
Reviewed-by: Thomas Gleixner <tglx@...utronix.de>
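
For readers following the thread, the check-report-correct sequence described in the changelog can be sketched in user-space C. This is only an illustration, not the patch itself: struct cpu_trace_state, its fields, and cleanup_dead_cpu() are hypothetical stand-ins for the kernel's per-CPU tracing state and for lockdep_cleanup_dead_cpu(), whose actual signature and body are not shown in this message.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the tracing state an offline CPU leaves behind. */
struct cpu_trace_state {
	bool hardirqs_enabled;   /* should be false once the CPU is dead */
	unsigned int irq_events; /* event stamp, printed for diagnosis */
};

/*
 * Check the state left by an offline CPU. If hardirqs are still marked
 * enabled, report it and clear the flag so the CPU can come back online
 * with a sane state. Returns true if a correction was needed.
 */
static bool cleanup_dead_cpu(unsigned int cpu, struct cpu_trace_state *st)
{
	if (!st->hardirqs_enabled)
		return false;

	printf("CPU %u left hardirqs enabled!\n", cpu);
	printf("irq event stamp: %u\n", st->irq_events);

	st->hardirqs_enabled = false;
	return true;
}
```

The point of correcting the flag as well as warning is that the stale state would otherwise be hit again when the CPU comes back online, too early for the resulting exception to be handled.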