[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87eegdy0qh.fsf@nanos.tec.linutronix.de>
Date: Wed, 17 Mar 2021 21:38:30 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Vitaly Kuznetsov <vkuznets@...hat.com>, x86@...nel.org
Cc: Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC 2/2] genirq/matrix: WARN_ON_ONCE() when cm->allocated/m->total_allocated go negative
On Fri, Feb 19 2021 at 12:31, Vitaly Kuznetsov wrote:
> When irq_matrix_assign()/irq_matrix_free() calls get unsynced, weird
> effects are possible, e.g. when cm->allocated goes negative CPU hotplug
> may get blocked. Add WARN_ON_ONCE() to simplify detecting such situations.
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@...hat.com>
> ---
> kernel/irq/matrix.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
> index 651a4ad6d711..2438a4f9d726 100644
> --- a/kernel/irq/matrix.c
> +++ b/kernel/irq/matrix.c
> @@ -189,7 +189,9 @@ void irq_matrix_assign_system(struct irq_matrix *m, unsigned int bit,
> set_bit(bit, m->system_map);
> if (replace) {
> BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));
> + WARN_ON_ONCE(!cm->allocated);
> cm->allocated--;
> + WARN_ON_ONCE(!m->total_allocated);
This hunk is not really useful. It already dies when the bit is not set
in the alloc map.
> m->total_allocated--;
> }
> if (bit >= m->alloc_start && bit < m->alloc_end)
> @@ -424,12 +426,17 @@ void irq_matrix_free(struct irq_matrix *m, unsigned int cpu,
> return;
>
> clear_bit(bit, cm->alloc_map);
> + WARN_ON_ONCE(!cm->allocated);
> cm->allocated--;
WARN and then decrement is not necessarily any better than just
decrementing unconditionally. It's just more noisy.
Why would you let the counter wrap into negative space if you already
know it's 0?
There is a way more useful way to handle this. In such a case the bit is
NOT set in the alloc map. So:
if (!WARN_ON_ONCE(test_and_clear_bit(bit, cm->alloc_map)))
return;
would have caught the problem at hand nicely and let the machine survive
while just throwing warns and continuing is broken to begin with.
Thanks,
tglx
Powered by blists - more mailing lists