[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e4237c07-f00a-1820-d36a-7dbee4ea0297@cn.fujitsu.com>
Date: Mon, 4 Jun 2018 15:56:37 +0800
From: Dou Liyang <douly.fnst@...fujitsu.com>
To: Thomas Gleixner <tglx@...utronix.de>,
Song Liu <songliubraving@...com>
CC: Song Liu <liu.song.a23@...il.com>,
Tariq Toukan <tariqt@...lanox.com>,
Dmitry Safonov <0x7f454c46@...il.com>,
open list <linux-kernel@...r.kernel.org>,
Maor Gottlieb <maorg@...lanox.com>,
Kernel Team <Kernel-team@...com>
Subject: Re: WARNING and PANIC in irq_matrix_free
Hi Thomas,
Sorry to ask the questions at this series, my mailbox was kicked out of
the mailing list a few days ago, and didn't receive the e-mail.
please see below
At 05/29/2018 04:09 AM, Thomas Gleixner wrote:
> On Mon, 28 May 2018, Song Liu wrote:
>> This doesn't fix the issue with bnxt. Here is a trace with this patch:
[...]
>
> Thanks for providing the data!
>
> > 19d... 1610359248us : vector_deactivate: irq=31 is_managed=0
> can_reserve=1 reserve=0
> > 19d... 1610359248us : vector_clear: irq=31 vector=33 cpu=20
> prev_vector=0 prev_cpu=2
> > 19d... 1610359249us : irq_matrix_free: bit=33 cpu=20 online=1
> avl=201 alloc=0 managed=0 online_maps=56 global_avl=11241,
> global_rsvd=25, total_alloc=15
>
> Here IRQ 31 is shutdown and the vector freed.
>
> > 19d... 1610359249us : irq_matrix_reserve: online_maps=56
> global_avl=11241, global_rsvd=26, total_alloc=15
> > 19d... 1610359249us : vector_reserve: irq=31 ret=0
> > 19d... 1610359249us : vector_config: irq=31 vector=239 cpu=0
> apicdest=0x00000000
>
> And set to the magic reservation vector 239 to catch spurious interrupts.
>
> > 20dN.. 1610366654us : vector_activate: irq=31 is_managed=0
> can_reserve=1 reserve=0
> > 20dN.. 1610366654us : vector_alloc: irq=31 vector=4294967268
> reserved=1 ret=0
> > 20dN.. 1610366655us : irq_matrix_alloc: bit=33 cpu=9 online=1
> avl=200 alloc=1 managed=0 online_maps=56 global_avl=11240,
> global_rsvd=28, total_alloc=16
> > 20dN.. 1610366655us : vector_update: irq=31 vector=33 cpu=9
> prev_vector=0 prev_cpu=20
^^^^^^^^^^^^
this means there is no associated previous vector.
> > 20dN.. 1610366656us : vector_alloc: irq=31 vector=33 reserved=1 ret=0
> > 20dN.. 1610366656us : vector_config: irq=31 vector=33 cpu=9
> apicdest=0x00000014
>
> So here it gets initialized again and targets CPU9 now.
>
> > 20dN.. 1610366662us : irq_matrix_alloc: bit=33 cpu=20 online=1
> avl=200 alloc=1 managed=0 online_maps=56 global_avl=11240,
> global_rsvd=28, total_alloc=16
> > 20dN.. 1610366662us : vector_update: irq=31 vector=33 cpu=20
> prev_vector=33 prev_cpu=9
> > 20dN.. 1610366662us : vector_alloc: irq=31 vector=33 reserved=1 ret=0
> > 20dN.. 1610366662us : vector_config: irq=31 vector=33 cpu=20
> apicdest=0x0000002c
>
> Here it is reconfigured to CPU 20. Now that update schedules vector 33 on
> CPU9 for cleanup.
>
> > 20dN.. 1610366666us : irq_matrix_alloc: bit=34 cpu=2 online=1
> avl=199 alloc=2 managed=0 online_maps=56 global_avl=11239,
> global_rsvd=28, total_alloc=17
> > 20dN.. 1610366666us : vector_update: irq=31 vector=34 cpu=2
> prev_vector=33 prev_cpu=20
> > 20dN.. 1610366666us : vector_alloc: irq=31 vector=34 reserved=1 ret=0
> > 20dN.. 1610366666us : vector_config: irq=31 vector=34 cpu=2
> apicdest=0x00000004
>
> So here the shit hits the fan because that update schedules vector 33 on
> CPU20 for cleanup while the previous cleanup for CPU9 has not been done
> yet. Cute. or not so cute....
>
> > 20dNh. 1610366669us : vector_free_moved: irq=31 cpu=20 vector=33
> is_managed=0
> > 20dNh. 1610366670us : irq_matrix_free: bit=33 cpu=20 online=1
> avl=201 alloc=0 managed=0 online_maps=56 global_avl=11240,
> global_rsvd=28, total_alloc=16
>
> And frees the CPU 20 vector
>
> > 9d.h. 1610366696us : vector_free_moved: irq=31 cpu=20 vector=0
> is_managed=0
>
Here, why didn't we avoid this cleanup by
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index a75de0792942..0cc59646755f 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -821,6 +821,9 @@ static void free_moved_vector(struct apic_chip_data
*apicd)
*/
WARN_ON_ONCE(managed);
+ if (!vector)
+ return;
+
trace_vector_free_moved(apicd->irq, cpu, vector, managed);
irq_matrix_free(vector_matrix, cpu, vector, managed);
per_cpu(vector_irq, cpu)[vector] = VECTOR_UNUSED;
Is there something I didn't consider with? ;-)
Thanks,
dou.
> And then CPU9 claims that it's queued for cleanup. Bah.
>
> I'm still working on a fix as the elegant solution refused to work.
>
> Thanks,
>
> tglx
Powered by blists - more mailing lists