lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 4 Jun 2018 15:56:37 +0800
From:   Dou Liyang <douly.fnst@...fujitsu.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Song Liu <songliubraving@...com>
CC:     Song Liu <liu.song.a23@...il.com>,
        Tariq Toukan <tariqt@...lanox.com>,
        Dmitry Safonov <0x7f454c46@...il.com>,
        open list <linux-kernel@...r.kernel.org>,
        Maor Gottlieb <maorg@...lanox.com>,
        Kernel Team <Kernel-team@...com>
Subject: Re: WARNING and PANIC in irq_matrix_free

Hi Thomas,

Sorry to ask the questions at this series, my mailbox was kicked out of
the mailing list a few days ago, and didn't receive the e-mail.

please see below

At 05/29/2018 04:09 AM, Thomas Gleixner wrote:
> On Mon, 28 May 2018, Song Liu wrote:
>> This doesn't fix the issue with bnxt. Here is a trace with this patch:

[...]
> 
> Thanks for providing the data!
> 
>  >  19d... 1610359248us : vector_deactivate: irq=31 is_managed=0 
> can_reserve=1 reserve=0
>  >  19d... 1610359248us : vector_clear: irq=31 vector=33 cpu=20 
> prev_vector=0 prev_cpu=2
>  >  19d... 1610359249us : irq_matrix_free: bit=33 cpu=20 online=1 
> avl=201 alloc=0 managed=0 online_maps=56 global_avl=11241, 
> global_rsvd=25, total_alloc=15
> 
> Here IRQ 31 is shutdown and the vector freed.
> 
>  >  19d... 1610359249us : irq_matrix_reserve: online_maps=56 
> global_avl=11241, global_rsvd=26, total_alloc=15
>  >  19d... 1610359249us : vector_reserve: irq=31 ret=0
>  >  19d... 1610359249us : vector_config: irq=31 vector=239 cpu=0 
> apicdest=0x00000000
> 
> And set to the magic reservation vector 239 to catch spurious interrupts.
> 
>  >  20dN.. 1610366654us : vector_activate: irq=31 is_managed=0 
> can_reserve=1 reserve=0
>  >  20dN.. 1610366654us : vector_alloc: irq=31 vector=4294967268 
> reserved=1 ret=0
>  >  20dN.. 1610366655us : irq_matrix_alloc: bit=33 cpu=9 online=1 
> avl=200 alloc=1 managed=0 online_maps=56 global_avl=11240, 
> global_rsvd=28, total_alloc=16
>  >  20dN.. 1610366655us : vector_update: irq=31 vector=33 cpu=9 
> prev_vector=0 prev_cpu=20
   ^^^^^^^^^^^^
    this means there is no associated previous vector.

>  >  20dN.. 1610366656us : vector_alloc: irq=31 vector=33 reserved=1 ret=0
>  >  20dN.. 1610366656us : vector_config: irq=31 vector=33 cpu=9 
> apicdest=0x00000014
> 
> So here it gets initialized again and targets CPU9 now.
> 
>  >  20dN.. 1610366662us : irq_matrix_alloc: bit=33 cpu=20 online=1 
> avl=200 alloc=1 managed=0 online_maps=56 global_avl=11240, 
> global_rsvd=28, total_alloc=16
>  >  20dN.. 1610366662us : vector_update: irq=31 vector=33 cpu=20 
> prev_vector=33 prev_cpu=9
>  >  20dN.. 1610366662us : vector_alloc: irq=31 vector=33 reserved=1 ret=0
>  >  20dN.. 1610366662us : vector_config: irq=31 vector=33 cpu=20 
> apicdest=0x0000002c
> 
> Here it is reconfigured to CPU 20. Now that update schedules vector 33 on
> CPU9 for cleanup.
> 
>  >  20dN.. 1610366666us : irq_matrix_alloc: bit=34 cpu=2 online=1 
> avl=199 alloc=2 managed=0 online_maps=56 global_avl=11239, 
> global_rsvd=28, total_alloc=17
>  >  20dN.. 1610366666us : vector_update: irq=31 vector=34 cpu=2 
> prev_vector=33 prev_cpu=20
>  >  20dN.. 1610366666us : vector_alloc: irq=31 vector=34 reserved=1 ret=0
>  >  20dN.. 1610366666us : vector_config: irq=31 vector=34 cpu=2 
> apicdest=0x00000004
> 
> So here the shit hits the fan because that update schedules vector 33 on
> CPU20 for cleanup while the previous cleanup for CPU9 has not been done
> yet. Cute. or not so cute....
> 
>  >  20dNh. 1610366669us : vector_free_moved: irq=31 cpu=20 vector=33 
> is_managed=0
>  >  20dNh. 1610366670us : irq_matrix_free: bit=33 cpu=20 online=1 
> avl=201 alloc=0 managed=0 online_maps=56 global_avl=11240, 
> global_rsvd=28, total_alloc=16
> 
> And frees the CPU 20 vector
> 
>  >   9d.h. 1610366696us : vector_free_moved: irq=31 cpu=20 vector=0 
> is_managed=0
> 

Here, why didn't we avoid this cleanup by

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index a75de0792942..0cc59646755f 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -821,6 +821,9 @@ static void free_moved_vector(struct apic_chip_data 
*apicd)
          */
         WARN_ON_ONCE(managed);

+       if (!vector)
+               return;
+
         trace_vector_free_moved(apicd->irq, cpu, vector, managed);
         irq_matrix_free(vector_matrix, cpu, vector, managed);
         per_cpu(vector_irq, cpu)[vector] = VECTOR_UNUSED;

Is there something I didn't consider with? ;-)

Thanks,
	dou.

> And then CPU9 claims that it's queued for cleanup. Bah.
> 
> I'm still working on a fix as the elegant solution refused to work.
> 
> Thanks,
> 
>      tglx


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ