lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Wed, 30 May 2018 23:56:09 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Song Liu <songliubraving@...com>
cc:     Song Liu <liu.song.a23@...il.com>,
        Tariq Toukan <tariqt@...lanox.com>,
        Dmitry Safonov <0x7f454c46@...il.com>,
        open list <linux-kernel@...r.kernel.org>,
        Maor Gottlieb <maorg@...lanox.com>,
        Kernel Team <Kernel-team@...com>
Subject: Re: WARNING and PANIC in irq_matrix_free

Song,

On Tue, 29 May 2018, Song Liu wrote:
> > On May 29, 2018, at 1:35 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> >> Maybe we cannot enable all trace points under irq_vectors/ and irq_matrix.
> > 
> > Right. Sorry, I forgot to say that we only need the irq_vectors ones which
> > are related to vector allocation, i.e.: irq_vectors/vector_*
>
> Here is the ftrace dump:

Thanks for providing the data!

>  19d... 1610359248us : vector_deactivate: irq=31 is_managed=0 can_reserve=1 reserve=0
>  19d... 1610359248us : vector_clear: irq=31 vector=33 cpu=20 prev_vector=0 prev_cpu=2
>  19d... 1610359249us : irq_matrix_free: bit=33 cpu=20 online=1 avl=201 alloc=0 managed=0 online_maps=56 global_avl=11241, global_rsvd=25, total_alloc=15

Here IRQ 31 is shutdown and the vector freed.

>  19d... 1610359249us : irq_matrix_reserve: online_maps=56 global_avl=11241, global_rsvd=26, total_alloc=15
>  19d... 1610359249us : vector_reserve: irq=31 ret=0
>  19d... 1610359249us : vector_config: irq=31 vector=239 cpu=0 apicdest=0x00000000

And set to the magic reservation vector 239 to catch spurious interrupts.

>  20dN.. 1610366654us : vector_activate: irq=31 is_managed=0 can_reserve=1 reserve=0
>  20dN.. 1610366654us : vector_alloc: irq=31 vector=4294967268 reserved=1 ret=0
>  20dN.. 1610366655us : irq_matrix_alloc: bit=33 cpu=9 online=1 avl=200 alloc=1 managed=0 online_maps=56 global_avl=11240, global_rsvd=28, total_alloc=16
>  20dN.. 1610366655us : vector_update: irq=31 vector=33 cpu=9 prev_vector=0 prev_cpu=20
>  20dN.. 1610366656us : vector_alloc: irq=31 vector=33 reserved=1 ret=0
>  20dN.. 1610366656us : vector_config: irq=31 vector=33 cpu=9 apicdest=0x00000014

So here it gets initialized again and targets CPU9 now.

>  20dN.. 1610366662us : irq_matrix_alloc: bit=33 cpu=20 online=1 avl=200 alloc=1 managed=0 online_maps=56 global_avl=11240, global_rsvd=28, total_alloc=16
>  20dN.. 1610366662us : vector_update: irq=31 vector=33 cpu=20 prev_vector=33 prev_cpu=9
>  20dN.. 1610366662us : vector_alloc: irq=31 vector=33 reserved=1 ret=0
>  20dN.. 1610366662us : vector_config: irq=31 vector=33 cpu=20 apicdest=0x0000002c

Here it is reconfigured to CPU 20. Now that update schedules vector 33 on
CPU9 for cleanup.

>  20dN.. 1610366666us : irq_matrix_alloc: bit=34 cpu=2 online=1 avl=199 alloc=2 managed=0 online_maps=56 global_avl=11239, global_rsvd=28, total_alloc=17
>  20dN.. 1610366666us : vector_update: irq=31 vector=34 cpu=2 prev_vector=33 prev_cpu=20
>  20dN.. 1610366666us : vector_alloc: irq=31 vector=34 reserved=1 ret=0
>  20dN.. 1610366666us : vector_config: irq=31 vector=34 cpu=2 apicdest=0x00000004

So here the shit hits the fan because that update schedules vector 33 on
CPU20 for cleanup while the previous cleanup for CPU9 has not been done
yet. Cute. or not so cute....

>  20dNh. 1610366669us : vector_free_moved: irq=31 cpu=20 vector=33 is_managed=0
>  20dNh. 1610366670us : irq_matrix_free: bit=33 cpu=20 online=1 avl=201 alloc=0 managed=0 online_maps=56 global_avl=11240, global_rsvd=28, total_alloc=16

And frees the CPU 20 vector

>   9d.h. 1610366696us : vector_free_moved: irq=31 cpu=20 vector=0 is_managed=0

And then CPU9 claims that it's queued for cleanup. Bah.

I'm still working on a fix as the elegant solution refused to work.

Thanks,

	tglx

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ