Date:	Wed, 25 Nov 2015 19:16:22 +0100 (CET)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Joe Lawrence <joe.lawrence@...atus.com>
cc:	LKML <linux-kernel@...r.kernel.org>,
	Jiang Liu <jiang.liu@...ux.intel.com>, x86@...nel.org
Subject: Re: irq_desc use-after-free in smp_irq_move_cleanup_interrupt

Joe,

On Mon, 23 Nov 2015, Joe Lawrence wrote:

Nice detective work!

> The irq_desc is in R15: ffff88102a8196f8
>
> This irq_desc is no longer allocated, it's been filled with the slub debug
> poison pattern (hence the spinlock is stuck):
> 
> The irq vector is in RBX: 0000000000000091
>
> But from a custom crash gdb script, the freed irq_desc can still be found
> in CPU 1's vector_irq[145]:
> 
>   cpu[1] vector_irq[145] irq_desc @ 0xffff88102a8196f8
> 
> Sifting through a heavily grepped and abbreviated trace log:
> 
>   - The irq moved from CPU 1, vector 145 to CPU 44, vector 134 
>   - Both CPUs skip cleaning up their vector_irq[] entries for this irq
>     because data->move_in_progress is set (is this normal?)

Yes. The cleanup interrupt does not target a particular vector: each CPU
scans its entire vector_irq[] array and skips any entry whose move is
still in progress.
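For reference, the cleanup handler looks roughly like this in the
v4.3-era arch/x86/kernel/apic/vector.c (a sketch reconstructed from
memory and abbreviated, so names and details are approximate):

asmlinkage __visible void smp_irq_move_cleanup_interrupt(void)
{
        unsigned vector, me = smp_processor_id();

        /* Scan this CPU's entire vector table. */
        for (vector = FIRST_EXTERNAL_VECTOR; vector < NR_VECTORS; vector++) {
                struct irq_desc *desc = __this_cpu_read(vector_irq[vector]);
                struct apic_chip_data *data;

                if (IS_ERR_OR_NULL(desc))
                        continue;

                raw_spin_lock(&desc->lock);
                data = apic_chip_data(irq_desc_get_irq_data(desc));

                /*
                 * Another move is already pending: leave the entry
                 * alone and wait for the next cleanup IPI. This is
                 * the skip you observed.
                 */
                if (data->move_in_progress)
                        goto unlock;

                /* The entry is live on this CPU: nothing to clean up. */
                if (vector == data->cfg.vector &&
                    cpumask_test_cpu(me, data->domain))
                        goto unlock;

                __this_cpu_write(vector_irq[vector], VECTOR_UNUSED);
unlock:
                raw_spin_unlock(&desc->lock);
        }
}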

>   - The irq moves again to CPU 34, vector 123
>   - The underlying device is removed @ 593.106514 jiffies
>   - The irq_desc is freed @ 593.239879 jiffies
>   - CPU 1 is last heard from @ 1022.830083 jiffies
> 
>   [001]    22.936764: __assign_irq_vector : cpu 44 : vector=134 -> 0xffff88102a8196f8
>   [044]    35.209945: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
>   [001]    35.212370: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
>   [044]    35.674114: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
>   [001]    35.675395: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
>   [044]    36.265716: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
>   [044]    36.517785: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
>   [001]    36.520115: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
>   [001]    54.636651: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
>   [001]    54.636722: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8

So I assume that no interrupt has happened so far.
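That matters because cleanup is only initiated from the interrupt entry
path on the new target CPU, roughly like this (again a sketch from
memory of the v4.3-era code, with cpumask allocation and error handling
elided):

static void __send_cleanup_vector(struct apic_chip_data *data)
{
        /* Mark the move as done ... */
        data->move_in_progress = 0;
        /* ... and IPI the CPUs which still carry the old vector. */
        apic->send_IPI_mask(data->old_domain, IRQ_MOVE_CLEANUP_VECTOR);
}

static void __irq_complete_move(struct apic_chip_data *data, unsigned vector)
{
        /*
         * Runs on interrupt entry; only the new target CPU,
         * receiving on the new vector, matches.
         */
        if (likely(!data->move_in_progress))
                return;
        if (vector == data->cfg.vector &&
            cpumask_test_cpu(smp_processor_id(), data->domain))
                __send_cleanup_vector(data);
}

No device interrupt means no cleanup IPI, so the stale entries just sit
there.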

>   [044]    61.670267: __assign_irq_vector : cpu 34 : vector=123 -> 0xffff88102a8196f8

Now it's moved again.

>   [001]    61.670979: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
>   [044]    61.696120: smp_irq_move_cleanup_interrupt : cpu (this) : vector=134 0xffff88102a8196f8 -> (nil)
>
>   [034]    69.086107: smp_irq_move_cleanup_interrupt : vector == data->cfg.vector && ... : vector=123 0xffff88102a8196f8
>   [000]   593.239811: clear_irq_vector : 1 : cpu 34 : vector=123 0xffff88102a8196f8 -> (nil)
>   [000]   593.239879: free_desc : desc @ 0xffff88102a8196f8
>   [001]  1022.830083: smp_irq_move_cleanup_interrupt : IS_ERR_OR_NULL : vector=144 (nil)

Ok. Here is the issue. So I assume the following happens:

CPU0    CPU1                    CPU34           CPU44             CPUxx

                                                                  send_IPI(CLEANUP)
        smp_irq_move_cleanup_interrupt
          sees data->move_in_progress

                                device_irq happens
                                data->move_in_progress = 0
                                send_IPI(CLEANUP)

        Does not receive another
        cleanup IPI for whatever
        reason .... See below.

                                                smp_irq_move_cleanup_interrupt
                                                vector = NULL

free_vector on CPU34

Does not clean CPU1 because
data->move_in_progress is not set.

free_desc

                                                                  send_IPI(CLEANUP)
        smp_irq_move_cleanup_interrupt
          Accesses random memory
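And "accesses random memory" is exactly what your crash shows: the freed
descriptor is still reachable through CPU 1's vector_irq[145]. A minimal
sketch of the fatal pass, assuming the loop above (stale_entry_access()
is just a hypothetical name for illustration):

static void stale_entry_access(unsigned vector)
{
        /* Still returns the freed descriptor on CPU 1. */
        struct irq_desc *desc = __this_cpu_read(vector_irq[vector]);

        /* Freed memory is neither NULL nor an ERR_PTR ... */
        if (IS_ERR_OR_NULL(desc))
                return;

        /* ... so this spins forever on the SLUB poison pattern. */
        raw_spin_lock(&desc->lock);
}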
 
> Prior to a782a7e46bb5 "x86/irq: Store irq descriptor in vector array", the
> vector_irq array held irq values, those interested in the irq_desc would
> have to perform their own irq_to_desc() (ie, a radix_tree_lookup of the
> irq_desc_tree).  The radix tree is updated in free_desc(), so any
> subsequent lookups would fail.

Yes, the race above has existed forever, and the irq_to_desc() check
papered over it. If the irq number had been reassigned, the code
operated on the wrong descriptor, which is pretty bad as well, since
it fiddles with the wrong bits. Though it does clear the vector ...
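The difference is easy to see side by side (sketched from memory, not
verbatim):

/* Before a782a7e46bb5: vector_irq[] held irq numbers. */
int irq = __this_cpu_read(vector_irq[vector]);
struct irq_desc *desc = irq_to_desc(irq);       /* radix tree lookup */

if (!desc)
        continue;       /* a freed irq is gone from the tree: race hidden */

/* After a782a7e46bb5: vector_irq[] holds the descriptor itself. */
struct irq_desc *desc = __this_cpu_read(vector_irq[vector]);

if (IS_ERR_OR_NULL(desc))
        continue;       /* a dangling pointer sails straight through */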

The problem is actually in the vector assignment code.

>   [001]    22.936764: __assign_irq_vector : cpu 44 : vector=134 -> 0xffff88102a8196f8

No interrupt has happened so far, so nothing cleans up the vector on CPU 1.

>   [044]    61.670267: __assign_irq_vector : cpu 34 : vector=123 -> 0xffff88102a8196f8
 
Now this moves it from CPU 44 to CPU 34 and ignores that CPU 1 still
has the vector assigned. __assign_irq_vector() unconditionally
overwrites data->old_domain, so the bit for CPU 1 is lost ....
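From memory, the relevant part of __assign_irq_vector() looks roughly
like this (heavily abbreviated; the vector search itself is elided):

static int __assign_irq_vector(int irq, struct apic_chip_data *d,
                               const struct cpumask *mask)
{
        if (d->move_in_progress)
                return -EBUSY;

        /*
         * Every new move starts with a clean slate: a CPU which is
         * still waiting for the previous cleanup (CPU 1 here) is
         * dropped from old_domain right away ...
         */
        cpumask_clear(d->old_domain);

        /* ... vector search elided ... */

        if (d->cfg.vector) {
                /*
                 * ... and old_domain is then overwritten with the
                 * current domain only, so the stale vector_irq entry
                 * on CPU 1 never sees another cleanup IPI.
                 */
                cpumask_copy(d->old_domain, d->domain);
                d->move_in_progress =
                        cpumask_intersects(d->old_domain, cpu_online_mask);
        }
        return 0;
}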

I'm staring into the code to figure out a fix ....

Thanks,

	tglx
