linux-kernel - Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123 send_IPI_mask

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <48AE20B8.9000204@kernel.org>
Date:	Thu, 21 Aug 2008 19:13:12 -0700
From:	"H. Peter Anvin" <hpa@...nel.org>
To:	Dave Jones <davej@...hat.com>,
	Vegard Nossum <vegard.nossum@...il.com>,
	Andi Kleen <andi@...stfloor.org>,
	the arch/x86 maintainers <x86@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
CC:	Rusty Russell <rusty@...tcorp.com.au>
Subject: Re: latest -git: WARNING: at arch/x86/kernel/ipi.c:123	send_IPI_mask_bitmask+0xc3/0xe0()

Dave Jones wrote:
>  > 
>  > Hm. What you say is true, but this one in particular has nothing to do
>  > with oprofile! It has something to do with reading /dev/cpu/*/msr
>  > while hot-unplugging cpu1:
>  > 
>  >  [<c011733e>] msr_read+0x6e/0xa0
>  >  [<c01a87b4>] vfs_read+0x94/0x130
>  > 
>  > I wasn't using oprofile when this happened. So I think it should also
>  > be considered a separate issue. Though yes -- CPU hotplug in general
>  > tends to break a lot of things.
> 
> From my reading of the msr code, we check that the cpu is online in ->open,
> but we never check it again, and also, we make no guarantees that it
> won't go away before we ->read or even ->close it.
> 
> Would adding a get_cpu/put_cpu across the open/close solve this?
> Peter?
> 

A get_cpu/put_cpu across the whole open..close sequence would seem to 
be, ahem, rude, since userspace could hold it for an arbitrary amount of 
time (plus, there is no guarantee that they are invoked on the same CPU.)

The cpuid driver has the same problem, obviously.

get_online_cpus() and put_online_cpus() around the call to 
{rd,wr}msr_safe_on_cpu() should work; and the CPU hotplug documentation 
seems to claim that we can just disable preemption around those calls, 
which is exactly what get_cpu()..put_cpu() does, so I guess 
get_cpu()..put_cpu() here is fine.  Now, the big question is: should 
this really be done in the MSR/CPUID drivers, or should it be done in
smp_call_function_single(), which is the generic code invoked by this?

It seems to be that doing it in smp_call_function_single() would be more 
correct as it's already protected by get_cpu()..put_cpu() and a 
cpu_online() test in there should not be expensive in comparison to the 
whole rest of the code.

You may want to see if this patch fixes the problem; it does *NOT* have 
the correct error behaviour (some of the intervening layers don't 
propagate errors), but it should make the fault go away.

	-hpa

View attachment "smp-unplug-fault.patch" of type "text/x-patch" (1158 bytes)