Date:	Mon, 20 Apr 2015 05:17:29 -0700
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Ingo Molnar <mingo@...nel.org>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Guenter Roeck <linux@...ck-us.net>,
	Rabin Vincent <rabin@....in>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: qemu:arm test failure due to commit 8053871d0f7f (smp: Fix
 smp_call_function_single_async() locking)

On Mon, Apr 20, 2015 at 07:39:55AM +0200, Ingo Molnar wrote:
> 
> * Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> 
> > On Sun, Apr 19, 2015 at 11:01 AM, Ingo Molnar <mingo@...nel.org> wrote:
> > >
> > > That's all fine and good, but why is an IPI sent to a non-existent 
> > > CPU? It's not like we don't know which CPU is up and down.
> > 
> > I agree that the qemu-arm behavior smells like a bug, plain and 
> > simple. Nobody sane should send random IPI's to CPU's that they 
> > don't even know are up or not.
> 
> Yeah, and a warning would have caught that bug a bit earlier, at the 
> cost of restricting the API:
> 
> > That said, I could imagine that we would have some valid case where 
> > we just do a cross-cpu call to (for example) do lock wakeup, and the 
> > code would use some optimistic algorithm that prods the CPU after 
> > the lock has been released, and there could be some random race 
> > where the lock data structure has already been released (ie imagine 
> > the kind of optimistic unlocked racy access to "sem->owner" that we 
> > discussed as part of the rwsem_spin_on_owner() thread recently).
> > 
> > So I _could_ imagine that somebody would want to do optimistic "prod 
> > other cpu" calls that in all normal cases are for existing cpus, but 
> > could be racy in theory.
> 
> Yes, and I don't disagree with such optimizations in principle (it 
> allows fewer references to be taken in the fast path), but is it 
> really safe?
> 
> If a CPU is going down and we potentially race against that, and send 
> off an IPI, the IPI might be 'in flight' for an indeterminate amount 
> of time, especially on wildly non-deterministic hardware like virtual 
> platforms.
> 
> So if the CPU goes down during that time, the typical way a CPU goes 
> down is that it ignores all IPIs that arrive after that. End result: 
> the sender of the IPI may hang indefinitely (for the synchronous API), 
> waiting for the CSD lock to be released...
> 
> For the non-wait API we could also hang, but in an even more 
> interesting way: we won't hang trying to send to the downed CPU; we'd 
> hang on the _next_ cross-call, possibly sending to another CPU, 
> because the CSD_FLAG_LOCK of the sender CPU is never released by the 
> offlined CPU which was supposed to unlock it.
> 
> Also note the existing warning we already have in 
> flush_smp_call_function_queue():
> 
>         /* There shouldn't be any pending callbacks on an offline CPU. */
>         if (unlikely(warn_cpu_offline && !cpu_online(smp_processor_id()) &&
>                      !warned && !llist_empty(head))) {
>                 warned = true;
>                 WARN(1, "IPI on offline CPU %d\n", smp_processor_id());
> 
> Couldn't this trigger in the opportunistic, imprecise IPI case, if 
> IRQs are ever enabled when or after the CPU is marked offline?
> 
> My suggested warning would simply catch this kind of unsafe looking 
> race a bit sooner, instead of silently ignoring the cross-call 
> attempt.
> 
> Now your suggestion could be made to work by:
> 
>   - polling for CPU status in the CSD-lock code as well. (We don't do
>     this currently.)
> 
>   - making sure that if a late IPI arrives to an already-down CPU it
>     does not attempt to execute CSD functions. (I.e. silence the
>     above, already existing warning.)
> 
>   - auditing the CPU-down path to make sure it does not get surprised
>     by late external IPIs. (I think this should already be so, given
>     the existing warning.)
> 
>   - IPIs can be pending indefinitely, so make sure a pending IPI won't 
>     confuse the machinery after a CPU has been onlined again.
> 
>   - pending IPIs may consume hardware resources when not received 
>     properly. For example I think x86 will keep trying to resend it. 
>     Not sure there's any timeout mechanism on the hardware level - the 
>     sending APIC might even get stuck? Audit all hardware for this 
>     kind of possibility.
> 
> So unless I'm missing something, to me it looks like the current code 
> is only safe to use on offline CPUs if the 'offline' CPU is never ever 
> brought online in the first place.
> 
> > It doesn't sound like the qemu-arm case is that kind of situation, 
> > though. That one just sounds like a stupid "let's send an ipi to a 
> > cpu whether it exists or not".
> > 
> > But I committed it without any new warning, because I could in 
> > theory see it as being a valid use.
> 
> So my point is that we already have a 'late' warning, and I think it 
> actively hid the qemu-arm bug, because the 'offline' CPU was so 
> offline (due to being non-existent) that it never had a chance to 
> print a warning.
> 
> With an 'early' warning we could flush out such bugs (or code 
> uncleanliness: clearly the ARM code was at least functional before) a 
> bit sooner.
> 
> But I don't have strong feelings about it: SMP cross-call users are a 
> lot less frequent than, say, locking facility users, so the strength 
> of debugging isn't nearly as important, and it's fine to me either way!
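
The hang in the synchronous case comes from the CSD-lock wait loop in
kernel/smp.c, which after commit 8053871d0f7f looks roughly like this
(quoting from memory):

	static void csd_lock_wait(struct call_single_data *csd)
	{
		/* Spin until the target CPU runs the callback and unlocks. */
		while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
			cpu_relax();
	}

If the target CPU is offline (or, as in the qemu-arm case, does not
exist at all), nothing ever clears CSD_FLAG_LOCK, so the sender spins
here forever; in the async case, the same stuck flag bites whoever
reuses that csd next.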

In the long term, support for "send an IPI to a CPU that might or might
not be going offline" will probably need an API that returns a success
or failure indication.  That should simplify things a bit, because
currently the caller needs to participate in the IPI's synchronization.
With a conditional primitive, callers could instead simply rely on the
return value.
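
As a very rough sketch of the sort of thing I have in mind (purely
illustrative -- no such primitive exists today, and the name is made
up), the conditional variant might look like:

	/*
	 * Hypothetical only: tell the caller right away whether the IPI
	 * was queued, instead of letting it discover much later that the
	 * CSD lock will never be released.
	 */
	int smp_call_function_single_async_cond(int cpu,
						struct call_single_data *csd)
	{
		int ret = -ENXIO;

		preempt_disable();
		/* Still racy against hotplug without further synchronization. */
		if (cpu_online(cpu))
			ret = smp_call_function_single_async(cpu, csd);
		preempt_enable();

		return ret;
	}

The cpu_online() check would of course need to be properly synchronized
against CPU hotplug for the return value to mean anything, but the
shape of the API is the point here.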

							Thanx, Paul
