Date:	Thu, 2 Apr 2015 21:07:25 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Chris J Arges <chris.j.arges@...onical.com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Rafael David Tinoco <inaddy@...ntu.com>,
	Peter Anvin <hpa@...or.com>,
	Jiang Liu <jiang.liu@...ux.intel.com>,
	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Jens Axboe <axboe@...nel.dk>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Gema Gomez <gema.gomez-solano@...onical.com>,
	the arch/x86 maintainers <x86@...nel.org>
Subject: Re: smp_call_function_single lockups


* Chris J Arges <chris.j.arges@...onical.com> wrote:

> Whenever we look through the crashdump we see csd_lock_wait waiting 
> for the CSD_FLAG_LOCK bit to be cleared.  Usually the signature 
> leading up to that looks like the following (in the 'openstack 
> tempest on openstack' nested VM stress case):
> 
> (qemu-system-x86 task)
> kvm_sched_in
>  -> kvm_arch_vcpu_load
>   -> vmx_vcpu_load
>    -> loaded_vmcs_clear
>     -> smp_call_function_single
> 
> (ksmd task)
> pmdp_clear_flush
>  -> flush_tlb_mm_range
>   -> native_flush_tlb_others
>    -> smp_call_function_many

So are these two separate smp_call_function instances, crossing each 
other, with neither making any progress, indefinitely - as if the two 
IPIs got lost?

The traces Rafael linked to show a simpler scenario with two CPUs 
apparently locked up, doing this:

CPU0:

 #5 [ffffffff81c03e88] native_safe_halt at ffffffff81059386
 #6 [ffffffff81c03e90] default_idle at ffffffff8101eaee
 #7 [ffffffff81c03eb0] arch_cpu_idle at ffffffff8101f46f
 #8 [ffffffff81c03ec0] cpu_startup_entry at ffffffff810b6563
 #9 [ffffffff81c03f30] rest_init at ffffffff817a6067
#10 [ffffffff81c03f40] start_kernel at ffffffff81d4cfce
#11 [ffffffff81c03f80] x86_64_start_reservations at ffffffff81d4c4d7
#12 [ffffffff81c03f90] x86_64_start_kernel at ffffffff81d4c61c

This CPU is idle.

CPU1:

#10 [ffff88081993fa70] smp_call_function_single at ffffffff810f4d69
#11 [ffff88081993fb10] native_flush_tlb_others at ffffffff810671ae
#12 [ffff88081993fb40] flush_tlb_mm_range at ffffffff810672d4
#13 [ffff88081993fb80] pmdp_splitting_flush at ffffffff81065e0d
#14 [ffff88081993fba0] split_huge_page_to_list at ffffffff811ddd39
#15 [ffff88081993fc30] __split_huge_page_pmd at ffffffff811dec65
#16 [ffff88081993fcc0] unmap_single_vma at ffffffff811a4f03
#17 [ffff88081993fdc0] zap_page_range at ffffffff811a5d08
#18 [ffff88081993fe80] sys_madvise at ffffffff811b9775
#19 [ffff88081993ff80] system_call_fastpath at ffffffff817b8bad

This CPU is busy-waiting for the TLB flush IPI to finish.
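
For reference, the wait loop CPU1 is spinning in looks roughly like 
this (paraphrased from kernel/smp.c of that era, so the details may 
not be exact):

static void csd_lock_wait(struct call_single_data *csd)
{
	/*
	 * Spin until the destination CPU has run the callback and
	 * cleared CSD_FLAG_LOCK in csd_unlock():
	 */
	while (csd->flags & CSD_FLAG_LOCK)
		cpu_relax();
}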

There's no unexpected pattern here (other than it not finishing): the 
smp_call_function_single() is just the usual way we invoke the TLB 
flushing methods, AFAICS.

So one possibility would be that the IPI was sent but got lost.

We could try the following trick: poll for completion for a couple of 
seconds (since an IPI is not held up by anything but irqs-off 
sections, it should typically arrive within microseconds - seconds of 
polling should be more than enough), and if the IPI does not arrive, 
print a warning message and re-send the IPI.
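
Something like this completely untested sketch - it would live in 
kernel/smp.c next to csd_lock_wait(), and it assumes we know the 
destination CPU of the request (the current csd does not record it, 
so a debug field or an extra parameter would be needed):

/* Untested sketch: dst_cpu would have to be passed in or recorded in the csd. */
static void csd_lock_wait_timeout(struct call_single_data *csd, int dst_cpu)
{
	unsigned long timeout = jiffies + 5*HZ;	/* 'a couple of seconds' */

	while (csd->flags & CSD_FLAG_LOCK) {
		cpu_relax();

		if (!time_after(jiffies, timeout))
			continue;

		pr_warn("csd: IPI to CPU%d not acked for 5s, resending\n",
			dst_cpu);

		/* Re-kick the destination CPU and keep polling: */
		arch_send_call_function_single_ipi(dst_cpu);
		timeout = jiffies + 5*HZ;
	}
}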

If the IPI was lost due to some race, and there's no other failure 
mode that we don't understand, then this would work around the bug 
and make the tests pass indefinitely - with occasional hiccups and a 
handful of warning messages produced at the points where a previous 
kernel would have locked up.

If testing indeed confirms that kind of behavior, we could drill down 
more closely to figure out why the IPI did not get to its destination.

Or if the behavior is different, we'd have something new to look at. 
(For example, the IPI sending mechanism might be wedged indefinitely 
for some reason, so that even a resend won't work.)

Agreed?

Thanks,

	Ingo
