Date:	Sat, 26 Jan 2013 12:06:01 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Wang YanQing <udknight@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>, mina86@...a86.org,
	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	stable <stable@...r.kernel.org>, Ingo Molnar <mingo@...nel.org>,
	Mike Galbraith <efault@....de>,
	Jan Beulich <JBeulich@...ell.com>,
	Milton Miller <miltonm@....com>
Subject: Re: [PATCH]smp: Fix send func call IPI to empty cpu mask

On Fri, Jan 25, 2013 at 11:53 PM, Wang YanQing <udknight@...il.com> wrote:
> I get the warning below with 3.7,
> once or twice a day.
>
> [ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()
> [ 2235.186030] Hardware name: Aspire 4741
> [ 2235.186032] empty IPI mask
> [ 2235.186079]  [<c1015cbc>] native_send_call_func_ipi+0x4f/0x57
> [ 2235.186087]  [<c1053453>] smp_call_function_many+0x191/0x1a9
> [ 2235.186097]  [<c101e074>] native_flush_tlb_others+0x21/0x24
> [ 2235.186101]  [<c101e0da>] flush_tlb_page+0x63/0x89
> [ 2235.186105]  [<c101d360>] ptep_set_access_flags+0x20/0x26
> [ 2235.186111]  [<c108fadd>] do_wp_page+0x234/0x502
> [ 2235.186121]  [<c1090825>] handle_pte_fault+0x50d/0x54c
> [ 2235.186148]  [<c1090934>] handle_mm_fault+0xd0/0xe2
> [ 2235.186153]  [<c12dd143>] __do_page_fault+0x411/0x42d
> [ 2235.186166]  [<c12dd167>] do_page_fault+0x8/0xa
> [ 2235.186170]  [<c12db31a>] error_code+0x5a/0x60
>
> This patch fixes it.
>
> This patch also fixes a system hang problem:
> if data->cpumask is cleared after we pass the check
>
>         if (WARN_ONCE(!mask, "empty IPI mask"))
>                 return;
> then the problem that commit 83d349f3 fixed will happen again.

Hmm. We have very consciously tried to avoid the extra copy, although
I'm not entirely sure why (it might possibly hurt on the MAXSMP
configuration).

See for example commit 723aae25d5cd ("smp_call_function_many: handle
concurrent clearing of mask") which fixed another version of this
problem.

But I do agree that it looks like the copy is required, simply because
- as you say - once we've done the "list_add_rcu()" to add it to the
queue, we can have (another) IPI to the target CPU that can now see it
and clear the mask.

So by the time we get to actually send the IPI, the mask might have
been cleared by another IPI. So I do agree that your patch seems
correct, but I really really want to run it by other people.
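
To make the window concrete, here is a minimal single-threaded userspace
sketch of the race. It is not the kernel code and not the patch itself:
toy_mask_t, struct toy_call_data, toy_handle_ipi() and toy_send() are all
hypothetical names invented for illustration, and the "racing" IPI is
simulated by a plain function call at the point where the entry has just
become visible.

```c
#include <assert.h>

/* Toy stand-in for a cpumask: one bit per CPU. Not the kernel type. */
typedef unsigned long toy_mask_t;

/* Hypothetical, simplified model of a queued cross-call entry. */
struct toy_call_data {
	toy_mask_t cpumask;	/* shared: each handler clears its own bit */
	int queued;		/* nonzero once other CPUs can see the entry */
};

/* Models a target CPU servicing an earlier IPI: if it can already see
 * the entry on the queue, it clears its bit in the shared mask. */
static void toy_handle_ipi(struct toy_call_data *d, int cpu)
{
	if (d->queued)
		d->cpumask &= ~(1UL << cpu);
}

/* Returns the mask the sender would hand to the IPI send routine.
 * use_copy selects between re-reading the shared mask (racy) and
 * using a private snapshot taken before the entry was published. */
static toy_mask_t toy_send(struct toy_call_data *d, toy_mask_t targets,
			   int use_copy, int racing_cpu)
{
	toy_mask_t ipi_mask;

	assert(targets != 0);

	d->cpumask = targets;
	ipi_mask = d->cpumask;	/* private copy, never touched by handlers */
	d->queued = 1;		/* stands in for list_add_rcu() */

	/* Simulated window: another IPI reaches racing_cpu right here. */
	toy_handle_ipi(d, racing_cpu);

	return use_copy ? ipi_mask : d->cpumask;
}
```

With a single target CPU 1 and the simulated stale IPI also landing on
CPU 1, toy_send(&d, 1UL << 1, 0, 1) returns an empty mask (the "empty IPI
mask" case), while toy_send(&d, 1UL << 1, 1, 1) still returns the
original target bit. The cost is the extra copy, which is the part that
might matter with very large masks.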

Guys? Original patch on lkml. The other possible fix might be to take
the &call_function.lock earlier in
generic_smp_call_function_interrupt(), so that we can never clear the
bit while somebody is adding entries to the list... But I think it
very much tries to avoid that on purpose right now, with only the last
CPU responding to that IPI taking the lock.

So copying the IPI mask seems to be the reasonable approach. Comments?

                Linus