Message-ID: <20130127155043.GA6214@gmail.com>
Date: Sun, 27 Jan 2013 16:50:43 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Wang YanQing <udknight@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Thomas Gleixner <tglx@...utronix.de>, mina86@...a86.org,
	"Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	stable <stable@...r.kernel.org>, Mike Galbraith <efault@....de>,
	Jan Beulich <JBeulich@...ell.com>,
	Milton Miller <miltonm@....com>
Subject: Re: [PATCH]smp: Fix send func call IPI to empty cpu mask

* Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> On Fri, Jan 25, 2013 at 11:53 PM, Wang YanQing <udknight@...il.com> wrote:
> > I get below warning every day with 3.7,
> > one or two times per day.
> >
> > [ 2235.186027] WARNING: at /mnt/sda7/kernel/linux/arch/x86/kernel/apic/ipi.c:109 default_send_IPI_mask_logical+0x2f/0xb8()
> > [ 2235.186030] Hardware name: Aspire 4741
> > [ 2235.186032] empty IPI mask
> > [ 2235.186079] [<c1015cbc>] native_send_call_func_ipi+0x4f/0x57
> > [ 2235.186087] [<c1053453>] smp_call_function_many+0x191/0x1a9
> > [ 2235.186097] [<c101e074>] native_flush_tlb_others+0x21/0x24
> > [ 2235.186101] [<c101e0da>] flush_tlb_page+0x63/0x89
> > [ 2235.186105] [<c101d360>] ptep_set_access_flags+0x20/0x26
> > [ 2235.186111] [<c108fadd>] do_wp_page+0x234/0x502
> > [ 2235.186121] [<c1090825>] handle_pte_fault+0x50d/0x54c
> > [ 2235.186148] [<c1090934>] handle_mm_fault+0xd0/0xe2
> > [ 2235.186153] [<c12dd143>] __do_page_fault+0x411/0x42d
> > [ 2235.186166] [<c12dd167>] do_page_fault+0x8/0xa
> > [ 2235.186170] [<c12db31a>] error_code+0x5a/0x60
> >
> > This patch fix it.
> >
> > This patch also fix some system hang problem:
> > If the data->cpumask been cleared after pass
> >
> >         if (WARN_ONCE(!mask, "empty IPI mask"))
> >                 return;
> > then the problem 83d349f3 fix will happen again.
>
> Hmm. We have very consciously tried to avoid the extra copy, although
> I'm not entirely sure why (it might possibly hurt on the MAXSMP
> configuration).
>
> See for example commit 723aae25d5cd ("smp_call_function_many: handle
> concurrent clearing of mask") which fixed another version of this
> problem.
>
> But I do agree that it looks like the copy is required, simply because
> - as you say - once we've done the "list_add_rcu()" to add it to the
> queue, we can have (another) IPI to the target CPU that can now see it
> and clear the mask.
>
> So by the time we get to actually send the IPI, the mask might have
> been cleared by another IPI. So I do agree that your patch seems
> correct, but I really really want to run it by other people.
>
> Guys? Original patch on lkml. The other possible fix might be
> to take the &call_function.lock earlier in
> generic_smp_call_function_interrupt(), so that we can never
> clear the bit while somebody is adding entries to the list...
> But I think it very much tries to avoid that on purpose right
> now, with only the last CPU responding to that IPI taking the
> lock.
>
> So copying the IPI mask seems to be the reasonable approach.
> Comments?

Agreed, looks correct to me as well - I've queued the fix up in
tip:x86/urgent.

( I've added your Acked-by to the patch, please holler if you
  disagree with how the final commit ended up looking. )

Thanks,

	Ingo
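
For readers following the thread, the race described above (the shared
mask being cleared between queueing the entry and sending the IPI) can
be modelled in plain user-space C. This is only an illustrative sketch,
not the kernel code and not the queued patch: cpumask_t here is a bare
unsigned long, and NR_CPUS, struct call_data, send_ipi_mask(),
target_clears_bit() and publish_and_send() are hypothetical names made
up for the model. The "concurrent" clearing is simulated sequentially
so that the outcome is deterministic.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4                 /* hypothetical CPU count for the model */

typedef unsigned long cpumask_t;  /* one bit per CPU; not the kernel type */

struct call_data {
	cpumask_t cpumask;        /* shared: each target clears its own bit */
};

/* Stand-in for the arch IPI hook: refuses an empty mask, as the warning does. */
static bool send_ipi_mask(cpumask_t mask)
{
	return mask != 0;
}

/* Models a target CPU that finds the queued entry while handling an earlier IPI. */
static void target_clears_bit(struct call_data *d, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	d->cpumask &= ~(1UL << cpu);
}

/*
 * Publish the request, let every target clear its bit (worst case), then
 * send. Returns true if the IPI went out to a non-empty mask.
 */
static bool publish_and_send(struct call_data *d, cpumask_t targets, bool use_copy)
{
	cpumask_t snapshot;
	int cpu;

	d->cpumask = targets;
	snapshot = d->cpumask;    /* private copy, taken before the entry is visible */

	/* The entry is now "on the queue"; targets may run the function at once. */
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (targets & (1UL << cpu))
			target_clears_bit(d, cpu);

	/* Sending from the shared mask can see it already emptied. */
	return send_ipi_mask(use_copy ? snapshot : d->cpumask);
}
```

In this model, publish_and_send(&d, 0x6UL, false) returns false because
the shared mask is already empty when it is read for sending, while
publish_and_send(&d, 0x6UL, true) returns true because the private copy
still holds the original targets. The cost is the extra mask copy that
the quoted mail mentions.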