Message-ID: <1427149742.25985.84.camel@edumazet-glaptop2.roam.corp.google.com>
Date: Mon, 23 Mar 2015 15:29:02 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: subashab@...eaurora.org
Cc: netdev@...r.kernel.org
Subject: Re: [PATCH] net: rps: fix data stall after hotplug
On Mon, 2015-03-23 at 22:16 +0000, subashab@...eaurora.org wrote:
> >> On Thu, 2015-03-19 at 14:50 -0700, Eric Dumazet wrote:
> >>
> >>> Are you seeing this race on x86 ?
> >>>
> >>> If IPIs are not reliable on your arch, I guess you should fix
> >>> them.
> >>>
> >>> Otherwise, even without hotplug you'll have hangs.
> >>
> >> Please try instead this patch :
> >>
> >> diff --git a/net/core/dev.c b/net/core/dev.c
> >> index 5d43e010ef870a6ab92895297fe18d6e6a03593a..baa4bff9a6fbe0d77d7921865c038060cb5efffd 100644
> >> --- a/net/core/dev.c
> >> +++ b/net/core/dev.c
> >> @@ -4320,9 +4320,8 @@ static void net_rps_action_and_irq_enable(struct softnet_data *sd)
> >>  		while (remsd) {
> >>  			struct softnet_data *next = remsd->rps_ipi_next;
> >>  
> >> -			if (cpu_online(remsd->cpu))
> >> -				smp_call_function_single_async(remsd->cpu,
> >> -							   &remsd->csd);
> >> +			smp_call_function_single_async(remsd->cpu,
> >> +						       &remsd->csd);
> >>  			remsd = next;
> >>  		}
> >>  	} else
> >>
> >>
> > Thanks for the patch, Eric. We are seeing this race on ARM.
> > I will try this and update.
> >
>
> Unfortunately, I am not able to reproduce the data stall now, with or
> without the patch. Could you tell me more about the patch and what issue
> you were suspecting?
>
> Based on the code, it looks like we BUG out on our arch if we try to send
> an IPI to an offline CPU. Since this condition is never hit, I suspect
> the IPI did not fail.
>
> void smp_send_reschedule(int cpu)
> {
> 	BUG_ON(cpu_is_offline(cpu));
> 	smp_cross_call_common(cpumask_of(cpu), IPI_RESCHEDULE);
> }
The bug I am fixing is the following:

	if (cpu_online(x))			[1]
		target = x

	...

	queue packet on queue of cpu x

	net_rps_action_and_irq_enable()
		if (cpu_online(x))		[2]
			smp_call_function_single_async(x, ...)

The problem is that the first test at [1] can succeed while the second
test at [2] fails, if cpu x starts going offline in between. The packet
is already queued on cpu x, so we must still send this IPI.

We run in a softirq, so it is OK to deliver the IPI to the _about to be
offlined_ cpu.

We should test cpu_online(x) only once. Testing it a second time is the
bug.