Message-ID: <1427845460.25985.174.camel@edumazet-glaptop2.roam.corp.google.com>
Date: Tue, 31 Mar 2015 16:44:20 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: subashab@...eaurora.org
Cc: netdev@...r.kernel.org
Subject: Re: [PATCH] net: rps: fix data stall after hotplug
On Tue, 2015-03-31 at 22:02 +0000, subashab@...eaurora.org wrote:
> > Listen, I would rather disable RPS on your arch, instead of messing with
> > it.
> >
> > Reset NAPI state as you did is in direct violation of the rules.
> >
> > Only cpu owning the bit is allowed to reset it.
> >
>
> Perhaps my understanding of the code in dev_cpu_callback() is incorrect?
> Please correct me if I am wrong.
>
> The poll list is copied from an offline cpu to an online cpu.
> Specifically for process_backlog, I was under the impression that
> the online cpu tries to reset the state of NAPI of the offline cpu.
> The process and input queues are then always copied to the
> online cpu.
>
> while (!list_empty(&oldsd->poll_list)) {
>         struct napi_struct *napi = list_first_entry(&oldsd->poll_list,
>                                                     struct napi_struct,
>                                                     poll_list);
>
>         list_del_init(&napi->poll_list);
>         if (napi->poll == process_backlog)
>                 napi->state = 0;
>         else
>                 ____napi_schedule(sd, napi);
> }
>
> My question was why it would be incorrect to unconditionally clear the
> backlog NAPI state of the offline cpu.
>
It is incorrect because the moment we chose to send an IPI to a cpu, we
effectively transferred ownership of the NAPI bit to that target cpu.
Another cpu cannot take it over without risking subtle corruption.
If your arch fails to send the IPI, we have no way to clear the bit in a
safe way.
Only the target cpu is allowed to clear the bit, by virtue of the
following being called:
/* Called from hardirq (IPI) context */
static void rps_trigger_softirq(void *data)
{
        struct softnet_data *sd = data;

        ____napi_schedule(sd, &sd->backlog);
        sd->received_rps++;
}
If it is not called even in 0.000001% of the cases, all bets are off.
We are not going to add yet another atomic op for every packet just to
solve this corner case. Just do not enable RPS on your arch; it is not
enabled by default.
I am currently on vacation and will not reply to further inquiries on
this topic.