Message-ID: <CANn89i+Fh8n=Vre=5h=UzJtoixF=ayxJU+RKW80GibFAG1f0yQ@mail.gmail.com>
Date: Thu, 16 Jun 2016 09:55:27 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: LKML <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
"David S. Miller" <davem@...emloft.net>,
Steven Rostedt <rostedt@...dmis.org>,
"Peter Zijlstra (Intel)" <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
Hannes Frederic Sowa <hannes@...essinduktion.org>,
netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH 4/5] netdev: implement infrastructure for threadable napi irq
>
> I guess you mean 'consumer' here. The scheduler doesn't fail to migrate
> it: the consumer is actually migrated many times, but on each cpu it
> finds a competing, running ksoftirqd thread.
>
> The general problem is that under significant network load (not
> necessarily a UDP flood; similar behavior is observed even with TCP_RR
> tests), with enough rx queues available and enough flows running, no
> single thread/process can use 100% of any cpu, even if the overall
> capacity would allow it.
>
Looks like a general process scheduler issue?

Really, allowing RX processing to be migrated among cpus is
problematic for TCP, as it will increase reordering.
RFS, for example, has very specific logic to avoid these problems as
much as possible:
/*
 * If the desired CPU (where last recvmsg was done) is
 * different from current CPU (one in the rx-queue flow
 * table entry), switch if one of the following holds:
 *   - Current CPU is unset (>= nr_cpu_ids).
 *   - Current CPU is offline.
 *   - The current CPU's queue tail has advanced beyond the
 *     last packet that was enqueued using this table entry.
 *     This guarantees that all previous packets for the flow
 *     have been dequeued, thus preserving in order delivery.
 */
if (unlikely(tcpu != next_cpu) &&
    (tcpu >= nr_cpu_ids || !cpu_online(tcpu) ||
     ((int)(per_cpu(softnet_data, tcpu).input_queue_head -
	    rflow->last_qtail)) >= 0)) {
	tcpu = next_cpu;
	rflow = set_rps_cpu(dev, skb, rflow, next_cpu);
}