Message-ID: <20201001132607.21bcaa17@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date:   Thu, 1 Oct 2020 13:26:07 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     Wei Wang <weiwan@...gle.com>,
        "David S . Miller" <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>,
        Hannes Frederic Sowa <hannes@...essinduktion.org>,
        Paolo Abeni <pabeni@...hat.com>, Felix Fietkau <nbd@....name>
Subject: Re: [PATCH net-next 0/5] implement kthread based napi poll

On Thu, 1 Oct 2020 09:52:45 +0200 Eric Dumazet wrote:
> On Wed, Sep 30, 2020 at 10:08 PM Jakub Kicinski <kuba@...nel.org> wrote:
> > On Wed, 30 Sep 2020 12:21:35 -0700 Wei Wang wrote:  
> > > With napi poll moved to kthreads, the scheduler is in charge of
> > > scheduling both the kthreads handling network load and the user
> > > threads, and it is able to make better decisions. In the previous
> > > benchmark, if we do this and pin the kthreads processing napi poll
> > > to specific CPUs, the scheduler is able to schedule user threads
> > > away from these CPUs automatically.
> > >
> > > And the reason we prefer 1 kthread per napi, instead of 1 workqueue
> > > entity per host, is that a kthread is more configurable than a
> > > workqueue: we can leverage existing tuning tools for threads, like
> > > taskset and chrt, to tune the scheduling class, CPU set, etc.
> > > Another reason is that if we eventually want to provide a busy-poll
> > > feature using kernel threads for napi poll, a kthread seems more
> > > suitable than a workqueue.
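For reference, the taskset/chrt style of tuning mentioned above boils
down to the standard affinity and scheduling-class syscalls. A minimal
userspace sketch of that kind of tuning, assuming the napi kthread's
PID has already been looked up by some external means (the CPU,
priority and PID below are purely illustrative):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(int argc, char **argv)
{
        /* PID of the (hypothetical) napi kthread, passed on the command line. */
        pid_t pid = (argc > 1) ? (pid_t)atoi(argv[1]) : 0;
        struct sched_param sp = { .sched_priority = 50 };
        cpu_set_t set;

        /* Pin to CPU 2, as "taskset -pc 2 <pid>" would. */
        CPU_ZERO(&set);
        CPU_SET(2, &set);
        if (sched_setaffinity(pid, sizeof(set), &set))
                perror("sched_setaffinity");

        /* Switch to SCHED_FIFO prio 50, as "chrt -f -p 50 <pid>" would. */
        if (sched_setscheduler(pid, SCHED_FIFO, &sp))
                perror("sched_setscheduler");

        return 0;
}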
> >
> > As I said in my reply to the RFC, I see better performance with the
> > workqueue implementation, so I would hold off until we have more
> > conclusive results there, as this set adds fairly strong uAPI that
> > we'll have to support forever.
> 
> We can make incremental changes; the kthread implementation looks much
> nicer to us.

Having done two implementations of something more wq-like now,
I can say with some confidence that it's quite likely not a
simple extension of this model. And since we'll likely need
to support switching at runtime, there will be a fast-path
synchronization overhead.
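
To make that overhead concrete: supporting a runtime switch means
every poll invocation has to at least load and branch on a per-napi
mode flag with suitable memory ordering. A rough userspace model of
the idea (all names here are made up; this is not the kernel code):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct fake_napi {
        atomic_bool threaded;           /* flipped at runtime by configuration */
        unsigned long polls_softirq;
        unsigned long polls_kthread;
};

static void poll_entry(struct fake_napi *n)
{
        /* The extra fast-path cost: one acquire load plus a branch per poll. */
        if (atomic_load_explicit(&n->threaded, memory_order_acquire))
                n->polls_kthread++;     /* would run in the per-napi kthread */
        else
                n->polls_softirq++;     /* would run from net_rx_action() */
}

int main(void)
{
        struct fake_napi n = { .threaded = false };

        for (int i = 0; i < 1000; i++) {
                if (i == 500)           /* runtime switch halfway through */
                        atomic_store_explicit(&n.threaded, true,
                                              memory_order_release);
                poll_entry(&n);
        }
        printf("softirq polls: %lu, kthread polls: %lu\n",
               n.polls_softirq, n.polls_kthread);
        return 0;
}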

> A single work queue per host is a problem on server-class platforms
> because of NUMA placement.
> We now have servers with NICs on different NUMA nodes.

Are you saying that the wq code is less NUMA friendly than unpinned
threads?
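
As background for the NUMA question: the locality being argued about
is visible from sysfs, i.e. which node a NIC's PCI device sits on and
which CPUs belong to that node. A small sketch that just reads those
attributes (the default interface name is an assumption, error
handling is trimmed):

#include <stdio.h>

int main(int argc, char **argv)
{
        const char *ifname = (argc > 1) ? argv[1] : "eth0";  /* illustrative */
        char path[256], cpulist[256];
        int node = -1;
        FILE *f;

        /* NUMA node of the NIC's underlying PCI device (-1 if unknown). */
        snprintf(path, sizeof(path), "/sys/class/net/%s/device/numa_node", ifname);
        f = fopen(path, "r");
        if (f) {
                if (fscanf(f, "%d", &node) != 1)
                        node = -1;
                fclose(f);
        }
        printf("%s is on NUMA node %d\n", ifname, node);

        /* CPUs that belong to that node. */
        if (node >= 0) {
                snprintf(path, sizeof(path),
                         "/sys/devices/system/node/node%d/cpulist", node);
                f = fopen(path, "r");
                if (f && fgets(cpulist, sizeof(cpulist), f))
                        printf("node %d CPUs: %s", node, cpulist);
                if (f)
                        fclose(f);
        }
        return 0;
}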

> We cannot introduce a new model that will make all workloads better
> without any tuning.
> If you really think you can do that, think again.

Has Wei tested the wq implementation with real workloads?

All the cover letter has is some basic netperf runs and a vague
sentence saying "real workload also improved".

I think it's possible to get something that will be a better default
for 90% of workloads. Our current model predates SMP by two decades.
It's pretty bad.

I'm talking about upstream defaults, obviously; maybe you're starting
from a different baseline configuration than the rest of the world.

> Even the old 'fix' (commit 4cd13c21b207e80ddb1144c576500098f2d5f882,
> "softirq: Let ksoftirqd do its job")
> had severe issues for latency-sensitive jobs.
> 
> We need to be able to opt in to threads and let the process scheduler
> take the decisions.
> If we believe the process scheduler makes bad decisions, that should
> be reported to the scheduler experts.

I wouldn't expect that the scheduler will learn all by itself how to
group processes that run identical code for cache efficiency, and how
to schedule at 10us scale. I hope I'm wrong.

> I fully support this implementation; I do not want to wait for yet
> another 'work queue' model or scheduler classes.

I can't sympathize. I don't understand why you're trying to rush this.
And you're not giving me enough info about your target config to be able
to understand your thinking.
