netdev - Re: [PATCH net-next 0/5] implement kthread based napi poll


Message-ID: <20201001164652.0e61b810@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date:   Thu, 1 Oct 2020 16:46:52 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Wei Wang <weiwan@...gle.com>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        "David S . Miller" <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>,
        Hannes Frederic Sowa <hannes@...essinduktion.org>,
        Paolo Abeni <pabeni@...hat.com>, Felix Fietkau <nbd@....name>
Subject: Re: [PATCH net-next 0/5] implement kthread based napi poll

On Thu, 1 Oct 2020 15:12:20 -0700 Wei Wang wrote:
> Yes. I did a round of testing with workqueue as well. The "real
> workload" I mentioned is a google internal application benchmark which
> involves networking as well as disk ops.
> There are 2 types of tests there.
> 1 is sustained tests, where the ops/s is being pushed to very high,
> and keeps the overall cpu usage to > 80%, with various sizes of
> payload.
> In this type of test case, I see a better result with the kthread
> model compared to workqueue in the latency metrics, and similar CPU
> savings, with some tuning of the kthreads. (e.g., we limit the
> kthreads to a pool of CPUs to run on, to avoid mixture with
> application threads. I did the same for workqueue as well to be fair.)

Can you share the relative performance delta of this benchmark?

Could you explain why threads are slower than ksoftirqd if you pin the
application away? From your cover letter it sounded like you want the
scheduler to see the NAPI load, but then you say you pinned the
application away from the NAPI cores for the test, so I'm confused.

> The other is trace based tests, where the load is based on the actual
> trace taken from the real servers. This kind of test has less load and
> ops/s overall. (~25% total cpu usage on the host)
> In this test case, I observe a similar amount of latency savings with
> both kthread and workqueue, but workqueue seems to have better cpu
> saving here, possibly due to less # of threads woken up to process the
> load.
> 
> And one reason we would like to push forward with 1 kthread per NAPI,
> is we are also trying to do busy polling with the kthread. And it
> seems a good model to have 1 kthread dedicated to 1 NAPI to begin
> with.

And you'd pin those busy polling threads to a specific, single CPU, too?
1 cpu : 1 thread : 1 NAPI?