Date:   Thu, 1 Oct 2020 16:46:52 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Wei Wang <weiwan@...gle.com>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        "David S . Miller" <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>,
        Hannes Frederic Sowa <hannes@...essinduktion.org>,
        Paolo Abeni <pabeni@...hat.com>, Felix Fietkau <nbd@....name>
Subject: Re: [PATCH net-next 0/5] implement kthread based napi poll

On Thu, 1 Oct 2020 15:12:20 -0700 Wei Wang wrote:
> Yes. I did a round of testing with workqueue as well. The "real
> workload" I mentioned is a google internal application benchmark which
> involves networking as well as disk ops.
> There are 2 types of tests there.
> 1 is sustained tests, where the ops/s is being pushed to very high,
> and keeps the overall cpu usage to > 80%, with various sizes of
> payload.
> In this type of test case, I see a better result with the kthread
> model compared to workqueue in the latency metrics, and similar CPU
> savings, with some tuning of the kthreads. (e.g., we limit the
> kthreads to a pool of CPUs to run on, to avoid mixture with
> application threads. I did the same for workqueue as well to be fair.)

Can you share the relative performance delta of this benchmark?

Could you explain why threads are slower than ksoftirqd if you pin the
application away? From your cover letter it sounded like you want the
scheduler to see the NAPI load, but then you say you pinned the
application away from the NAPI cores for the test, so I'm confused.

> The other is trace based tests, where the load is based on the actual
> trace taken from the real servers. This kind of test has less load and
> ops/s overall. (~25% total cpu usage on the host)
> In this test case, I observe a similar amount of latency savings with
> both kthread and workqueue, but workqueue seems to have better cpu
> saving here, possibly due to less # of threads woken up to process the
> load.
> 
> And one reason we would like to push forward with 1 kthread per NAPI,
> is we are also trying to do busy polling with the kthread. And it
> seems a good model to have 1 kthread dedicated to 1 NAPI to begin
> with.

And you'd pin those busy polling threads to a specific, single CPU, too?
1 cpu : 1 thread : 1 NAPI?
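
For illustration only, the "1 cpu : 1 thread" pinning asked about above can be sketched from userspace. This is not the patch set's in-kernel mechanism, and it does not touch NAPI at all; the helper names (pin_current_thread, run_pinned_pollers) are hypothetical. It assumes Linux, where Python exposes sched_setaffinity(2) as os.sched_setaffinity and pid 0 refers to the calling thread.

```python
import os
import threading


def pin_current_thread(cpu):
    """Restrict the calling thread to one CPU and return its new affinity set."""
    # On Linux, pid 0 means "the calling thread" for sched_setaffinity(2).
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


def run_pinned_pollers(num_pollers):
    """Start up to num_pollers threads, each pinned to its own allowed CPU.

    The "poller" threads are stand-ins (hypothetical); they only record the
    affinity they ended up with. Returns {poller index: affinity set}.
    """
    allowed = sorted(os.sched_getaffinity(0))  # CPUs this process may use
    results = {}

    def worker(idx, cpu):
        results[idx] = pin_current_thread(cpu)

    threads = []
    # Never start more pollers than CPUs, so the mapping stays one-to-one.
    for idx in range(min(num_pollers, len(allowed))):
        t = threading.Thread(target=worker, args=(idx, allowed[idx]))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for idx, cpus in sorted(run_pinned_pollers(2).items()):
        print(f"poller {idx} pinned to CPU(s) {sorted(cpus)}")
```

Each worker pins only itself, so the main thread keeps its original affinity mask. For a kernel thread the same kind of restriction would typically be applied by an administrator with a tool such as taskset, which is again an assumption about deployment rather than something stated in this thread.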
