netdev - Re: [PATCH net-next 0/5] implement kthread based napi poll


Message-ID: <20201002155329.3bb56911@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date:   Fri, 2 Oct 2020 15:53:29 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Wei Wang <weiwan@...gle.com>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        "David S . Miller" <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>,
        Hannes Frederic Sowa <hannes@...essinduktion.org>,
        Paolo Abeni <pabeni@...hat.com>, Felix Fietkau <nbd@....name>
Subject: Re: [PATCH net-next 0/5] implement kthread based napi poll

On Thu, 1 Oct 2020 18:44:40 -0700 Wei Wang wrote:
> > Can you share the relative performance delta of this benchmark?
> >
> > Could you explain why threads are slower than ksoftirqd if you pin the
> > application away? From your cover letter it sounded like you want the
> > scheduler to see the NAPI load, but then you say you pinned the
> > application away from the NAPI cores for the test, so I'm confused.
> 
> No. We did not explicitly pin the application threads away.
> Application threads are free to run anywhere. What we do is we
> restrict the NAPI kthreads to only those CPUs handling rx interrupts.

Whatever. You pin the NAPI threads and hand-tune their number so the
load of the NAPI CPUs is always higher. If the workload changes the
system will get very unhappy.

> (For us, 8 cpus out of 56.) So the load on those CPUs is very high
> when running the test. And the scheduler is smart enough to avoid
> using those CPUs for the application threads automatically.
> Here are the results of one representative test:
>              cpu/op   50%tile   95%tile   99%tile
> base          71.47     417us    1.01ms     2.9ms
> kthread       67.84     396us     976us     2.4ms
> workqueue     69.68     386us     791us     1.9ms

Did you renice ksoftirqd in "base"?

> Actually, I remembered it wrong. It does seem workqueue is doing
> better on latencies. But cpu/op wise, kthread seems to be a bit
> better.

Q.E.D.
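
For reference, the relative deltas asked about earlier in the thread can be
derived from the quoted table. The following is only a minimal sketch: the
figures are copied from the single representative run quoted above, the
latencies are converted to microseconds, and the names RESULTS,
rel_delta_pct and deltas_vs_base are made up for illustration.

```python
# Figures copied from the table quoted in this thread (one representative run).
# Latencies are converted to microseconds so every metric is a plain number.
RESULTS = {
    "base":      {"cpu/op": 71.47, "p50_us": 417.0, "p95_us": 1010.0, "p99_us": 2900.0},
    "kthread":   {"cpu/op": 67.84, "p50_us": 396.0, "p95_us": 976.0,  "p99_us": 2400.0},
    "workqueue": {"cpu/op": 69.68, "p50_us": 386.0, "p95_us": 791.0,  "p99_us": 1900.0},
}


def rel_delta_pct(base, value):
    """Relative change of value versus base, in percent (negative means lower)."""
    return (value - base) / base * 100.0


def deltas_vs_base(results, baseline="base"):
    """Return {variant: {metric: percent change vs. the baseline row}}."""
    base_row = results[baseline]
    return {
        name: {metric: rel_delta_pct(base_row[metric], v) for metric, v in row.items()}
        for name, row in results.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, row in deltas_vs_base(RESULTS).items():
        cells = "  ".join(f"{metric}: {pct:+.1f}%" for metric, pct in row.items())
        print(f"{name:<10} {cells}")
```

Negative values mean lower than "base", which is better for every column here.
With these numbers, workqueue shows the larger reduction in latency and
kthread the larger reduction in cpu/op, which matches the correction quoted
above. This is a single run with no variance information, so the deltas are
indicative at best.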