Message-ID: <20200925120652.10b8d7c5@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date:   Fri, 25 Sep 2020 12:06:52 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Magnus Karlsson <magnus.karlsson@...il.com>
Cc:     Wei Wang <weiwan@...gle.com>,
        "David S . Miller" <davem@...emloft.net>,
        Network Development <netdev@...r.kernel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        Paolo Abeni <pabeni@...hat.com>,
        Hannes Frederic Sowa <hannes@...essinduktion.org>,
        Felix Fietkau <nbd@....name>,
        Björn Töpel <bjorn.topel@...el.com>
Subject: Re: [RFC PATCH net-next 0/6] implement kthread based napi poll

On Fri, 25 Sep 2020 15:48:35 +0200 Magnus Karlsson wrote:
> I really like this RFC and would encourage you to submit it as a
> patch. Would love to see it make it into the kernel.
> 
> I see the same positive effects as you when trying it out with AF_XDP
> sockets. Made some simple experiments where I sent 64-byte packets to
> a single AF_XDP socket. Have not managed to figure out how to do
> percentiles on my load generator, so this is going to be min, avg and
> max only. The application using the AF_XDP socket just performs a mac
> swap on the packet and sends it back to the load generator that then
> measures the round trip latency. The kthread is taskset to the same
> core as ksoftirqd would run on. So in each experiment, they always run
> on the same core id (which is not the same as the application).
> 
> Rate 12 Mpps with 0% loss.
>               Latencies (us)        Delay Variation between packets
>            min    avg    max        avg    max
> softirq   11.0   17.1   78.4      0.116   63.0
> kthread   11.2   17.1   35.0      0.116   20.9
> 
> Rate ~58 Mpps (Line rate at 40 Gbit/s) with substantial loss
>               Latencies (us)        Delay Variation between packets
>            min    avg    max        avg    max
> softirq   87.6  194.9  282.6      0.062   25.9
> kthread   86.5  185.2  271.8      0.061   22.5
> 
> For the last experiment, I also get 1.5% to 2% higher throughput with
> your kthread approach. Moreover, just from the per-second throughput
> printouts from my application, I can see that the kthread numbers are
> more stable. The softirq numbers can vary quite a lot between each
> second, around +-3%. But for the kthread approach, they are nice and
> stable. Have not examined why.
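
For reference, the per-packet work in the mac-swap responder described
above boils down to something like this (a minimal, untested sketch; it
assumes plain Ethernet frames pulled off the AF_XDP RX ring and pushed
straight back onto the TX ring):

#include <stdint.h>
#include <string.h>

/* Swap destination and source MAC addresses in place on a raw
 * Ethernet frame.  The frame is then resubmitted on the TX ring,
 * so this is the entire per-packet "application" in the benchmark.
 */
static void mac_swap(uint8_t *eth)
{
	uint8_t tmp[6];

	memcpy(tmp, eth, 6);		/* save current dst MAC */
	memcpy(eth, eth + 6, 6);	/* src becomes dst      */
	memcpy(eth + 6, tmp, 6);	/* old dst becomes src  */
}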

Sure, it's better than the status quo for AF_XDP, but it's going to be
far inferior to well-implemented busy polling.

We already discussed the potential scheme with Bjorn; since you prompted
me again, let me shoot some code from the hip at ya:

diff --git a/net/core/dev.c b/net/core/dev.c
index 74ce8b253ed6..8dbdfaeb0183 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6668,6 +6668,7 @@ static struct napi_struct *napi_by_id(unsigned int napi_id)
 
 static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
 {
+       ktime_t to;
        int rc;
 
        /* Busy polling means there is a high chance device driver hard irq
@@ -6682,6 +6683,13 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock)
        clear_bit(NAPI_STATE_MISSED, &napi->state);
        clear_bit(NAPI_STATE_IN_BUSY_POLL, &napi->state);
 
+       if (READ_ONCE(napi->dev->napi_defer_hard_irqs)) {
+               netpoll_poll_unlock(have_poll_lock);
+               to = ns_to_ktime(READ_ONCE(napi->dev->gro_flush_timeout));
+               hrtimer_start(&napi->timer, to, HRTIMER_MODE_REL_PINNED);
+               return;
+       }
+
        local_bh_disable();
 
        /* All we really want here is to re-enable device interrupts.


With basic busy polling implemented for AF_XDP, this is all** you need
to make busy polling work very well.

** once bugs are fixed :D I haven't even compiled this

Eric & co. already implemented hard IRQ deferral. All we need to do is
push the timer away when the application picks up frames. I think.
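
On the application side, something like the sketch below should be all
that's needed (untested; it assumes plain SO_BUSY_POLL gets wired up
for AF_XDP sockets, and the interface name, sysfs values and 50 usec
budget are just placeholders):

#include <stdio.h>
#include <sys/socket.h>

#ifndef SO_BUSY_POLL
#define SO_BUSY_POLL 46		/* not exposed by every libc header set */
#endif

/* Prerequisite, once per interface (example values):
 *   echo 2      > /sys/class/net/eth0/napi_defer_hard_irqs
 *   echo 200000 > /sys/class/net/eth0/gro_flush_timeout
 *
 * With SO_BUSY_POLL set on the AF_XDP socket, poll()/recv() spins in
 * the driver instead of sleeping until the hard IRQ fires, and every
 * trip through busy_poll_stop() pushes the deferral timer out again.
 */
static int xsk_enable_busy_poll(int xsk_fd)
{
	int usecs = 50;		/* how long a syscall may busy-spin */

	if (setsockopt(xsk_fd, SOL_SOCKET, SO_BUSY_POLL,
		       &usecs, sizeof(usecs)) < 0) {
		perror("setsockopt(SO_BUSY_POLL)");
		return -1;
	}
	return 0;
}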

Please, no loose threads for AF_XDP apps (or other busy polling apps).
Let the application burn 100% of the core :(
