Message-ID: <20220208073520.7197d638@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date: Tue, 8 Feb 2022 07:35:20 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: Yannick Vignon <yannick.vignon@....nxp.com>,
Eric Dumazet <edumazet@...gle.com>,
Giuseppe Cavallaro <peppe.cavallaro@...com>,
Alexandre Torgue <alexandre.torgue@...s.st.com>,
Jose Abreu <joabreu@...opsys.com>,
"David S. Miller" <davem@...emloft.net>,
Maxime Coquelin <mcoquelin.stm32@...il.com>,
Antoine Tenart <atenart@...nel.org>,
Alexander Lobakin <alexandr.lobakin@...el.com>,
Paolo Abeni <pabeni@...hat.com>, Wei Wang <weiwan@...gle.com>,
Kumar Kartikeya Dwivedi <memxor@...il.com>,
Yunsheng Lin <linyunsheng@...wei.com>,
Arnd Bergmann <arnd@...db.de>, netdev <netdev@...r.kernel.org>,
Vladimir Oltean <olteanv@...il.com>,
Xiaoliang Yang <xiaoliang.yang_1@....com>, mingkai.hu@....com,
Joakim Zhang <qiangqing.zhang@....com>,
sebastien.laveze@....com, Yannick Vignon <yannick.vignon@....com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH net-next 1/2] net: napi: wake up ksoftirqd if needed
after scheduling NAPI
On Tue, 8 Feb 2022 12:51:07 +0100 Sebastian Andrzej Siewior wrote:
> On 2022-02-04 10:50:35 [-0800], Jakub Kicinski wrote:
> > On Fri, 4 Feb 2022 19:03:31 +0100 Sebastian Andrzej Siewior wrote:
> > > I had to look into NAPI-threads for some reason.
> > > What I dislike is that after enabling it via sysfs I have to:
> > > - adjust the task priority manually so it is preferred over other threads.
> > > This is usually important on RT. But then there is no overload
> > > protection.
> > >
> > > - set an affinity-mask for the thread so it does not migrate from one
> > > CPU to the other. This is worse for an RT task, where the scheduler
> > > tries to keep the task running by migrating it.
> > >
> > > Wouldn't it work to utilize the threaded-IRQ API and use that instead of
> > > the custom thread? Basically the primary handler would do what it already
> > > does (disable the interrupt) and the threaded handler would feed packets
> > > into the stack. In the overload case one would need to lower the
> > > thread-priority.
> >
> > Sounds like an interesting direction if you ask me! That said I have
> > not been able to make threaded NAPI useful in my experiments / with my
> > workloads so I'd defer to Wei for confirmation.
> >
> > To be clear -- are you suggesting that drivers just switch to threaded
> > NAPI, or a more dynamic approach where echo 1 > /proc/irq/$n/threaded
> > dynamically engages a thread in a generic fashion?
>
> Uhm, kind of, yes.
>
> Now you have
> request_irq(, handler_irq);
> netif_napi_add(, , handler_napi);
>
> The handler_irq() disables the interrupt line and schedules the softirq
> to process handler_napi(). Once handler_napi() is done, it re-enables the
> interrupt line, otherwise it will be processed again on the next tick.
>
> If you enable threaded NAPI then you end up with a thread and the
> softirq is no longer used. I don't know what the next action is but I
> guess you search for that thread and pin it manually to a CPU and assign an
> RT priority (probably, otherwise it will compete with other tasks for
> CPU resources).
FWIW I don't think servers would want RT prio.
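
For reference, the softirq-driven scheme described above looks roughly like
this (a minimal sketch, not taken from any in-tree driver; the foo_* names
are made up for illustration):

#include <linux/interrupt.h>
#include <linux/netdevice.h>

struct foo_priv {
	struct napi_struct napi;
	int irq;
};

/* hard IRQ handler: mask the line and defer the work to NAPI (softirq) */
static irqreturn_t foo_handler_irq(int irq, void *data)
{
	struct foo_priv *priv = data;

	disable_irq_nosync(irq);
	napi_schedule(&priv->napi);
	return IRQ_HANDLED;
}

/* NAPI poll: drain the ring, unmask the line once there is nothing left */
static int foo_handler_napi(struct napi_struct *napi, int budget)
{
	struct foo_priv *priv = container_of(napi, struct foo_priv, napi);
	int work = 0;

	/* ... pull up to @budget packets off the ring, napi_gro_receive() ... */

	if (work < budget && napi_complete_done(napi, work))
		enable_irq(priv->irq);
	return work;
}

/* probe-time wiring */
static int foo_setup(struct net_device *dev, struct foo_priv *priv)
{
	netif_napi_add(dev, &priv->napi, foo_handler_napi, NAPI_POLL_WEIGHT);
	return request_irq(priv->irq, foo_handler_irq, 0, "foo", priv);
}
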
> Instead we could have
> request_threaded_irq(, handler_irq, handler_napi);
>
> And we would have basically the same outcome. Except that handler_napi()
> runs at SCHED_FIFO/50 and has the same CPU affinity as the IRQ (and
> the CPU affinity is adjusted if the IRQ-affinity is changed).
> We would still have to work out the details of what handler_irq() is
> allowed to do and how to handle one IRQ and multiple handler_napi().
>
> If you wrap request_threaded_irq() in something like request_napi_irq()
> then you could switch between the former (softirq) and latter (thread)
> based NAPI handling (since you have all the needed details).
One use case to watch out for is drivers which explicitly moved
to threaded NAPI because they want to schedule multiple threads
(NAPIs) from a single IRQ to spread processing across more cores.
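
To illustrate the shape of the proposal (a rough sketch only, reusing
struct foo_priv from the sketch above; a request_napi_irq() wrapper does
not exist today, and this does not cover the multiple-NAPIs-per-IRQ case
mentioned above):

/* primary handler: runs in hard IRQ context; with IRQF_ONESHOT the irq
 * core keeps the line masked until the thread handler below has returned */
static irqreturn_t foo_hard_irq(int irq, void *data)
{
	return IRQ_WAKE_THREAD;
}

/* threaded handler: runs in a per-IRQ kthread which the irq core already
 * runs at SCHED_FIFO/50 and keeps aligned with the IRQ's affinity */
static irqreturn_t foo_napi_thread(int irq, void *data)
{
	/* data is the foo_priv; feed packets from its ring into the stack
	 * here, much like the poll callback in the earlier sketch ... */
	return IRQ_HANDLED;
}

/* probe-time wiring, replacing foo_setup() above */
static int foo_setup_threaded(struct foo_priv *priv)
{
	return request_threaded_irq(priv->irq, foo_hard_irq, foo_napi_thread,
				    IRQF_ONESHOT, "foo", priv);
}

A request_napi_irq()-style wrapper could then choose between this wiring
and the request_irq() + netif_napi_add() one above, but that is only the
idea floated in this thread, not something that exists yet.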