netdev - Re: [PATCH net-next 1/2] net: napi: wake up ksoftirqd if needed after scheduling NAPI

Message-ID: <078bffa8-6feb-9637-e874-254b6d4b188e@oss.nxp.com>
Date:   Fri, 4 Feb 2022 18:15:40 +0100
From:   Yannick Vignon <yannick.vignon@....nxp.com>
To:     Jakub Kicinski <kuba@...nel.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        Giuseppe Cavallaro <peppe.cavallaro@...com>,
        Alexandre Torgue <alexandre.torgue@...s.st.com>,
        Jose Abreu <joabreu@...opsys.com>,
        "David S. Miller" <davem@...emloft.net>,
        Maxime Coquelin <mcoquelin.stm32@...il.com>,
        Antoine Tenart <atenart@...nel.org>,
        Alexander Lobakin <alexandr.lobakin@...el.com>,
        Paolo Abeni <pabeni@...hat.com>, Wei Wang <weiwan@...gle.com>,
        Kumar Kartikeya Dwivedi <memxor@...il.com>,
        Yunsheng Lin <linyunsheng@...wei.com>,
        Arnd Bergmann <arnd@...db.de>, netdev <netdev@...r.kernel.org>,
        Vladimir Oltean <olteanv@...il.com>,
        Xiaoliang Yang <xiaoliang.yang_1@....com>, mingkai.hu@....com,
        Joakim Zhang <qiangqing.zhang@....com>,
        sebastien.laveze@....com, Yannick Vignon <yannick.vignon@....com>
Subject: Re: [PATCH net-next 1/2] net: napi: wake up ksoftirqd if needed after
 scheduling NAPI

On 2/4/2022 4:43 PM, Jakub Kicinski wrote:
> On Fri, 4 Feb 2022 09:19:22 +0100 Sebastian Andrzej Siewior wrote:
>> On 2022-02-03 17:09:01 [-0800], Jakub Kicinski wrote:
>>> Let's be clear that the problem only exists when switching to threaded
>>> IRQs on _non_ PREEMPT_RT kernel (or old kernels). We already have a
>>> check in __napi_schedule_irqoff() which should handle your problem on
>>> PREEMPT_RT.
>>
>> It does not. The problem is the missing bh-off/on around the call. The
>> forced-threaded handler has this. His explicit threaded-handler does not
>> and needs it.
> 
> I see, what I was getting at is on PREEMPT_RT IRQs are already threaded
> so I thought the patch was only targeting non-RT, I didn't think that
> explicitly threading IRQ is advantageous also on RT.
> 

Something I forgot to mention: the final use case I care about uses
threaded NAPI, because of the improvement it gives when processing
latency-sensitive network streams. In that case, __napi_schedule()
simply wakes up the NAPI thread. No softirq is needed, so my
controversial change isn't even required for the whole system to work
properly.

>>> We should slap a lockdep warning for non-irq contexts in
>>> ____napi_schedule(), I think, it was proposed by got lost.
>>
>> Something like this perhaps?:
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 1baab07820f65..11c5f003d1591 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -4217,6 +4217,9 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>>   {
>>   	struct task_struct *thread;
>>   
>> +	lockdep_assert_once(hardirq_count() | softirq_count());
>> +	lockdep_assert_irqs_disabled();
>> +
>>   	if (test_bit(NAPI_STATE_THREADED, &napi->state)) {
>>   		/* Paired with smp_mb__before_atomic() in
>>   		 * napi_enable()/dev_set_threaded().
> 
> 👍 maybe with a comment above the first one saying that we want to make
> sure softirq will be handled somewhere down the callstack. Possibly push
> it as a helper in lockdep.h called "lockdep_assert_softirq_will_run()"
> so it's self-explanatory?
> 
>> Be aware that this (the first assert) will trigger in dev_cpu_dead() and
>> needs a bh-off/on around. I should have something in my RT tree :)
> 
> Or we could push the asserts only into the driver-facing helpers
> (__napi_schedule(), __napi_schedule_irqoff()).

As I explained above, everything is working fine when using threaded 
NAPI. Why then forbid such a use case?

How about something like this instead:
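in the (stmmac) threaded interrupt handler:
if (test_bit(NAPI_STATE_THREADED, &napi->state)) {
	/* threaded NAPI: only wakes the NAPI thread, no softirq involved */
	__napi_schedule(napi);
} else {
	/* softirq NAPI: bh-off/on so the raised softirq actually gets run */
	local_bh_disable();
	__napi_schedule(napi);
	local_bh_enable();
}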

Then in ____napi_schedule(), where the NAPI_STATE_THREADED test lives,
add the lockdep checks, but _below_ the "if (threaded) { ... }" block.

Would that be an acceptable change? Because really, the whole point of
my patch queue is to remove the latencies imposed on network interrupts
by bh_disable/enable sections. If moving to explicitly threaded IRQs
means the bh_disable/enable section is simply moved down the path and
around __napi_schedule(), there is just no point.
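
To make the intended ordering concrete, here is a minimal userspace
sketch of the idea. It is only a model under stated assumptions, not
kernel code: every model_* name is hypothetical, the "lockdep" check is
reduced to a counter, and the real ____napi_schedule() does more than
this.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; none of these are the kernel's definitions. */
struct model_napi {
	bool threaded;       /* stands in for NAPI_STATE_THREADED */
	bool thread_woken;   /* the NAPI thread would have been woken */
	bool softirq_raised; /* NET_RX_SOFTIRQ would have been raised */
};

static int model_bh_disable_depth; /* stands in for softirq_count() */
static bool model_in_hardirq;      /* stands in for hardirq_count() */
static int model_lockdep_warnings; /* how often the check would fire */

static void model_local_bh_disable(void)
{
	model_bh_disable_depth++;
}

static void model_local_bh_enable(void)
{
	assert(model_bh_disable_depth > 0);
	model_bh_disable_depth--;
}

/* Models ____napi_schedule() with the check below the threaded branch. */
static void model_napi_schedule(struct model_napi *n)
{
	if (n->threaded) {
		/* Threaded NAPI: wake the thread, no softirq, no check. */
		n->thread_woken = true;
		return;
	}

	/* Softirq path: someone down the callstack must run softirqs. */
	if (!model_in_hardirq && model_bh_disable_depth == 0)
		model_lockdep_warnings++;

	n->softirq_raised = true;
}

/* Models the proposed explicitly threaded interrupt handler. */
static void model_threaded_irq_handler(struct model_napi *n)
{
	if (n->threaded) {
		model_napi_schedule(n);
	} else {
		model_local_bh_disable();
		model_napi_schedule(n);
		model_local_bh_enable();
	}
}
```

In this model the threaded case never enters a bh-disabled section and
never trips the check, while a non-threaded caller that skips the
bh-off/on pair is still caught.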