Message-ID: <87ft6aa4k7.fsf@nanos.tec.linutronix.de>
Date:   Mon, 19 Oct 2020 12:33:12 +0200
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Jakub Kicinski <kuba@...nel.org>,
        Heiner Kallweit <hkallweit1@...il.com>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        Eric Dumazet <eric.dumazet@...il.com>,
        David Miller <davem@...emloft.net>,
        "netdev\@vger.kernel.org" <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: Remove __napi_schedule_irqoff?

On Sun, Oct 18 2020 at 10:19, Jakub Kicinski wrote:
> On Sun, 18 Oct 2020 10:20:41 +0200 Heiner Kallweit wrote:
>> >> Otherwise a non-solution could be to make IRQ_FORCED_THREADING
>> >> configurable.  
>> > 
>> > I have to say I do not understand why we want to defer to a thread the
>> > hard IRQ that we use in NAPI model.
>> >   
>> Seems like the current forced threading comes with the big hammer and
>> thread-ifies all hard irq's. To avoid this all NAPI network drivers
>> would have to request the interrupt with IRQF_NO_THREAD.

In a !RT kernel, forced threading (via commandline option) is mostly a
debug aid. It's pretty useful when something crashes in hard interrupt
context which usually takes the whole machine down. It's rather unlikely
to be used on production systems, and if so then the admin surely should
know what he's doing.

> Right, it'd work for some drivers. Other drivers try to take spin locks
> in their IRQ handlers.

I checked a few which do and some of these spinlocks just protect
register access and are not used for more complex serialization. So
these could be converted to raw spinlocks because their scope is short
and limited. But yes, you are right that this might be an issue in
general.

> What gave me a pause was that we have a busy loop in napi_schedule_prep:
>
> bool napi_schedule_prep(struct napi_struct *n)
> {
> 	unsigned long val, new;
>
> 	do {
> 		val = READ_ONCE(n->state);
> 		if (unlikely(val & NAPIF_STATE_DISABLE))
> 			return false;
> 		new = val | NAPIF_STATE_SCHED;
>
> 		/* Sets STATE_MISSED bit if STATE_SCHED was already set
> 		 * This was suggested by Alexander Duyck, as compiler
> 		 * emits better code than :
> 		 * if (val & NAPIF_STATE_SCHED)
> 		 *     new |= NAPIF_STATE_MISSED;
> 		 */
> 		new |= (val & NAPIF_STATE_SCHED) / NAPIF_STATE_SCHED *
> 						   NAPIF_STATE_MISSED;
> 	} while (cmpxchg(&n->state, val, new) != val);
>
> 	return !(val & NAPIF_STATE_SCHED);
> }
>
>
> Dunno how acceptable this is to run in an IRQ handler on RT..

In theory it's bad, but I don't think it's a big deal in reality.

Thanks,

        tglx
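
[Editor's note] The quoted napi_schedule_prep() loop can be tried outside
the kernel. What follows is a minimal userspace sketch, not the kernel
implementation: it assumes C11 atomics in place of READ_ONCE()/cmpxchg(),
and the names fake_napi, fake_schedule_prep and the STATE_* bit values are
hypothetical stand-ins chosen for illustration. It shows the branchless
"(val & SCHED) / SCHED * MISSED" step, and that the loop only repeats when
another context changed the state word between the read and the
compare-and-exchange.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's NAPIF_STATE_* bits.
 * The bit positions are illustrative, not the kernel's values. */
#define STATE_SCHED   (1UL << 0)
#define STATE_MISSED  (1UL << 1)
#define STATE_DISABLE (1UL << 2)

/* Hypothetical miniature of struct napi_struct: only the state word. */
struct fake_napi {
	_Atomic unsigned long state;
};

/* Returns true if the caller set SCHED and therefore owns the scheduling.
 * Returns false if the instance is disabled or was already scheduled;
 * in the latter case MISSED is recorded. */
static bool fake_schedule_prep(struct fake_napi *n)
{
	unsigned long val = atomic_load(&n->state);
	unsigned long new;

	do {
		if (val & STATE_DISABLE)
			return false;
		new = val | STATE_SCHED;

		/* (val & STATE_SCHED) / STATE_SCHED is 1 if SCHED was set,
		 * else 0, so MISSED is ORed in without a branch. */
		new |= (val & STATE_SCHED) / STATE_SCHED * STATE_MISSED;

		/* On failure, val is reloaded with the current state and
		 * the loop body runs again. */
	} while (!atomic_compare_exchange_weak(&n->state, &val, new));

	return !(val & STATE_SCHED);
}
```

Calling fake_schedule_prep() twice on a zeroed fake_napi returns true and
then false, leaving both STATE_SCHED and STATE_MISSED set. A retry needs a
concurrent writer to the same state word, so the number of iterations is
bounded by how often that word is contended.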