Message-ID: <20251015162340.i7K71rpM@linutronix.de>
Date: Wed, 15 Oct 2025 18:23:40 +0200
From: "bigeasy@...utronix.de" <bigeasy@...utronix.de>
To: Fabrizio Castro <fabrizio.castro.jz@...esas.com>
Cc: Steven Rostedt <rostedt@...dmis.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"edumazet@...gle.com" <edumazet@...gle.com>,
"kuba@...nel.org" <kuba@...nel.org>,
"pabeni@...hat.com" <pabeni@...hat.com>,
"andrew@...n.ch" <andrew@...n.ch>,
"clrkwllms@...nel.org" <clrkwllms@...nel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"linux-rt-devel@...ts.linux.dev" <linux-rt-devel@...ts.linux.dev>,
Prabhakar Mahadev Lad <prabhakar.mahadev-lad.rj@...renesas.com>
Subject: Re: RE: Query about the impact of using CONFIG_PREEMPT_RT on locking
mechanisms within networking drivers
On 2025-10-15 15:48:46 [+0000], Fabrizio Castro wrote:
> > The reason for the spin locks conversion to mutexes is simply to allow for
> > more preemption. A raw spin lock can not be preempted. If a lock is held
> > for more than a microsecond, you can consider it too long. There's a few
>
> That actually gives us a good way of gauging when holding a lock is not
> appropriate. Thanks for this.
Beyond that: once you have acquired a raw_spinlock_t, the API you may
use while holding it shrinks considerably. You cannot invoke any
function that acquires a spinlock_t, which includes something like
kmalloc(, GFP_ATOMIC).
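A minimal kernel-style sketch of that restriction (not a buildable unit;
hw_lock and example() are hypothetical):

```
/* Under PREEMPT_RT a spinlock_t is an rtmutex and may sleep, so it
 * must never be taken while a raw_spinlock_t is held. */
static DEFINE_RAW_SPINLOCK(hw_lock);   /* hypothetical lock */

static void example(void)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&hw_lock, flags);
	/* OK here: plain register access, per-CPU data, other
	 * raw_spinlock_t locks. */

	/* NOT OK on PREEMPT_RT: anything that takes a spinlock_t,
	 * e.g. kmalloc(sz, GFP_ATOMIC) -- the allocator's locks are
	 * sleeping locks there, and sleeping inside a raw spinlock
	 * is a bug that CONFIG_DEBUG_ATOMIC_SLEEP will flag. */

	raw_spin_unlock_irqrestore(&hw_lock, flags);
}
```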
> > places that may hold locks longer (like the scheduler) but there's no
> > choice.
> >
> > To allow spin locks to become mutexes, interrupts are also converted into
> > threads (including softirqs). There are also "local locks" that are used
> > for places that need to protect per-cpu data that is usually protected by
> > preempt_disable().
> >
> > What issues are you having? It's likely that it can be tweaked so that you
> > do not have issues with PREEMPT_RT.
>
> The first issue (which is the one that sparked this discussion) has been
> addressed by a patch that was sent out today.
> While the issue addressed by that patch is not related in any way to locking,
> it sparked a series of discussions within my team about locking because when
> PREEMPT_RT is used there are cases where the driver gets preempted at
> inconvenient times (while holding a spin lock, that gets translated to an
> rtmutex with PREEMPT_RT), and the issue itself is fully masked when using
> raw spin locks (and that's because the code doesn't get preempted, making
> the issue a lot less likely to show up).
The driver shouldn't get preempted under normal circumstances. The
threaded interrupt in which the NAPI callback gets invoked runs by
default at SCHED_FIFO priority 50, which is higher than any default user
thread (there are some high-priority kernel threads, yes). Unless there
is a user thread with a higher priority, there is no preemption.
If it does get preempted due to $reason, the timeout check function
should check whether the condition is true before reporting a failure
due to timeout. (Which means: if the timeout is exceeded but the waiting
condition is met, then it should not report an error due to timeout.)
> The above piqued our curiosity, and therefore we had a look at what's
> under `drivers/net` and there doesn't seem to be much code using raw spin
> locks directly, hence the question.
>
> Here is a link to the patch I was mentioning (although not relevant to
> locking):
> https://lore.kernel.org/netdev/20251015150026.117587-4-prabhakar.mahadev-lad.rj@bp.renesas.com/T/#u
>
> Another issue we have seen is around CPU stalling on a couple of drivers
> when PREEMPT_RT is enabled:
> https://lore.kernel.org/all/CA+V-a8tWytDVmsk-PK23e4gChXH0pMDR9cKc_xEO4WXpNtr3eA@mail.gmail.com/
>
> The above is more likely related to locking issues, even though we haven't
> had the time to dive into it yet, so we are not 100% sure about what's
> happening.
It shouldn't stall. From the backtrace, stmmac_tx_clean() blocks on a
spinlock_t, so someone must own it. Why is the owner not making
progress?
The second backtrace originates from a trylock within
mem_cgroup_sk_charge()/try_charge_memcg(). As far as I remember the
code, it does the trylock once and, if it fails, moves on. So it could
show up in the backtrace by chance while the task was preempted. If it
spins on the lock via trylock, then it will lock up as in the backtrace.
If it got preempted, then there needs to be a thread with a higher
priority.
LOCKDEP and CONFIG_DEBUG_ATOMIC_SLEEP might show where something goes
wrong.
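For reference, the usual .config fragment for those debug checks
(CONFIG_PROVE_LOCKING pulls in lockdep):

```
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DEBUG_SPINLOCK=y
```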
> Again, thanks a lot for your answer.
>
> Kind regards,
> Fab
Sebastian