Message-ID:
 <TYCPR01MB120937363DC557306A11EA863C2E9A@TYCPR01MB12093.jpnprd01.prod.outlook.com>
Date: Thu, 16 Oct 2025 12:27:04 +0000
From: Fabrizio Castro <fabrizio.castro.jz@...esas.com>
To: "bigeasy@...utronix.de" <bigeasy@...utronix.de>
CC: Steven Rostedt <rostedt@...dmis.org>, "davem@...emloft.net"
	<davem@...emloft.net>, "edumazet@...gle.com" <edumazet@...gle.com>,
	"kuba@...nel.org" <kuba@...nel.org>, "pabeni@...hat.com" <pabeni@...hat.com>,
	"andrew@...n.ch" <andrew@...n.ch>, "clrkwllms@...nel.org"
	<clrkwllms@...nel.org>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"linux-rt-devel@...ts.linux.dev" <linux-rt-devel@...ts.linux.dev>, Prabhakar
 Mahadev Lad <prabhakar.mahadev-lad.rj@...renesas.com>
Subject: RE: RE: Query about the impact of using CONFIG_PREEMPT_RT on locking
 mechanisms within networking drivers

> From: bigeasy@...utronix.de <bigeasy@...utronix.de>
> Sent: 15 October 2025 17:24
> To: Fabrizio Castro <fabrizio.castro.jz@...esas.com>
> Cc: Steven Rostedt <rostedt@...dmis.org>; davem@...emloft.net; edumazet@...gle.com; kuba@...nel.org;
> pabeni@...hat.com; andrew@...n.ch; clrkwllms@...nel.org; netdev@...r.kernel.org; linux-rt-
> devel@...ts.linux.dev; Prabhakar Mahadev Lad <prabhakar.mahadev-lad.rj@...renesas.com>
> Subject: Re: RE: Query about the impact of using CONFIG_PREEMPT_RT on locking mechanisms within
> networking drivers
> 
> On 2025-10-15 15:48:46 [+0000], Fabrizio Castro wrote:
> > > The reason for the spin locks conversion to mutexes is simply to allow for
> > > more preemption. A raw spin lock can not be preempted. If a lock is held
> > > for more than a microsecond, you can consider it too long. There's a few
> >
> > That actually gives us a good way of gauging when holding a lock is not
> > appropriate. Thanks for this.
> 
> Other than that, the other thing is that if you have acquired a
> raw_spinlock_t the resulting API of what can be used is minimized. You
> can't invoke any function that acquires a spinlock_t which includes
> something like kmalloc(, GFP_ATOMIC).

That's another very good point. Thank you!
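
If it helps to confirm our understanding, the restriction reads to us like the
sketch below (entirely made-up code, not taken from any of our drivers): holding
a raw_spinlock_t while calling into anything that may take a spinlock_t is what
has to be avoided on PREEMPT_RT.

#include <linux/spinlock.h>
#include <linux/slab.h>

static DEFINE_RAW_SPINLOCK(hw_lock);	/* hypothetical lock */

static void broken_on_rt(void)
{
	unsigned long flags;
	void *buf;

	raw_spin_lock_irqsave(&hw_lock, flags);
	/*
	 * Invalid on PREEMPT_RT: kmalloc() may take a spinlock_t
	 * internally, and spinlock_t is a sleeping lock (rtmutex)
	 * there, so it can't be acquired inside a raw_spinlock_t
	 * critical section.
	 */
	buf = kmalloc(64, GFP_ATOMIC);
	raw_spin_unlock_irqrestore(&hw_lock, flags);
	kfree(buf);
}

Presumably the fix in a case like that is to do the allocation outside of the
critical section (or to use a plain spinlock_t if the section can tolerate
being preemptible).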

> 
> > > places that may hold locks longer (like the scheduler) but there's no
> > > choice.
> > >
> > > To allow spin locks to become mutexes, interrupts are also converted into
> > > threads (including softirqs). There are also "local locks" that are used
> > > for places that need to protect per-cpu data that is usually protected by
> > > preempt_disable().
> > >
> > > What issues are you having? It's likely that it can be tweaked so that you
> > > do not have issues with PREEMPT_RT.
> >
> > The first issue (which is the one that sparked this discussion) has been
> > addressed by a patch that was sent out today.
> > While the issue addressed by that patch is not related in any way to locking,
> > it sparked a series of discussions within my team about locking because when
> > PREEMPT_RT is used there are cases where the driver gets preempted at
> > inconvenient times (while holding a spin lock, that gets translated to an
> > rtmutex with PREEMPT_RT), and the issue itself is fully masked when using
> > raw spin locks (and that's because the code doesn't get preempted, making
> > the issue a lot less likely to show up).
> 
> The driver shouldn't get preempted under normal circumstances. It (the
> threaded interrupt where the NAPI callback gets invoked) runs by default
> at SCHED_FIFO 50 which is higher than any user thread (there are some HI
> priority kernel threads, yes). Unless there is a user thread with a
> higher priority there is no preemption.
> If it gets preempted due to $reason, the "time out check function"
> should check if the condition is true before reporting a failure due to
> timeout. (Which means that if the timeout is exceeded but the waiting
> condition is met, then it should not report an error due to timeout.)

With respect to this issue specifically, we are hitting the timeout
for other reasons. There is a bug in the driver that only shows up when
the driver gets preempted at a specific point in the code: the DMA picks
things up while the memory still contains the wrong data, and after that
it cannot recover, hence the timeout. So the timeout is the fallout of
breaking the multi-descriptor process, the underlying problem being a
bug triggered by the driver getting preempted.
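
On the timeout check itself, your suggestion makes sense to us. Just to check
we have understood it, a rough sketch of the kind of check we think you mean
is below (my_priv and hw_condition_met() are made up for illustration):

#include <linux/types.h>
#include <linux/jiffies.h>
#include <linux/delay.h>
#include <linux/errno.h>

struct my_priv;					/* hypothetical driver private data */
bool hw_condition_met(struct my_priv *priv);	/* hypothetical status check */

static int wait_for_hw_done(struct my_priv *priv)
{
	unsigned long deadline = jiffies + msecs_to_jiffies(100);

	while (!hw_condition_met(priv)) {
		if (time_after(jiffies, deadline)) {
			/*
			 * We may have been preempted for a long time, so
			 * sample the condition one last time before
			 * reporting a failure due to timeout.
			 */
			return hw_condition_met(priv) ? 0 : -ETIMEDOUT;
		}
		usleep_range(10, 20);	/* assumes a sleepable context */
	}

	return 0;
}

If we are reading it correctly, read_poll_timeout() in <linux/iopoll.h> already
behaves this way (it samples the condition one final time after the deadline),
so that may be the easier route where it fits.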

> 
> > The above piqued our curiosity, and therefore we had a look at what's
> > under `drivers/net` and there doesn't seem to be much code using raw spin
> > locks directly, hence the question.
> >
> > Here is a link to the patch I was mentioning (although not relevant to
> > locking):
> > https://lore.kernel.org/netdev/20251015150026.117587-4-prabhakar.mahadev-lad.rj@bp.renesas.com/T/#u
> >
> > Another issue we have seen is around CPU stalling on a couple of drivers
> > when PREEMPT_RT is enabled:
> > https://lore.kernel.org/all/CA+V-a8tWytDVmsk-PK23e4gChXH0pMDR9cKc_xEO4WXpNtr3eA@mail.gmail.com/
> >
> > The above is more likely related to locking issues, even though we haven't
> > had the time to dive into it just yet, so we are not 100% sure about what's
> > happening.
> 
> It shouldn't stall. From the backtrace, stmmac_tx_clean() blocks on
> spinlock_t so someone should own it. Why is the owner not making
> progress?
> The second backtrace originates from a trylock from within
> mem_cgroup_sk_charge()/try_charge_memcg(). As far as I remember the
> code, it does the trylock once and if it fails it moves on. So it could show
> up by chance in the backtrace while it was preempted. If it spins on the
> lock via trylock then it will lock up as in the backtrace.
> If it got preempted, then there needs to be a thread with higher
> priority.
> 
> LOCKDEP and CONFIG_DEBUG_ATOMIC_SLEEP might show where something goes
> wrong.

Thank you for the pointers; we'll certainly give them a go!
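
For completeness, the debug options we intend to switch on are along these
lines (just our own debug fragment, listed in case we have picked the wrong
ones):

# lockdep and sleep-in-atomic checking for the next round of tests
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_ATOMIC_SLEEP=y
CONFIG_DEBUG_SPINLOCK=y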

Kind regards,
Fab

> 
> > Again, thanks a lot for your answer.
> >
> > Kind regards,
> > Fab
> 
> Sebastian
