Message-ID: <ee9fe54f3764bc0ee4ebafe5c10ad4afe748ef19.camel@mailbox.org>
Date: Mon, 10 Nov 2025 13:27:22 +0100
From: Philipp Stanner <phasta@...lbox.org>
To: Christian König <christian.koenig@....com>, Philipp
Stanner <phasta@...nel.org>, David Airlie <airlied@...il.com>, Simona
Vetter <simona@...ll.ch>, Alex Deucher <alexander.deucher@....com>, Andrey
Grodzovsky <Andrey.Grodzovsky@....com>, dakr@...nel.org, Matthew Brost
<matthew.brost@...el.com>
Cc: dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH] drm/sched: Fix UB in spsc_queue
Please don't top-post :(
FTFY:
> On 11/10/25 09:19, Philipp Stanner wrote:
> > The spsc_queue is an unlocked, highly asynchronous piece of
> > infrastructure. Its inline function spsc_queue_peek() obtains the head
> > entry of the queue.
> >
> > This access is performed without READ_ONCE() and is, therefore,
> > undefined behavior. In order to prevent the compiler from ever
> > reordering that access, or even optimizing it away, a READ_ONCE() is
> > strictly necessary. That this is needed is underlined by the fact
> > that spsc_queue_pop() already uses this very pattern to access the
> > head.
> >
> > Add READ_ONCE() to spsc_queue_peek().
> >
> > Cc: stable@...r.kernel.org # v4.16+
> > Fixes: 27105db6c63a ("drm/amdgpu: Add SPSC queue to scheduler.")
> > Signed-off-by: Philipp Stanner <phasta@...nel.org>
> > ---
> > I think this makes it less broken, but I'm not even sure whether it's
> > enough, or whether more memory barriers or an rcu_dereference() would
> > be correct. The spsc_queue is, of course, not documented, and the
> > existing barrier comments are either false or not telling.
> >
> > If someone has an idea, shoot us the info. Otherwise I think this is the
> > right thing to do for now.
> >
> > P.
> > ---
> > include/drm/spsc_queue.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/include/drm/spsc_queue.h b/include/drm/spsc_queue.h
> > index ee9df8cc67b7..39bada748ffc 100644
> > --- a/include/drm/spsc_queue.h
> > +++ b/include/drm/spsc_queue.h
> > @@ -54,7 +54,7 @@ static inline void spsc_queue_init(struct spsc_queue *queue)
> >
> > static inline struct spsc_node *spsc_queue_peek(struct spsc_queue *queue)
> > {
> > - return queue->head;
> > + return READ_ONCE(queue->head);
> > }
On Mon, 2025-11-10 at 12:24 +0100, Christian König wrote:
> As far as I can see that is not correct or rather not complete.
It cannot be incorrect by definition, because it simply ensures that
the load will actually take place there.
Incomplete it can be.
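
To illustrate with a hypothetical caller (not actual scheduler code):
without READ_ONCE(), the compiler is free to load queue->head once and
reuse the value, so a wait loop like

	while (!spsc_queue_peek(queue))	/* plain load of queue->head */
		cpu_relax();

may legally be compiled into an endless loop. READ_ONCE() forces the
load to be re-issued on every iteration and prevents it from being
reordered or elided.
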
>
> The peek function should only be used to opportunistically look at the top of the queue. It would only be problematic if it returned a non-NULL value once and then a NULL value later.
>
> The whole idea of the SPSC is that it is barrier-free and the signaling of new entries to the consumer side is providing the barrier.
>
> So basically on the provider side you have
> spsc_push(entry)
> wake_up(consumer)
>
> And on the consumer side you have:
>
> woken_up_by_provider() {
> entry = spsc_peek();
> ...
> spsc_pop();
> }
Well, no: the scheduler can pick up the next job whenever it feels
like it. Restarting it, for example, will have it peek into your
queue, regardless of wake events.
In any case, this is a highly fragile design. See below, too.
>
> The problem we are facing here is that the spsc only provides the guarantee that you see the entry pointer, but not the content of entry itself.
>
> So use cases like:
>
> woken_up_by_provider() {
> while (entry = spsc_peek()) {
> ...
> spsc_pop();
> }
> }
>
> Are illegal since you don't have the correct memory barriers any more.
I can't follow. Are you saying that spsc_queue_peek() is correct
because you know for sure that, wherever it is used, no one else can
be changing that pointer at the same time?
Even if that were true, this design is highly fragile.
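
For the record, my reading of the hazard you're describing is roughly
this (illustration only; struct job and its fields are made up):

	/* producer: */
	job->fence = fence;		/* fill in the job		*/
	spsc_queue_push(q, &job->node);

	/* consumer, concurrently: */
	node = spsc_queue_peek(q);	/* sees the new pointer...	*/
	job = container_of(node, struct job, node);
	use(job->fence);		/* ...but maybe stale contents	*/

Without an acquire on the consumer side pairing with a release on the
producer side (or the barriers implied by the wake-up path), the
consumer can observe the new head pointer but stale contents of the
job it points to.
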
>
> Took me an eternity to understand that as well, so bear with me that I didn't previously explain that.
s/explain/document :)
As discussed a few weeks ago with Sima and Tvrtko, what we really need
to move to in all of DRM is this development pattern:
1. For parallel code, use by default a boring, horribly slow
   (sarcasm) spinlock. BTW I'm not even convinced that a spinlock is
   slower than lockless tricks. Paul McKenney's book states that a
   CAS atomic instruction takes about 60 cycles, and taking a lock
   takes 100.
2. If you want to do parallelism without locks, you need to justify
   it really well. "rmb() so list_empty() works without a lock"
   doesn't qualify; real, measured use-case speedups do.
3. If you write lockless infrastructure, you need to document it
   really well. In particular you need to state:
   1. how it works,
   2. what the rules are.
See llist.h as an example: it clearly states when you need a lock (or
RCU) and when you don't. No one could use it without such good
documentation.
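
Its contract boils down to this (sketch from memory; see
include/linux/llist.h for the authoritative table):

	struct llist_head list;		/* lock-less list		*/

	/* Any number of producers may add concurrently, no lock: */
	llist_add(&item->node, &list);

	/* A single consumer may take the whole batch, no lock: */
	struct llist_node *batch = llist_del_all(&list);

	/* llist_del_first() with multiple consumers needs a lock. */

That is the level of documentation lockless infrastructure needs
before anyone can use it correctly.
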
I have no idea whether spsc_queue is correctly implemented (I doubt
it), and if even a highly experienced dev like you takes "an eternity"
(quote) to understand it, one is really tempted to dream of spinlock_t,
which has very clear semantics and is easily understood even by
beginners.
>
> Question is what should we do?
Easy!
Mid-term, we should replace spsc_queue with a boring, locked, super-
slow linked list ^_^
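
Something like this would be sufficient (minimal sketch, all names
made up):

	struct job_queue {
		spinlock_t lock;
		struct list_head jobs;
	};

	struct job_entry {
		struct list_head link;
		/* payload */
	};

	static void job_queue_push(struct job_queue *q, struct job_entry *e)
	{
		spin_lock(&q->lock);
		list_add_tail(&e->link, &q->jobs);
		spin_unlock(&q->lock);
	}

	static struct job_entry *job_queue_peek(struct job_queue *q)
	{
		struct job_entry *e;

		spin_lock(&q->lock);
		e = list_first_entry_or_null(&q->jobs, struct job_entry, link);
		spin_unlock(&q->lock);

		return e;
	}

Peek-then-pop stays race-free as long as there is a single consumer,
exactly as with spsc_queue – just with semantics everyone understands.
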
The spsc_queue was designed and – perhaps – perf-measured when RR was
the only scheduling policy.
Since the FIFO rework, where FIFO became the default policy, we now
access our super-efficient, super-fast lockless queue most of the time
with the spinlock being taken immediately afterwards anyway. So we
almost have a locked lockless queue now.
https://elixir.bootlin.com/linux/v6.18-rc4/source/drivers/gpu/drm/scheduler/sched_entity.c#L502
Only push_job() often (?) still runs locklessly, but I'm not at all
convinced that this is worth it. Not even performance-wise.
If you look at spsc_queue_push() you see that it
1. disables preemption,
2. uses atomic instructions and
3. has a ton of memory barriers
– in other words, this is almost literally a manual re-implementation
of a spinlock, just without mutual exclusion…
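
From memory, the push path looks roughly like this (paraphrased; see
include/drm/spsc_queue.h for the real code):

	node->next = NULL;
	preempt_disable();
	tail = (struct spsc_node **)atomic_long_xchg(&queue->tail,
						     (long)&node->next);
	WRITE_ONCE(*tail, node);
	atomic_inc(&queue->job_count);
	smp_wmb();	/* publish the node before waking the consumer */
	preempt_enable();

An xchg, an atomic increment and a full write barrier on every push –
a spinlock's uncontended fast path costs about the same.
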
And even if something like spsc_queue made sense and were actually
worth it, it should be provided to the entire kernel as common
infrastructure, like llist.h, not hidden somewhere in DRM.
P.