Message-ID: <2c72eb6e-7792-4212-b06f-5300bc9a42f9@amd.com>
Date: Mon, 10 Nov 2025 15:07:28 +0100
From: Christian König <christian.koenig@....com>
To: phasta@...nel.org, David Airlie <airlied@...il.com>,
 Simona Vetter <simona@...ll.ch>, Alex Deucher <alexander.deucher@....com>,
 Andrey Grodzovsky <Andrey.Grodzovsky@....com>, dakr@...nel.org,
 Matthew Brost <matthew.brost@...el.com>
Cc: dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org,
 stable@...r.kernel.org
Subject: Re: [PATCH] drm/sched: Fix UB in spsc_queue

On 11/10/25 13:27, Philipp Stanner wrote:
> Please don't top-post :(
> FDFY:
> 
> 
>> On 11/10/25 09:19, Philipp Stanner wrote:
>>> The spsc_queue is an unlocked, highly asynchronous piece of
>>> infrastructure. Its inline function spsc_queue_peek() obtains the head
>>> entry of the queue.
>>>
>>> This access is performed without READ_ONCE() and is, therefore,
>>> undefined behavior. In order to prevent the compiler from ever
>>> reordering that access, or even optimizing it away, a READ_ONCE() is
>>> strictly necessary. This is easily proven by the fact that
>>> spsc_queue_pop() uses this very pattern to access the head.
>>>
>>> Add READ_ONCE() to spsc_queue_peek().
>>>
>>> Cc: stable@...r.kernel.org # v4.16+
>>> Fixes: 27105db6c63a ("drm/amdgpu: Add SPSC queue to scheduler.")
>>> Signed-off-by: Philipp Stanner <phasta@...nel.org>
>>> ---
>>> I think this makes it less broken, but I'm not even sure whether it's enough,
>>> or whether more memory barriers or an rcu_dereference() would be needed. The
>>> spsc_queue is, of course, not documented and the existing barrier
>>> comments are either false or not telling.
>>>
>>> If someone has an idea, shoot us the info. Otherwise I think this is the
>>> right thing to do for now.
>>>
>>> P.
>>> ---
>>>  include/drm/spsc_queue.h | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/include/drm/spsc_queue.h b/include/drm/spsc_queue.h
>>> index ee9df8cc67b7..39bada748ffc 100644
>>> --- a/include/drm/spsc_queue.h
>>> +++ b/include/drm/spsc_queue.h
>>> @@ -54,7 +54,7 @@ static inline void spsc_queue_init(struct spsc_queue *queue)
>>>  
>>>  static inline struct spsc_node *spsc_queue_peek(struct spsc_queue *queue)
>>>  {
>>> - return queue->head;
>>> + return READ_ONCE(queue->head);
>>>  }
> 
> On Mon, 2025-11-10 at 12:24 +0100, Christian König wrote:
>> As far as I can see that is not correct, or rather not complete.
> 
> It cannot be incorrect by definition, because it simply ensures that
> the load will actually take place there.
> 
> Incomplete it can be.
> 
>>
>> The peek function should only be used to opportunistically look at the top of the queue. It would only be problematic if it returned a non-NULL value once and then a NULL value later.
>>
>> The whole idea of the SPSC is that it is barrier-free and the signaling of new entries to the consumer side is providing the barrier.
>>
>> So basically on the provider side you have
>> spsc_push(entry)
>> wake_up(consumer)
>>
>> And on the consumer side you have:
>>
>> woken_up_by_provider() {
>>  entry = spsc_peek();
>>  ...
>>  spsc_pop();
>> }
> 
> Well no, the scheduler can pick up the next job whenever it feels like
> it. Restarting it for example will have it peek into your queue,
> regardless of wake events.
> 
> In any case this is a highly fragile design. See below, too.
> 
> 
>>
>> The problem we are facing here is that the spsc only provides the guarantee that you see the entry pointer, but not the content of entry itself.
>>
>> So use cases like:
>>
>> woken_up_by_provider() {
>>  while (entry = spsc_peek()) {
>>  ...
>>  spsc_pop();
>>  }
>> }
>>
>> Are illegal since you don't have the correct memory barriers any more.
> 
> I can't follow. Are you saying that spsc_queue_peek() is correct
> because you know for sure that when it's used no one else might be
> changing that pointer?

Correct, yes. That's the whole idea. I mean SPSC stands for single producer single consumer.
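
To illustrate what I mean, the intended usage is roughly the following
(function names are illustrative only, this is not the actual scheduler
code):

#include <linux/wait.h>
#include <drm/spsc_queue.h>

/* Producer side (exactly one producer): */
static void my_push_job(struct spsc_queue *q, struct spsc_node *node,
			wait_queue_head_t *wq)
{
	spsc_queue_push(q, node);
	wake_up(wq);		/* the wake-up path provides the ordering */
}

/* Consumer side (exactly one consumer), called after being woken up: */
static void my_run_job(struct spsc_queue *q)
{
	struct spsc_node *node;

	node = spsc_queue_peek(q);	/* at most one peek per wake-up */
	if (node) {
		/* ... process the node ... */
		spsc_queue_pop(q);
	}
}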

> 
> Even if that were true this design is highly fragile.
> 
>>
>> Took me an eternity to understand that as well, so bear with me that I didn't explain that earlier.
> 
> s/explain/document :)
> 
> As discussed few weeks ago with Sima and Tvrtko, what we really need to
> move to in all of DRM is this development pattern:
> 
>    1. For parallel code, at first by default use a boring, horribly
>       slow (sarcasm) spinlock. BTW I'm not even convinced that a
>       spinlock is slower than lockless tricks. Paul's book states that
>       a CAS atomic instruction takes about 60 cycles, and taking a lock
>       takes 100.

The problem isn't the burned CPU cycles, but rather the cache lines moved between CPUs.

Keep in mind that you can do a fused multiply-add over a full 4x4 matrix in the time a single cache line miss takes.

>    2. If you want to do parallelism without locks, you need to justify
>       it really well. "rmb() so list_empty() works without a lock"
>       doesn't qualify; real, measured use-case speedups do.
>    3. If you write lockless infrastructure, you need to document it
>       really well. In particular you need to state:
>          1. How it works
>          2. What the rules are
> 
> See llist.h as an example. It clearly states when you need a lock and
> when you don't.

llist.h is actually pretty similar to the SPSC. I'm wondering why it doesn't have the same issues. For example, is xchg() providing the memory barriers?
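
For reference, if I remember correctly llist_add_batch() boils down to a
READ_ONCE() plus a cmpxchg() loop, roughly like this (paraphrased from
memory, please double-check against lib/llist.c):

bool llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
		     struct llist_head *head)
{
	struct llist_node *first = READ_ONCE(head->first);

	do {
		new_last->next = first;
	} while (!try_cmpxchg(&head->first, &first, new_first));

	return !first;
}

If that is accurate, the barrier would indeed come from the cmpxchg():
per Documentation/atomic_t.txt a successful value-returning RMW operation
is fully ordered, and the same rule applies to xchg().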


> Or RCU. No one could use it without such good
> documentation.
> 
> I have no idea whether spsc_queue is correctly implemented (I doubt
> it), and if even a highly experienced dev like you takes "an eternity"
> (quote) to understand it, one is really tempted to dream of spinlock_t,
> which has very clear semantics and is easily understood even by
> beginners.
> 
>>
>> Question is what should we do?
> 
> Easy!
> 
> Mid-term, we should replace spsc_queue with a boring, locked, super-
> slow linked list ^_^

That's what the scheduler started with. The reason that linked list was replaced, first with a KFIFO and then with the SPSC, was lack of performance.

We could go back to the KFIFO design, but a (doubly?) linked list is clearly going to show the same performance problems that originally triggered moving away from it.
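
Just to make the KFIFO option concrete, something like the sketch below
would keep the lock-free single-producer/single-consumer property at the
cost of a fixed queue depth (the names and the size are made up for
illustration, and the fifo needs an INIT_KFIFO() at init time):

#include <linux/kfifo.h>
#include <drm/gpu_scheduler.h>

struct entity_job_fifo {
	DECLARE_KFIFO(jobs, struct drm_sched_job *, 64);
};

/* With exactly one producer and one consumer, kfifo needs no locking. */
static bool entity_push_job(struct entity_job_fifo *f, struct drm_sched_job *job)
{
	return kfifo_put(&f->jobs, job);	/* false if the FIFO is full */
}

static struct drm_sched_job *entity_pop_job(struct entity_job_fifo *f)
{
	struct drm_sched_job *job;

	return kfifo_get(&f->jobs, &job) ? job : NULL;
}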

> 
> The spsc_queue was designed and – perhaps – perf-measured when RR was
> the only scheduling policy.
> 
> Since the FIFO rework, where FIFO became the default policy, we now
> access our super-efficient, super-fast lockless queue most of the time
> with the spinlock being taken immediately afterwards anyway. So we
> almost have a locked lockless queue now.
> 
> https://elixir.bootlin.com/linux/v6.18-rc4/source/drivers/gpu/drm/scheduler/sched_entity.c#L502

That is not that relevant.

> Only push_job() often (?) still runs locklessly, but I'm not at all
> convinced that this is worth it. Not even performance-wise.

That is what is relevant. And yes, the difference was totally measurable; that's why this was originally implemented.

> 
> If you look at spsc_queue_push() you see that it
>    1. disables preemption,
>    2. uses atomic instructions and
>    3. has a ton of memory barriers
> 
> – in other words, this is almost literally a manual re-implementation
> of a spinlock, just without mutual exclusion…

The problem is really to separate the push from the pop side so that as few cache lines as possible are transferred from one CPU to another. 
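
A sketch of what I mean (illustrative only, not the actual spsc_queue
layout): keep the consumer-owned and the producer-owned fields on
separate cache lines so that push and pop don't bounce the same line
between CPUs.

#include <linux/cache.h>
#include <linux/atomic.h>
#include <drm/spsc_queue.h>

struct split_queue {
	/* Consumer side: only the consumer reads and advances head. */
	struct spsc_node *head ____cacheline_aligned_in_smp;

	/* Producer side: only the producer updates the tail pointer. */
	atomic_long_t tail ____cacheline_aligned_in_smp;
	atomic_t job_count;
};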

Regards,
Christian.

> 
> 
> And even if something like spsc_queue would make sense and be actually
> worth it, then it should be provided to the entire kernel as common
> infrastructure, like llist.h, not hidden somewhere in DRM.
> 
> P.

