linux-kernel - Re: [PATCH v4 1/2] sched_ext: Add lockless peek operation for DSQs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CACYs5Aax6X_8hrYtjC4yAMy5K6e3CwxSjEiW3mK8wfePq3oyBg@mail.gmail.com>
Date: Wed, 15 Oct 2025 10:05:28 -0400
From: Ryan Newton <rrnewton@...il.com>
To: Christian Loehle <christian.loehle@....com>
Cc: linux-kernel@...r.kernel.org, sched-ext@...ts.linux.dev, tj@...nel.org, 
	arighi@...dia.com, newton@...a.com
Subject: Re: [PATCH v4 1/2] sched_ext: Add lockless peek operation for DSQs

On Wed, Oct 15, 2025 at 5:46 AM Christian Loehle
<christian.loehle@....com> wrote:
> > +/**
> > + * scx_bpf_dsq_peek - Lockless peek at the first element.
> > + * @dsq_id: DSQ to examine.
> > + *
> > + * Read the first element in the DSQ. This is semantically equivalent to using
> > + * the DSQ iterator, but is lockfree.
> > + *
> > + * Returns the pointer, or NULL indicates an empty queue OR internal error.
> > + */
> > +__bpf_kfunc struct task_struct *scx_bpf_dsq_peek(u64 dsq_id)
> > +{
>
> Obviously there's no guarantee that scx_bpf_dsq_peek() will return the task you're gonna
> get when moving a task to another DSQ or a local one. It's pretty obvious from the kernel
> perspective, but I wonder if it's worth documenting here too.

Hello Christian, thanks for looking at the patches.

Absolutely, whenever using a lockless operation you have to watch out
for TOCTOU bugs. In the schedulers we've experimented with, doing a
sweep of peeks just gives you a higher chance of getting what you're
looking for when you subsequently lock the remote DSQ for
popping/stealing a task. I can add a reminder in the comment on the
function.

> > +/*
> > + * v6.19: Introduce lockless peek API for user DSQs.
> > + *
> > + * Preserve the following macro until v6.21.
> > + */
> > +static inline struct task_struct *__COMPAT_scx_bpf_dsq_peek(u64 dsq_id)
> > +{
> > +     struct task_struct *p = NULL;
> > +     struct bpf_iter_scx_dsq it;
> > +
> > +     if (bpf_ksym_exists(scx_bpf_dsq_peek))
> > +             return scx_bpf_dsq_peek(dsq_id);
> > +     if (!bpf_iter_scx_dsq_new(&it, dsq_id, 0))
> > +             p = bpf_iter_scx_dsq_next(&it);
> > +     bpf_iter_scx_dsq_destroy(&it);
> > +     return p;
> > +}
> > +
> I think there's an argument to be made for this to just return NULL on
> !bpf_ksym_exists(scx_bpf_dsq_peek), the caller needs to handle that anyway
> and at least it's guaranteed to be lockfree then?
> I don't have a strong leaning either way, probably is fine if documented.

We went back and forth on this patch a lot, and did explore versions
where peek is "best effort" and its semantics allow it to return NULL
for any reason. But that provides much less information to the caller,
because now NULL is supposed to indicate that the queue was empty at a
point in time. For that reason we ended up with this version where
peek provides both positive info (something is/was there) and negative
(nothing is/was there).

Best,
 -Ryan