[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zn8xzgG4f8vByVL3@slm.duckdns.org>
Date: Fri, 28 Jun 2024 11:57:34 -1000
From: Tejun Heo <tj@...nel.org>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Alexei Starovoitov <ast@...nel.org>,
LKML <linux-kernel@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
David Vernet <void@...ifault.com>
Subject: Re: [PATCH sched_ext/for-6.11 2/2] sched_ext: Implement
scx_bpf_consume_task()
Hello, Alexei.
On Thu, Jun 27, 2024 at 06:34:14PM -0700, Alexei Starovoitov wrote:
...
> > +__bpf_kfunc bool __scx_bpf_consume_task(unsigned long it, struct task_struct *p)
> > +{
> > + struct bpf_iter_scx_dsq_kern *kit = (void *)it;
> > + struct scx_dispatch_q *dsq, *kit_dsq;
> > + struct scx_dsp_ctx *dspc = this_cpu_ptr(scx_dsp_ctx);
> > + struct rq *task_rq;
> > + u64 kit_dsq_seq;
> > +
> > + /* can't trust @kit, carefully fetch the values we need */
> > + if (get_kernel_nofault(kit_dsq, &kit->dsq) ||
> > + get_kernel_nofault(kit_dsq_seq, &kit->dsq_seq)) {
> > + scx_ops_error("invalid @it 0x%lx", it);
> > + return false;
> > + }
>
> With scx_bpf_consume_task() it's only a compile time protection from bugs.
> Since kfunc doesn't dereference any field in kit_dsq it won't crash
> immediately, but let's figure out how to make it work properly.
>
> Since kit_dsq and kit_dsq_seq are pretty much anything in this implementation
> can they be passed as two scalars instead ?
> I guess not, since tricking dsq != kit_dsq and
> time_after64(..,kit_dsq_seq) can lead to real issues ?
That actually should be okay. It can lead to real but not crashing issues.
The system integrity is going to be fine no matter what the passed in seq
value is. It can just lead to confusing behaviors from the BPF scheduler's
POV, so it's fine to put the onus on the BPF scheduler.
> Can some of it be mitigated by passing dsq into kfunc that
> was used to init the iter ?
> Then kfunc will read dsq->seq from it instead of kit->dsq_seq ?
I don't quite follow this part. bpf_iter_scx_dsq_new() takes @dsq_id. The
function looks up the matching DSQ and then the iterator remembers the
current dsq->seq which serves as the threshold (tasks queued afterwards are
ignored). ie. The value needs to be copied at that point to guarantee that
iteration ignores tasks that are queued after the iteration started.
> > + /*
> > + * Did someone else get to it? @p could have already left $dsq, got
> > + * re-enqueud, or be in the process of being consumed by someone else.
> > + */
> > + if (unlikely(p->scx.dsq != dsq ||
> > + time_after64(p->scx.dsq_seq, kit_dsq_seq) ||
>
> In the previous patch you do:
> (s32)(p->scx.dsq_seq - kit->dsq_seq) > 0
> and here
> time_after64().
> Close enough, but 32 vs 64 and equality difference?
Sorry about the sloppiness. It was originally u64 and then I forgot to
update here after changing them to u32. I'll add a helper for the comparison
and update both sites.
Going back to the sequence number barrier, it's a sort of scoping and one
way to solve it is adding an explicit helper to fetch the target DSQ's
sequence number and then pass it to the consume_task function. ie. sth like:
barrier_seq = scx_bpf_dsq_seq(dsq_id);
bpf_for_each(scx_dsq, p, dsq_id, 0) {
...
scx_bpf_consume_task(p, dsq_id, barrier_seq);
}
This should work but it's not as neat in that it now involves three dsq_id
-> DSQ lookups. Also, there's extra subtlety arising from @barrier_seq being
different from the barrier seq that the scx_dsq iterator would be using.
As a DSQ iteration needs to have its own barrier sequence, maybe the answer
is to require passing it in as an explicit parameter. ie.:
barrier_seq = scx_bpf_dsq_seq(dsq_id);
bpf_for_each(scx_dsq, p, dsq_id, barrier_seq, 0) {
...
scx_bpf_consume_task(p, dsq_id, barrier_seq);
}
There still are three dsq_id lookups but at least there is just one sequence
number in play. It is more cumbersome tho compared to the current interface:
bpf_for_each(scx_dsq, p, dsq_id, 0) {
...
scx_bpf_consume_task(BPF_FOR_EACH_ITER, p);
}
What do you think?
Thanks.
--
tejun
Powered by blists - more mailing lists