Message-ID: <Z4qETRVdvJLDSUHg@slm.duckdns.org>
Date: Fri, 17 Jan 2025 06:24:45 -1000
From: Tejun Heo <tj@...nel.org>
To: Changwoo Min <changwoo@...lia.com>
Cc: void@...ifault.com, arighi@...dia.com, kernel-dev@...lia.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/7] sched_ext: Implement event counter infrastructure and add an event
Hello,
On Fri, Jan 17, 2025 at 04:08:45PM +0900, Changwoo Min wrote:
> $COMPONENT_$EVENT sounds more systematic. Thanks for the
> suggestion. What about SELECT_CPU_FALLBACK?
Sounds good.
> > This feels a bit odd to me. Why does it need both the struct fields and
> > indices? Can we do either one of those?
>
> Before submitting the patch set, I tested one approach using
> purely enums storing the actual counters in an array and another
> (current) approach using struct fields. I rejected the first
> approach because the BPF verifier does not allow changing the
> array size, causing the BPF binary compatibility problem. The
That makes sense. It'd be useful to note that on top of the struct
definition.
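E.g. something along these lines (completely untested sketch, comment
wording and the field are just for illustration):

/*
 * struct scx_event_stat - scheduler event counters
 *
 * Counters are plain struct fields rather than an enum-indexed array
 * because the BPF verifier doesn't allow the array size to change
 * across kernel versions, which would break BPF binary compatibility.
 * Struct fields are relocated by CO-RE, so new fields can be appended
 * without breaking existing BPF schedulers.
 */
struct scx_event_stat {
	/* ops.select_cpu()'s pick was overridden by the fallback path */
	u64		SELECT_CPU_FALLBACK;
};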
> second approach preserves BPF binary compatibility through CO-RE.
> However, it is a little cumbersome to iterate over all the events,
> so I added the enums. I know the enums are only usable in kernel
> code due to the binary compatibility constraint, but I thought
> adding the enums would be better than bloating the scx_dump_state()
> code at the end. Does it make sense to you?
Let's just use the structs and open code dumping. We can pack the dump
output better that way too and I don't think BPF scheds would have much use
for enum iterations.
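For the dump side, I'm thinking of something like the following (untested
sketch; assumes the dump_line() helper that scx_dump_state() already uses
and the percpu event_stats variable from this patch):

static void scx_dump_events(struct seq_buf *s)
{
	struct scx_event_stat events = {};
	int cpu;

	/* fold the per-CPU counters into one aggregate struct */
	for_each_possible_cpu(cpu) {
		struct scx_event_stat *e = per_cpu_ptr(&event_stats, cpu);

		events.SELECT_CPU_FALLBACK += READ_ONCE(e->SELECT_CPU_FALLBACK);
	}

	/* one open-coded line per field, no enum iteration needed */
	dump_line(s, "SELECT_CPU_FALLBACK=%llu", events.SELECT_CPU_FALLBACK);
}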
...
> > > +/**
> > > + * scx_add_event - Increase an event counter for 'name' by 'cnt'
> > > + * @name: an event name defined in struct scx_event_stat
> > > + * @cnt: the number of times the event occurred
> > > + */
> > > +#define scx_add_event(name, cnt) ({ \
> > > +	struct scx_event_stat *__e;			\
> > > +	__e = get_cpu_ptr(&event_stats);		\
> > > +	WRITE_ONCE(__e->name, __e->name + (cnt));	\
> > > +	put_cpu_ptr(&event_stats);			\
> >
> > this_cpu_add()?
>
> That's handier. I will change it.
Note that there's __this_cpu_add() too, which can be used from sections of
code that already have preemption disabled. On x86, both variants cost the
same, but on arm __this_cpu_add() is cheaper as it can skip the preemption
off/on. Given that most of these counters are modified while holding the rq
lock, it may make sense to provide a __ version too.
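IOW, something like this (untested; event_stats being the percpu struct
from this patch):

/* usable anywhere - handles preemption on/off as needed */
#define scx_add_event(name, cnt) do {				\
	this_cpu_add(event_stats.name, (cnt));			\
} while (0)

/*
 * Cheaper variant for callers that already have preemption disabled,
 * e.g. while holding an rq lock.
 */
#define __scx_add_event(name, cnt) do {				\
	__this_cpu_add(event_stats.name, (cnt));		\
} while (0)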
...
> > Also, I'm not sure this is a useful event to count. False ops_cpu_valid()
> > indicates that the returned CPU is not even possible and the scheduler is
> > ejected right away. What's more interesting is
> > kernel/sched/core.c::select_task_rq() tripping on !is_cpu_allowed() and
> > falling back using select_fallback_rq().
> >
> > We can either hook into core.c or just compare the ops.select_cpu() picked
> > CPU against the CPU the task ends up on in enqueue_task_scx().
>
> Modifying core.c will be more direct and straightforward. Also,
> I think it would be better to split this commit into two: one
> for the infrastructure and another for the SELECT_CPU_FALLBACK
> event, which touches core.c. I will move the necessary
> infrastructure code into kernel/sched/sched.h, so we can use
> scx_add_event() in core.c.
Hmm.... yeah, both approaches have pros and cons, but I kinda like the idea
of restricting the events to ext.c, and detecting the fallback condition
here is pretty trivial. I don't know.
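To illustrate, roughly (untested, and p->scx.selected_cpu is a made-up
field just for illustration):

	/* in select_task_rq_scx(): remember what ops.select_cpu() returned */
	p->scx.selected_cpu = cpu;

	/* in enqueue_task_scx(): count it if the task landed somewhere else */
	if (unlikely(task_cpu(p) != p->scx.selected_cpu))
		__scx_add_event(SELECT_CPU_FALLBACK, 1);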
Thanks.
--
tejun