Message-ID: <0c77afb9-1768-4141-8d92-d38097c4df0b@igalia.com>
Date: Sat, 18 Jan 2025 09:27:18 +0900
From: Changwoo Min <changwoo@...lia.com>
To: Tejun Heo <tj@...nel.org>
Cc: void@...ifault.com, arighi@...dia.com, kernel-dev@...lia.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/7] sched_ext: Implement event counter infrastructure and
add an event
Hello,
On 25. 1. 18. 01:24, Tejun Heo wrote:
>> The second approach is satisfactory regarding BPF binary
>> compatibility via CO-RE. However, it is a little cumbersome to
>> iterate over all the events, so I added the enums. I know the
>> enums are only usable in kernel code due to binary
>> compatibility, but I thought adding the enums would be better
>> than bloating the scx_dump_state() code at the end.
>> Does that make sense to you?
>
> Let's just use the structs and open code dumping. We can pack the dump
> output better that way too and I don't think BPF scheds would have much use
> for enum iterations.
Currently, the enums are not exposed to BPF schedulers. Also, BPF
schedulers should not rely on the enums because they are coupled
to a specific layout of the struct, which would cause a BPF binary
compatibility problem.
Could you explain more about the code dumping? I want to at least
print the event string for easier interpretation.
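For example, something along these lines would keep the dump
readable without the enums (just a rough sketch; dump_line() is the
helper scx_dump_state() already uses, and the macro and event names
are placeholders):

	/* Print one counter from the aggregated struct, keyed by field name. */
	#define scx_dump_event(s, events, kind) \
		dump_line(&(s), "%40s: %16lld", #kind, (s64)(events)->kind)

	/* e.g. in scx_dump_state(), after summing the per-CPU counters: */
	scx_dump_event(s, &events, SELECT_CPU_FALLBACK);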
> ...
>>>> +/**
>>>> + * scx_add_event - Increase an event counter for 'name' by 'cnt'
>>>> + * @name: an event name defined in struct scx_event_stat
>>>> + * @cnt: the number of times the event occurred
>>>> + */
>>>> +#define scx_add_event(name, cnt) ({ \
>>>> +	struct scx_event_stat *__e; \
>>>> +	__e = get_cpu_ptr(&event_stats); \
>>>> +	WRITE_ONCE(__e->name, __e->name + (cnt)); \
>>>> +	put_cpu_ptr(&event_stats); \
>>>
>>> this_cpu_add()?
>>
>> That's handier. I will change it.
>
> Note that there's __this_cpu_add() too which can be used from sections of
> code that already disable preemption. On x86, both variants cost the same
> but on arm __this_cpu_add() is cheaper as it can skip preemption off/on.
> Given that most of these counters are modified while holding rq lock, it may
> make sense to provide __ version too.
Sounds good. I will add __scx_add_event() following the
convention of this_cpu_add() and __this_cpu_add().
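Something like this is what I have in mind (a rough sketch against
the quoted patch; event_stats is the per-CPU variable from above):

	/* Increase the event counter 'name' by 'cnt'; safe in preemptible context. */
	#define scx_add_event(name, cnt) do { \
		this_cpu_add(event_stats.name, (cnt)); \
	} while (0)

	/*
	 * Like scx_add_event(), but the caller must already have
	 * preemption disabled, e.g. while holding an rq lock.
	 */
	#define __scx_add_event(name, cnt) do { \
		__this_cpu_add(event_stats.name, (cnt)); \
	} while (0)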
>
> ...
>>> Also, I'm not sure this is a useful event to count. False ops_cpu_valid()
>>> indicates that the returned CPU is not even possible and the scheduler is
>>> ejected right away. What's more interesting is
>>> kernel/sched/core.c::select_task_rq() tripping on !is_cpu_allowed() and
>>> falling back using select_fallback_rq().
>>>
>>> We can either hook into core.c or just compare the ops.select_cpu() picked
>>> CPU against the CPU the task ends up on in enqueue_task_scx().
>>
>> Modifying core.c will be more direct and straightforward. Also,
>> I think it would be better to separate this commit into two: one
>> for the infrastructure and another for the SELECT_CPU_FALLBACK
>> event, which touches core.c. I will move the necessary code in
>> the infrastructure into kernel/sched/sched.h, so we can use
>> scx_add_event() in core.c.
>
> Hmm.... yeah, both approaches have pros and cons but I kinda like the idea
> of restricting the events within ext.c and here detecting the fallback
> condition is pretty trivial. I don't know.
Right, it will be possible to track the same thing without touching
core.c (which would be too invasive) by replicating the conditions
from core.c in ext.c. I will dig into the code more to figure out
what the best approach is.
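For instance, the ext.c-only variant could be as simple as this in
the select-CPU path (a hypothetical sketch; the event name follows
the discussion above, and the check only approximates what
is_cpu_allowed() tests):

	/*
	 * If ops.select_cpu() picked a CPU the task cannot run on, the
	 * core scheduler will fall back via select_fallback_rq(), so
	 * count the event here. Note that cpumask_test_cpu() does not
	 * cover everything is_cpu_allowed() checks (e.g. cpu_active()),
	 * so those conditions would need replicating too.
	 */
	if (unlikely(!cpumask_test_cpu(cpu, p->cpus_ptr)))
		__scx_add_event(SELECT_CPU_FALLBACK, 1);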
Regards,
Changwoo Min