linux-kernel - Re: [PATCH] sched_ext: Add trace point to track sched

Message-ID: <ea71a8d7-ba0f-4d43-9304-6544060a1bb6@igalia.com>
Date: Thu, 27 Feb 2025 19:23:23 +0900
From: Changwoo Min <changwoo@...lia.com>
To: Andrea Righi <arighi@...dia.com>, tj@...nel.org
Cc: void@...ifault.com, kernel-dev@...lia.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched_ext: Add trace point to track sched_ext core events

On 25. 2. 27. 17:19, Andrea Righi wrote:
> On Thu, Feb 27, 2025 at 05:05:54PM +0900, Changwoo Min wrote:
>> Hi Andrea,
>>
>> Thank you for the review!
>>
>> On 25. 2. 27. 16:38, Andrea Righi wrote:
>>> Hi Changwoo,
>>>
>>> On Wed, Feb 26, 2025 at 11:33:27PM +0900, Changwoo Min wrote:
>>>> Add tracing support, which may be useful for debugging sched_ext schedulers
>>>> that trigger a certain event.
>>>>
>>>> Signed-off-by: Changwoo Min <changwoo@...lia.com>
>>>> ---
>>>>    include/trace/events/sched_ext.h | 21 +++++++++++++++++++++
>>>>    kernel/sched/ext.c               |  4 ++++
>>>>    2 files changed, 25 insertions(+)
>>>>
>>>> diff --git a/include/trace/events/sched_ext.h b/include/trace/events/sched_ext.h
>>>> index fe19da7315a9..88527b9316de 100644
>>>> --- a/include/trace/events/sched_ext.h
>>>> +++ b/include/trace/events/sched_ext.h
>>>> @@ -26,6 +26,27 @@ TRACE_EVENT(sched_ext_dump,
>>>>    	)
>>>>    );
>>>> +TRACE_EVENT(sched_ext_add_event,
>>>> +	    TP_PROTO(const char *name, int offset, __u64 added),
>>>> +	    TP_ARGS(name, offset, added),
>>>> +
>>>> +	TP_STRUCT__entry(
>>>> +		__string(name, name)
>>>> +		__field(	int,		offset		)
>>>> +		__field(	__u64,		added		)
>>>> +	),
>>>> +
>>>> +	TP_fast_assign(
>>>> +		__assign_str(name);
>>>> +		__entry->offset		= offset;
>>>> +		__entry->added		= added;
>>>> +	),
>>>> +
>>>> +	TP_printk("name %s offset %d added %llu",
>>>> +		  __get_str(name), __entry->offset, __entry->added
>>>> +	)
>>>> +);
>>>
>>> Isn't the name enough to determine which event has been triggered? What are
>>> the benefits of reporting also the offset within struct scx_event_stats?
>>>
>>
>> @name and @offset are duplicated information. However, I thought
>> having two is more convenient from the users' point of view
>> because they have different pros and cons.
>>
>> @offset is quick to compare and can be used easily in the BPF
>> code, but the offset of an event can change across kernel
>> versions when new events are added. @offset would be good to
>> write a quick trace hook for debugging.
>>
>> On the other hand, @name won't change across kernel versions,
>> which is good. However, it requires more code to actually read
>> the string in the BPF code (__data_loc for string is a 32-bit
>> integer encoding string length and location).
>>
>> Does it make sense to you?


> So, IMHO @offset to me would make sense if we guarantee that it won't
> change across kernel versions, and that's probably doable, we just need to
> make sure that we always add new events at the bottom of scx_event_stats.

Keeping the offset across versions is possible if we add new
events to the bottom. However, I am not sure if that is what we
want because we lose the nice logical grouping of the events in
the scx_event_stats struct.
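
A minimal sketch of that trade-off, using hypothetical layouts (the
struct and field names below are made up for illustration and are not
the real scx_event_stats): appending a new event keeps existing offsets
stable, while placing it next to its logical group shifts every field
after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration; not the real scx_event_stats. */
struct stats_v1 {
	int64_t EV_A;
	int64_t EV_B;
};

/* New event placed next to its logical group: EV_B shifts down. */
struct stats_grouped {
	int64_t EV_A;
	int64_t EV_A2;
	int64_t EV_B;
};

/* New event appended at the bottom: existing offsets are preserved. */
struct stats_appended {
	int64_t EV_A;
	int64_t EV_B;
	int64_t EV_A2;
};

static_assert(offsetof(struct stats_appended, EV_B) ==
	      offsetof(struct stats_v1, EV_B),
	      "appending keeps the offset of EV_B");
static_assert(offsetof(struct stats_grouped, EV_B) !=
	      offsetof(struct stats_v1, EV_B),
	      "inserting in the middle moves EV_B");

/* Small accessors so the offsets can also be inspected at run time. */
static size_t ev_b_offset_v1(void)
{
	return offsetof(struct stats_v1, EV_B);
}

static size_t ev_b_offset_grouped(void)
{
	return offsetof(struct stats_grouped, EV_B);
}

static size_t ev_b_offset_appended(void)
{
	return offsetof(struct stats_appended, EV_B);
}
```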

> Otherwise there's the risk to break potential users of this tracepoint that
> may consider the offset like a portable ID.

Hmm... I agree. The @offset would be too low level and could be a
potential source of confusion.

> Maybe we can call it @id or @event_id or similar and guarantee its
> portability? What do you think?

Now I think dropping @offset would be better in the long run
because we can keep scx_event_stats clean and avoid creating
a source of confusion. Regarding the ease of using @name, adding
a code example in the commit message will suffice, something
like this:

struct tp_add_event {
	struct trace_entry ent;
	u32 __data_loc_name;
	u64 delta;
};

SEC("tracepoint/sched_ext/sched_ext_add_event")
int tp_add_event(struct tp_add_event *ctx)
{
	char event_name[128];
	/* The lower 16 bits of __data_loc hold the string's offset from ctx. */
	unsigned short offset = ctx->__data_loc_name & 0xFFFF;

	bpf_probe_read_str((void *)event_name, sizeof(event_name), (char *)ctx + offset);

	bpf_printk("name %s   delta %llu", event_name, ctx->delta);
	return 0;
}
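
For readers unfamiliar with the __data_loc encoding used above, here is
a userspace sketch of the decoding: the lower 16 bits are the offset of
the string from the start of the record and the upper 16 bits are its
length. The helper names (decode_data_loc, read_event_name) and the raw
record buffer are assumptions for illustration only, not kernel or
libbpf APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decoded form of a 32-bit __data_loc word. */
struct data_loc {
	uint16_t offset;	/* bytes from the start of the record */
	uint16_t len;		/* size of the string data in bytes */
};

static struct data_loc decode_data_loc(uint32_t word)
{
	struct data_loc loc = {
		.offset	= (uint16_t)(word & 0xFFFF),
		.len	= (uint16_t)(word >> 16),
	};

	return loc;
}

/*
 * Copy the string that @word points at out of a raw record buffer.
 * Returns 0 on success, -1 if the location is empty or out of bounds.
 * The result is always NUL-terminated and truncated to fit @dst.
 */
static int read_event_name(const unsigned char *rec, size_t rec_size,
			   uint32_t word, char *dst, size_t dst_size)
{
	struct data_loc loc = decode_data_loc(word);
	size_t n;

	assert(rec && dst);

	if (dst_size == 0 || loc.len == 0)
		return -1;
	if ((size_t)loc.offset + loc.len > rec_size)
		return -1;

	n = loc.len < dst_size - 1 ? loc.len : dst_size - 1;
	memcpy(dst, rec + loc.offset, n);
	dst[n] = '\0';
	return 0;
}
```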

The downside of not having a numerical ID (@offset or @event_id)
is the cost of string comparison to distinguish an event type. If
we assume that probing the event is rare, that cost will be okay.
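
To give a rough idea of that cost, a bounded byte-by-byte comparison
along these lines is what a hook would need in order to filter by name.
This is a plain C sketch, not BPF code; event_name_equals and
EVENT_NAME_MAX are hypothetical names, and the bound simply mirrors the
128-byte buffer in the example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EVENT_NAME_MAX 128	/* assumed bound, same as the buffer above */

/*
 * Compare two NUL-terminated event names with a fixed upper bound on
 * the number of iterations. Returns true only if both strings are
 * identical and terminate within EVENT_NAME_MAX bytes.
 */
static bool event_name_equals(const char *name, const char *want)
{
	assert(name && want);

	for (size_t i = 0; i < EVENT_NAME_MAX; i++) {
		if (name[i] != want[i])
			return false;
		if (name[i] == '\0')
			return true;
	}

	/* No terminator within the bound: treat as a mismatch. */
	return false;
}
```

The worst case is one pass over the shorter name per traced event,
compared with a single integer comparison for a numerical ID.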

@Tejun, @Andrea -- What do you think? Should we provide
a portability-guaranteed @event_id after dropping @offset? Or
would a string-type event name alone be sufficient?

Regards,
Changwoo Min