lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 4 Apr 2023 18:17:20 +0100
From:   Qais Yousef <qyousef@...alina.io>
To:     Lukasz Luba <lukasz.luba@....com>
Cc:     linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
        rostedt@...dmis.org, mhiramat@...nel.org, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com,
        vschneid@...hat.com, delyank@...com, qyousef@...gle.com
Subject: Re: [PATCH 1/3] sched/tp: Add new tracepoint to track uclamp set
 from user-space

On 04/03/23 17:47, Lukasz Luba wrote:
> Hi Qais,
> 
> On 4/3/23 14:46, Qais Yousef wrote:
> > Hi Lukasz!
> > 
> > On 03/22/23 15:18, Lukasz Luba wrote:
> > > The user-space can set uclamp value for a given task. It impacts task
> > > placement decisions made by the scheduler. This is very useful information
> > > and helps to understand the system behavior or track improvements in
> > > middleware and applications which start using uclamp mechanisms and report
> > > better performance in tests.
> > 
> > We do have uclamp trace events in sched_tp, why are they not sufficient?
> > 
> > 	https://github.com/qais-yousef/sched_tp/blob/main/sched_events.h#L233
> > 
> > Do you really want to know the exact time the value has changed?
> 
> Yes, that's why these new are triggered instantly after userspace wanted
> to set the new uclamp values. We are going to have a few different
> uclamp implementations: one in mainline and X in Android vendor kernels.

This is not true. As you can see everyone tries to push fixes for issues they
find, but this not happening fast enough and they get forced to carry out of
tree stuff at the end. Out of tree stuff for the broken bits that is.

> The goal is to have only one... We will have to experiment to find

This statement gives a very strong message to everyone out here. And I'd like
to stress strongly that this is NOT true. Broken bits, yes. But essential bits
that works are used the same.

We can do a better job and fix things faster upstream. I am committed to
sorting all these stuff out anyway.

> the behavior of those algorithms and understand the differences. Since
> uclamp is part of this 'control-chain' of CPU frequency and also
> task placement - it would be really tricky to figure our everything.
> The analysis on traces are crucial for this.

I think the existing uclamp trace event is enough to be honest. But up to the
maintainer if they want to add this new specific one. The two tps seem a bit of
a clutter to me. With kprobes and bpf a lot can be done on the fly if you want
to reverse engineer some stuff.

> 
> > 
> > Would it make sense to introduce a generic sched_setscheduler tracepoint
> > instead? Although this might not be necessary as I think we can use
> > register_kprobe() to register a callback and create a new event without any
> > additional tracepoint. sched_setscheduler() is not inlined so should be easy to
> > hook into and create events, no?
> 
> This looks very complex and we already have our LISA tool with the

It's not. Here's a PoC that only does a trace_printk(), it's simple. You don't
need to do it in a module even, see below.

	https://github.com/qais-yousef/sched_tp/compare/main...kprobe-poc

It did highlight that sugov_should_update_freq() ends up actually being inlined
though :(. It should work for sched_setscheduler(). You'd want to use
register_kprobe() instead for that.

> module to change the tracepoints into trace events and build them.
> I wanted to be aligned with that design, which might look a bit
> old-fashion but is simple IMO.

trace-cmd, bpf and I believe perf, all can do the same; and they support
kprobes not just tracepoints.

And they all boil down to the same underlying mechanism

	https://www.kernel.org/doc/html/v6.1/trace/kprobetrace.html

No need to re-invent a new wheel ;-)

> The 'sched_setscheduler tracepoint' might be a too big for this
> purpose.

Sorry I am usually supportive for more tracepoints, but I feel skeptical this
time. That said, I really don't mind if the maintainers are okay with it. So
while I'm not convinced, but I don't have any objection either.

> Thanks for the comments :)

Thanks!

--
Qais Yousef

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ