linux-kernel - Re: [RESEND][PATCH v2 1/3] sched/tp: Add new tracepoint to track uclamp set from user-space

Message-ID: <20230630114950.zoocytnpvdrxgnss@airbuntu>
Date:   Fri, 30 Jun 2023 12:49:50 +0100
From:   Qais Yousef <qyousef@...alina.io>
To:     Masami Hiramatsu <mhiramat@...nel.org>
Cc:     Lukasz Luba <lukasz.luba@....com>, linux-kernel@...r.kernel.org,
        linux-trace-kernel@...r.kernel.org, rafael@...nel.org,
        linux-pm@...r.kernel.org, rostedt@...dmis.org, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com,
        vschneid@...hat.com, delyank@...com, qyousef@...gle.com
Subject: Re: [RESEND][PATCH v2 1/3] sched/tp: Add new tracepoint to track
 uclamp set from user-space

On 06/21/23 12:25, Masami Hiramatsu wrote:
> On Wed, 31 May 2023 19:26:29 +0100
> Qais Yousef <qyousef@...alina.io> wrote:
> 
> > On 05/22/23 15:57, Lukasz Luba wrote:
> > > The user-space can set uclamp value for a given task. It impacts task
> > > placement decisions made by the scheduler. This is very useful information
> > > and helps to understand the system behavior or track improvements in
> > > middleware and applications which start using uclamp mechanisms and report
> > > better performance in tests.
> > 
> > Do you mind adding a generic one instead please? And explain why we can't just
> > attach to the syscall via kprobes? I think you want to bypass the permission
> > checks, so maybe a generic tracepoint after that might be justifiable?
> 
> Could you tell me more about this point? I would like to know what kind of
> permission checks can be bypassed with tracepoints.

Sorry, that was a poor choice of words on my end.

The syscall can fail if the caller doesn't have permission to change the
attribute (some of them are protected with CAP_SYS_NICE) or if the boundary
check fails. The desire here is to emit a tracepoint() when the user
successfully changes an attribute of a task.

Lukasz would like to have this tracepoint to help debug and analyse workloads.
We are not really bypassing anything. So to rephrase: emit the tracepoint only
if the syscall successfully changes an attribute.
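
To make the failure cases concrete, here is a minimal userspace sketch of
a uclamp request that the kernel may reject. The helper name
try_set_uclamp() and the struct name uclamp_attr are made up for this
example; the struct is a local copy of the uapi struct sched_attr layout and
the flag values are copied from include/uapi/linux/sched.h. The exact errno
returned depends on the kernel version and configuration, so the sketch only
distinguishes success from failure.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Flag values as found in include/uapi/linux/sched.h. */
#define FLAG_KEEP_POLICY    0x08
#define FLAG_KEEP_PARAMS    0x10
#define FLAG_UTIL_CLAMP_MIN 0x20
#define FLAG_UTIL_CLAMP_MAX 0x40

/*
 * Local copy of the uapi struct sched_attr layout (56 bytes). It is named
 * differently (hypothetical name) so it cannot clash with libc headers
 * that already define struct sched_attr.
 */
struct uclamp_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;
	uint32_t sched_util_max;
};

/*
 * Ask the kernel to set both uclamp values for @pid (0 means the caller),
 * keeping the current policy and parameters.
 * Returns 0 on success or a negative errno value on failure.
 */
static int try_set_uclamp(pid_t pid, uint32_t util_min, uint32_t util_max)
{
	struct uclamp_attr attr;

	assert(sizeof(attr) == 56);
	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_flags = FLAG_KEEP_POLICY | FLAG_KEEP_PARAMS |
			   FLAG_UTIL_CLAMP_MIN | FLAG_UTIL_CLAMP_MAX;
	attr.sched_util_min = util_min;
	attr.sched_util_max = util_max;

	if (syscall(SYS_sched_setattr, pid, &attr, 0) == -1)
		return -errno;
	return 0;
}
```

A request with util_min above util_max, or with a value above 1024, should
come back negative, and a tracepoint placed after the checks would stay
silent for it. A probe on syscall entry would see the request either way.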

> 
> > Then anyone can use it to track how userspace has changed any attributes for
> > a task, not just uclamp.
> 
> I guess Uclamp is not controlled by syscall but from kernel internal
> sched_setattr/setscheduler() too. Anyway I agree that it can be more generic
> tracepoint, something like trace_sched_set_scheduer(task, attr).

Yes, which is something that worries me; I had a series in the past to hide
it. The uclamp range is abstract and has no meaning in general; it should be
set specifically for each system. E.g. 512 means half the system's performance
level, but if the system is overpowered this could be too fast, and if it's
underpowered it could be too slow. It must be set by userspace, though I'm not
sure how this can be achieved if kernel threads need to manage their
performance level.
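
As a rough illustration of why a fixed value like 512 is system-specific, the
sketch below maps a percentage onto the 0..1024 uclamp range and then scales
it by a per-system peak. The helper names and the peak figures used with
them are hypothetical and in arbitrary units; only the 1024 scale comes from
the kernel (SCHED_CAPACITY_SCALE).

```c
#include <assert.h>
#include <stdint.h>

#define UCLAMP_SCALE 1024u	/* same value as SCHED_CAPACITY_SCALE */

/*
 * Map a percentage (0..100) onto the uclamp range, rounding to nearest.
 * Returns -1 if the percentage is out of range.
 */
static int uclamp_from_percent(unsigned int percent)
{
	if (percent > 100)
		return -1;
	return (int)((percent * UCLAMP_SCALE + 50) / 100);
}

/*
 * Absolute performance a uclamp value corresponds to on a system whose
 * peak performance is @system_peak (hypothetical, arbitrary units).
 */
static uint64_t uclamp_to_perf(uint32_t uclamp, uint64_t system_peak)
{
	assert(uclamp <= UCLAMP_SCALE);
	return (uint64_t)uclamp * system_peak / UCLAMP_SCALE;
}
```

With these helpers, uclamp_from_percent(50) gives 512, and
uclamp_to_perf(512, 1000) and uclamp_to_perf(512, 4000) give 500 and 2000
respectively: the same request lands on very different absolute levels.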

Thanks!

--
Qais Yousef