Date:   Wed, 4 Sep 2019 11:43:33 +0100
From:   Qais Yousef <qais.yousef@....com>
To:     Joel Fernandes <joel@...lfernandes.org>
Cc:     Valentin Schneider <valentin.schneider@....com>,
        Radim Krčmář <rkrcmar@...hat.com>,
        linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        "H. Peter Anvin" <hpa@...or.com>,
        Andy Lutomirski <luto@...nel.org>,
        Jirka Hladký <jhladky@...hat.com>,
        Jiří Vozár <jvozar@...hat.com>,
        x86@...nel.org
Subject: Re: [PATCH 2/2] sched/debug: add sched_update_nr_running tracepoint

On 09/04/19 00:23, Joel Fernandes wrote:
> On Tue, Sep 03, 2019 at 05:05:47PM +0100, Valentin Schneider wrote:
> > On 03/09/2019 16:43, Radim Krčmář wrote:
> > > The paper "The Linux Scheduler: a Decade of Wasted Cores" used several
> > > custom data gathering points to better understand what was going on in
> > > the scheduler.
> > > Red Hat adapted one of them to the tracepoint framework and created a
> > > tool that plots a heatmap of nr_running; the tool uses the
> > > sched_update_nr_running tracepoint for fine-grained monitoring of
> > > scheduling imbalance.
> > > The tool is available from https://github.com/jirvoz/plot-nr-running.
> > > 
> > > The best place for the tracepoints is inside add_nr_running() and
> > > sub_nr_running(), which takes some shenanigans because both are defined
> > > inside sched.h.
> > > The tracepoints have to be included from sched.h, which means that
> > > CREATE_TRACE_POINTS has to be defined for the whole header. This might
> > > cause problems if tree-wide headers expose tracepoints in sched.h
> > > dependencies, but I'd argue that would be the other side's misuse of
> > > tracepoints.
> > > 
> > > Moving the #include of sched.h lower would require fixes in the s390
> > > and ppc headers, because they don't include their dependencies properly
> > > and expect sched.h to do it, so it is simpler to keep sched.h where it
> > > is and undefine CREATE_TRACE_POINTS right after it as a precaution.
> > > 
> > > The exports of the PELT tracepoints stay where they are, because they
> > > don't need to be protected by CREATE_TRACE_POINTS and moving them
> > > closer would be unsightly.
> > > 
> > 
> > Pure trace events are frowned upon in the scheduler world; try going with
> > bare tracepoints instead. Qais did something very similar recently:
> > 
> > https://lore.kernel.org/lkml/20190604111459.2862-1-qais.yousef@arm.com/
> > 
> > You'll have to implement the associated trace events in a module, which
> > lets you define your own event format and doesn't form an ABI :).
> 
> Is that really true? eBPF programs loaded from userspace can access
> tracepoints through BPF_RAW_TRACEPOINT_OPEN, which is UAPI:
> https://github.com/torvalds/linux/blob/master/include/uapi/linux/bpf.h#L103
> 
> I don't have a strong opinion about considering tracepoints as ABI / API or
> not, but just want to get the facts straight :)

It is actually true. But you first need to distinguish between a tracepoint and
a trace event. What Valentin is talking about here are *bare* tracepoints,
without any event associated with them, like the ones I added to the scheduler
recently. These are not accessible via eBPF, unless something has changed since
I last tried.

The current infrastructure needs to be extended to allow eBPF to attach to these
bare tracepoints. Something similar to what I have in [1] is needed, but instead
of creating a new macro, it should extend the existing one. [2] gives more
context on when I was trying to come up with alternatives to using trace
events.

[1] https://github.com/qais-yousef/linux/commit/fb9fea29edb8af327e6b2bf3bc41469a8e66df8b
[2] https://lore.kernel.org/lkml/20190415144945.tumeop4djyj45v6k@e107158-lin.cambridge.arm.com/

HTH

--
Qais Yousef
