linux-kernel - Re: Tracehooks in scheduler

Message-ID: <20190426102635.almrj7bbjqlbt77n@queper01-lin>
Date:   Fri, 26 Apr 2019 11:26:38 +0100
From:   Quentin Perret <quentin.perret@....com>
To:     Qais Yousef <qais.yousef@....com>
Cc:     rostedt@...dmis.org, peterz@...radead.org,
        dietmar.eggemann@....com, bristot@...hat.com,
        juri.lelli@...hat.com, williams@...hat.com,
        linux-kernel@...r.kernel.org
Subject: Re: Tracehooks in scheduler

Hi Qais,

On Monday 15 Apr 2019 at 15:49:45 (+0100), Qais Yousef wrote:
> Hi Steve, Peter
> 
> > On 04/07/19 18:52, Qais Yousef wrote:
> > > Hi Steve, Peter
> > > 
> > > I know the topic has sprung up in the past but I couldn't find anything that
> > > points into any conclusion.
> > > 
> > > As far as I understand new TRACE_EVENTS() in the scheduler (and probably other
> > > subsystems) isn't desirable as it introduces a sort of ABI that can be painful
> > > to maintain.
> > > 
> > > But for us to be able to test various aspects of EAS, we rely on some events
> > > that track load_avg, util_avg and some other metrics in the scheduler.
> > > Example of such patches that are in android and we maintain out of tree can be
> > > found here:
> > > 
> > > https://android.googlesource.com/kernel/common/+/42903694913697da88a4ac627a92bbfdf44f0a2e
> > > https://android.googlesource.com/kernel/common/+/6dfaed989ea4ca223f0913dfc11cdafd9664fc1c
> > > 
> > > Dietmar and Quentin pointed me to a discussion you guys had with Daniel Bristot
> > > in the last LPC when he had a similar need. So it is something that could
> > > benefit other users as well.
> > > 
> > > What is the best way forward to be able to add tracehooks into the scheduler
> > > and any other subsystem for that matter?
> > > 
> > > We tried using DECLARE_TRACE() to create a tracepoint which doesn't export
> > > anything in /sys/kernel/debug/tracing/events and hoped that we can use eBPF or
> > > a kernel module to attach to this tracepoint and access the args to inject our
> > > own trace_printks() but this didn't work. The glue logic necessary to attach
> > > to this tracepoint in a similar manner to how RAW_TRACEPOINT() in eBPF works
> > > isn't there AFAICT.
> > > 
> > > I can post the full example if the above doesn't make sense. I am still
> > > familiarizing myself with the different aspects of this code as well. There
> > > might be support for what we want but I failed to figure out the magic
> > > combination to get it to work.
> > > 
> > > If I got this glue logic done, would this be an acceptable solution? If not, do
> > > you have any suggestions on how to progress?
> 
> I have written some patches in hope it'll clarify further what we are trying to
> achieve here and what would be the best possible approach about it.
> 
> I have taken two approaches to solve the problem.
> 
> 
> 1.
> 
> 	https://github.com/qais-yousef/linux/commit/e7d0aa7ff1328195f314b0730c4cc744dec4261e
> 
> 	In this approach everything we need is already available and we just
> 	need to create new tracepoints as described in
> 	Documentation/trace/tracepoints.rst and export it with
> 	EXPORT_TRACEPOINT_SYMBOL_GPL().
> 
> 	A user then can have an out of tree module to probe this tp and
> 	manipulate it as they like.
> 
> 	Example of such a module is here, the pelt_se tp is to demo the
> 	approach:
> 
> 	https://github.com/qais-yousef/tracepoints-helpers/blob/master/module-pelt-se/probe_tp_pelt_se.c
> 
> 	Googling around I can see that the use of
> 	EXPORT_TRACEPOINT_SYMBOL_GPL() is not desired unless the module is
> 	in-tree which I doubt will be the case here.
> 
> 	https://lore.kernel.org/lkml/20150422130052.4996e231@gandalf.local.home/
> 
> 2.
> 	https://github.com/qais-yousef/linux/commit/fb9fea29edb8af327e6b2bf3bc41469a8e66df8b
> 	https://github.com/qais-yousef/linux/commit/edd2498c5bbfca1a26acd151a4e3323e511f3455
> 
> 	In this approach I try to allow attaching to a TP using eBPF. Sadly the
> 	current infrastructure is lacking so I hacked the above up to create a
> 	new DECLARE_TRACE_HOOK() macro which will allow using eBPF but without
> 	exporting anything in debugfs that can constitute an ABI.
> 
> 	The following eBPF program can be used then to attach and access some
> 	info at the TP:
> 
> 	https://github.com/qais-yousef/tracepoints-helpers/blob/master/bpf/tp_trace_printk_pelt_se
> 
> 
> Do any of the above approaches make sense?

For the EAS-testing use-case you mentioned earlier, it's really for
debugging so we don't actually need the eBPF safety. None of this is
supposed to run in production I would say. So I tend to prefer option 1
if that works for everybody interested in this thing.

And then what would the story be? We would carry an out-of-tree module
in our test suite to extract scheduler data, and then post-process it in
userspace or something? Since that module would be out of tree, upstream
makes no commitment to userspace about the data it exposes, so perhaps
that could work.
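
To make the shape of option 1 concrete, here is a minimal user-space
sketch of the pattern: the core code only exposes a bare hook, and the
side that owns the probe decides how the data is formatted. This is a
model, not kernel code. Every name in it (pelt_sample, hook_register,
trace_pelt_se, probe_pelt_se, probe_log) is hypothetical, and a single
probe slot stands in for the real tracepoint infrastructure.

```c
#include <stdio.h>
#include <stddef.h>

/* User-space model only: none of these names are kernel APIs. */
struct pelt_sample {
	int cpu;
	unsigned long load_avg;
	unsigned long util_avg;
};

typedef void (*pelt_probe_fn)(void *data, const struct pelt_sample *s);

/* A single probe slot; the hook itself defines no output format. */
static struct {
	pelt_probe_fn fn;
	void *data;
} pelt_se_hook;

/* Attach a probe. Returns 0 on success, -1 if one is already attached. */
static int hook_register(pelt_probe_fn fn, void *data)
{
	if (pelt_se_hook.fn)
		return -1;
	pelt_se_hook.fn = fn;
	pelt_se_hook.data = data;
	return 0;
}

/* Detach the probe; the call site becomes a no-op again. */
static void hook_unregister(void)
{
	pelt_se_hook.fn = NULL;
	pelt_se_hook.data = NULL;
}

/* Call site on the "scheduler" side: does nothing unless a probe is attached. */
static void trace_pelt_se(const struct pelt_sample *s)
{
	if (pelt_se_hook.fn)
		pelt_se_hook.fn(pelt_se_hook.data, s);
}

/* State owned by the "out-of-tree" side. */
struct probe_log {
	char line[128];
	unsigned int hits;
};

/* The probe picks the text format, so that format is not part of the hook. */
static void probe_pelt_se(void *data, const struct pelt_sample *s)
{
	struct probe_log *plog = data;

	snprintf(plog->line, sizeof(plog->line),
		 "cpu=%d load_avg=%lu util_avg=%lu",
		 s->cpu, s->load_avg, s->util_avg);
	plog->hits++;
}
```

With hook_register(probe_pelt_se, &plog) in place, each trace_pelt_se()
call rewrites plog.line; after hook_unregister() the call site does
nothing. The point of the sketch is only that the text format lives
entirely in the probe, which is why an out-of-tree probe would not pin
upstream to a particular output.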

Another thing: should these sched tracepoints be guarded by sched_debug?

Thanks,
Quentin