linux-kernel - Re: [PATCH] perf/core: generate overflow signal when samples are dropped (WAS: Re: [REGRESSION] perf/core: PMU interrupts dropped if we entered the kernel in the "skid" region)

Message-ID: <20170704090313.xyb5lntyy55ga7dm@hirez.programming.kicks-ass.net>
Date:   Tue, 4 Jul 2017 11:03:13 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Kyle Huey <me@...ehuey.com>
Cc:     Mark Rutland <mark.rutland@....com>,
        Vince Weaver <vincent.weaver@...ne.edu>,
        "Jin, Yao" <yao.jin@...ux.intel.com>,
        Ingo Molnar <mingo@...nel.org>, stable@...r.kernel.org,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Arnaldo Carvalho de Melo <acme@...hat.com>,
        Jiri Olsa <jolsa@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Stephane Eranian <eranian@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>, acme@...nel.org,
        jolsa@...nel.org, kan.liang@...el.com,
        Will Deacon <will.deacon@....com>, gregkh@...uxfoundation.org,
        Robert O'Callahan <robert@...llahan.org>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] perf/core: generate overflow signal when samples are
 dropped (WAS: Re: [REGRESSION] perf/core: PMU interrupts dropped if we
 entered the kernel in the "skid" region)

On Wed, Jun 28, 2017 at 03:55:07PM -0700, Kyle Huey wrote:

> > Having thought about this some more, I think Vince does make a good
> > point that throwing away samples is liable to break stuff, e.g. that
> > which only relies on (non-sensitive) samples.
> >
> > It still seems wrong to make up data, though.

It is something we do in other places as well though. For example the
printk() %pK thing fakes NULL pointers when kptr_restrict is set.

Faking data gets a wee bit tricky in how much data we need to clear
though; it's not only IP, pretty much everything we get from the
interrupt context, like the branch stack and registers, is also suspect.
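
To give a rough idea of how much would need clearing, here is a minimal
userspace sketch. It is not kernel code: struct sample, MAX_BRANCHES and
scrub_sample() are all made-up names for illustration, and the real
sample data carries more than this. The assumption is that for an
exclude_kernel event whose interrupt landed in the kernel, every field
taken from the interrupt context gets zeroed, while fields not taken
from it (here the tid) are left alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_BRANCHES 4	/* hypothetical, kept small for illustration */

/* Hypothetical, heavily simplified stand-in for one sample. */
struct sample {
	uint64_t ip;				/* from interrupt regs */
	uint64_t regs[4];			/* from interrupt regs */
	uint64_t branch_from[MAX_BRANCHES];	/* branch stack */
	uint64_t branch_to[MAX_BRANCHES];	/* branch stack */
	unsigned int nr_branches;
	uint32_t tid;				/* not from interrupt regs */
};

/*
 * Zero everything that came from the interrupt context when the event
 * excludes the kernel but the interrupt hit in the kernel.
 * Returns true if the sample was scrubbed.
 */
bool scrub_sample(struct sample *s, bool exclude_kernel, bool irq_in_kernel)
{
	assert(s != NULL);

	if (!exclude_kernel || !irq_in_kernel)
		return false;

	s->ip = 0;
	memset(s->regs, 0, sizeof(s->regs));
	memset(s->branch_from, 0, sizeof(s->branch_from));
	memset(s->branch_to, 0, sizeof(s->branch_to));
	s->nr_branches = 0;
	return true;
}
```

Clearing IP alone would leave the registers and branch entries pointing
into the kernel, which is the problem described above.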

> > Maybe for exclude_kernel && !exclude_user events we can always generate
> > samples from the user regs, rather than the exception regs. That's going
> > to be closer to what the user wants, regardless. I'll take a look
> > tomorrow.
> 
> I'm not very familiar with the kernel internals, but the reason I
> didn't suggest this originally is it seems like it will be difficult
> to determine what the "correct" userspace registers are.  For example,
> what happens if a performance counter is fixed to a given tid, the
> interrupt fires during a context switch from that task to another that
> is not being monitored, and the kernel is far enough along in the
> context switch that the current task struct has been switched out?
> Reporting the new task's registers seems as bad as reporting the
> kernel's registers.  But maybe this is easier than I imagine for
> whatever reason.

If the counter is fixed to a task then it's scheduled along with the
task. We'll schedule out the event before doing the actual task switch
and switch in the new event after.
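
The ordering can be modelled with a toy userspace sketch. Again this is
not kernel code: struct task, current_task, context_switch() and
overflow_tid() are invented for illustration, and the model simply
assumes that an event which has been scheduled out cannot produce an
overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task with at most one per-task event attached. */
struct task {
	int tid;
	bool has_event;
	bool event_active;	/* event currently scheduled in */
};

struct task *current_task;

void context_switch(struct task *prev, struct task *next)
{
	if (prev->has_event)
		prev->event_active = false;	/* 1. schedule out old event */

	current_task = next;			/* 2. actual task switch */

	if (next->has_event)
		next->event_active = true;	/* 3. schedule in new event */
}

/*
 * TID an overflow of owner's event would be attributed to, or -1 if the
 * event is not active and so (in this model) cannot overflow.
 */
int overflow_tid(const struct task *owner)
{
	assert(owner != NULL && current_task != NULL);

	if (!owner->event_active)
		return -1;
	return current_task->tid;
}
```

In this model the event is only active while its own task is current,
so an overflow never gets attributed to the incoming task.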

That said, with a per-cpu event the TID sample value is indeed subject
to skid like you describe.