linux-kernel - Re: [REGRESSION] perf/core: PMU interrupts dropped if we entered the kernel in the "skid" region

Message-ID: <20170628101248.GB5981@leverpostej>
Date:   Wed, 28 Jun 2017 11:12:48 +0100
From:   Mark Rutland <mark.rutland@....com>
To:     Kyle Huey <me@...ehuey.com>
Cc:     "Jin, Yao" <yao.jin@...ux.intel.com>,
        Ingo Molnar <mingo@...nel.org>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        stable@...r.kernel.org,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Arnaldo Carvalho de Melo <acme@...hat.com>,
        Jiri Olsa <jolsa@...hat.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Stephane Eranian <eranian@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Vince Weaver <vincent.weaver@...ne.edu>, acme@...nel.org,
        jolsa@...nel.org, kan.liang@...el.com,
        Will Deacon <will.deacon@....com>, gregkh@...uxfoundation.org,
        Robert O'Callahan <robert@...llahan.org>,
        open list <linux-kernel@...r.kernel.org>
Subject: Re: [REGRESSION] perf/core: PMU interrupts dropped if we entered the
 kernel in the "skid" region

On Tue, Jun 27, 2017 at 09:51:00PM -0700, Kyle Huey wrote:
> On Tue, Jun 27, 2017 at 7:09 PM, Jin, Yao <yao.jin@...ux.intel.com> wrote:
> > Hi,
> >
> > In theory, the PMI interrupts in skid region should be dropped, right?
> 
> No, why would they be dropped?
> 
> My understanding of the situation is as follows:
> 
> There is some time, call it t_0, where the hardware counter overflows.
> The PMU triggers an interrupt, but this is not instantaneous.  Call
> the time when the interrupt is actually delivered t_1.  Then t_1 - t_0
> is the "skid".
> 
> Note that if the counter is `exclude_kernel`, then at t_0 the CPU
> *must* be running a userspace program.  But by t_1, the CPU may be
> doing something else.  Your patch changed things so that if at t_1 the
> CPU is in the kernel, then the interrupt is discarded.  But rr has
> programmed the counter to deliver a signal on overflow (via F_SETSIG
> on the fd returned by perf_event_open).  This change results in the
> signal never being delivered, because the interrupt was ignored.
> (More accurately, the signal is delivered the *next* time the counter
> overflows, which is far past where we wanted to inject our
> asynchronous event into our tracee.)
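
For readers unfamiliar with the setup described in the quoted text, here is a
minimal user-space sketch of a user-only counter that raises a signal on
overflow. It is not rr's actual code. The helper names make_overflow_attr()
and arm_overflow_signal(), the choice of counting retired instructions, and
the period value are all assumptions made for illustration.

```c
#define _GNU_SOURCE             /* for F_SETSIG and O_ASYNC */
#include <fcntl.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: describe a counter that only counts in userspace
 * and overflows every `period` events. The event type is an assumption. */
struct perf_event_attr make_overflow_attr(uint64_t period)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.sample_period = period;
    attr.exclude_kernel = 1;    /* at t_0 the CPU must be in userspace */
    attr.exclude_hv = 1;
    return attr;
}

/* Hypothetical helper: open the counter on task `pid` and request that
 * `signo` be sent to `owner` on each overflow.
 * Returns the event fd, or -1 with errno set. */
int arm_overflow_signal(pid_t pid, pid_t owner, int signo, uint64_t period)
{
    struct perf_event_attr attr = make_overflow_attr(period);
    int flags;
    int fd = (int)syscall(SYS_perf_event_open, &attr, pid, -1, -1, 0UL);

    if (fd < 0)
        return -1;

    flags = fcntl(fd, F_GETFL);
    if (flags < 0 ||
        fcntl(fd, F_SETFL, flags | O_ASYNC) < 0 ||  /* enable async I/O */
        fcntl(fd, F_SETSIG, signo) < 0 ||           /* which signal to send */
        fcntl(fd, F_SETOWN, owner) < 0) {           /* who receives it */
        close(fd);
        return -1;
    }
    return fd;
}
```

With a counter armed this way, the signal depends on the overflow interrupt
being handled. If the handler discards an interrupt that arrives at t_1 while
the CPU is in the kernel, no signal is raised for that overflow.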

Yes, this is a bug.

As we're trying to avoid sampling state, I think we can move the check
into perf_prepare_sample() or __perf_event_output(), where that state is
actually sampled. I'll take a look at that momentarily.

Just to clarify, you don't care about the sample state at all? i.e. you
don't need the user program counter?

Is that signal delivered to the tracee, or to a different process that
traces it? If the latter, what ensures that the task is stopped
sufficiently quickly?

> It seems to me that it might be reasonable to ignore the interrupt if
> the purpose of the interrupt is to trigger sampling of the CPU's
> register state.  But if the interrupt will trigger some other
> operation, such as a signal on an fd, then there's no reason to drop
> it.

Agreed. I'll try to have a patch for this soon.

I just need to figure out exactly where that overflow signal is
generated by the perf core.
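
The distinction discussed above (skip the sample, keep the signal) can be
modelled in a few lines. This is a toy model, not kernel code: the structs
and function names are hypothetical, and it only illustrates the difference
between dropping the whole interrupt and suppressing just the sampled state.

```c
#include <stdbool.h>

/* Hypothetical model of one counter overflow; not a kernel structure. */
struct overflow {
    bool exclude_kernel;    /* counter counts userspace only */
    bool in_kernel_at_t1;   /* CPU mode when the interrupt is delivered */
    bool wants_sample;      /* event records register state */
    bool wants_signal;      /* fd was armed with F_SETSIG */
};

struct outcome {
    bool sample_recorded;
    bool signal_sent;
};

/* True when a user-only counter's interrupt arrives in the kernel,
 * i.e. the kernel was entered during the skid. */
bool in_skid(const struct overflow *o)
{
    return o->exclude_kernel && o->in_kernel_at_t1;
}

/* Behaviour reported as the regression: the interrupt is dropped
 * entirely, so the signal is lost along with the sample. */
struct outcome handle_drop_interrupt(const struct overflow *o)
{
    struct outcome r = { false, false };

    if (in_skid(o))
        return r;
    r.sample_recorded = o->wants_sample;
    r.signal_sent = o->wants_signal;
    return r;
}

/* Direction discussed in this reply: suppress only the sampled state
 * and still generate the overflow signal. */
struct outcome handle_skip_sample_only(const struct overflow *o)
{
    struct outcome r;

    r.sample_recorded = o->wants_sample && !in_skid(o);
    r.signal_sent = o->wants_signal;
    return r;
}
```

In this model the two handlers differ only when in_skid() is true: the first
loses the signal, the second still delivers it while leaving kernel register
state unsampled.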

Thanks,
Mark.