linux-kernel - Re: [RFC PATCH 1/6] perf: Move mlock accounting to ring buffer allocation


Message-ID: <20160923202830.GQ5008@twins.programming.kicks-ass.net>
Date:   Fri, 23 Sep 2016 22:28:30 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Andi Kleen <ak@...ux.intel.com>
Cc:     Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
        vince@...ter.net, eranian@...gle.com,
        Arnaldo Carvalho de Melo <acme@...radead.org>,
        tglx@...utronix.de
Subject: Re: [RFC PATCH 1/6] perf: Move mlock accounting to ring buffer
 allocation

On Fri, Sep 23, 2016 at 10:26:15AM -0700, Andi Kleen wrote:
> > Afaict there's no actual need to hide the AUX buffer for this sampling
> > stuff; the user knows about all this and can simply mmap() the AUX part.
> > The sample could either point to locations in the AUX buffer, or (as I
> > think this code does) memcpy bits out.
> 
> This would work for perf, but not for the core dump case below.
> 
> > Ideally we'd pass the AUX-event into the syscall, that way you avoid all
> > the find_aux_event crud. I'm not sure we want to overload the group_fd
> > thing more (its already very hard to create counter groups in a cgroup
> > for example) ..
> > 
> > Coredump was mentioned somewhere, but I'm not sure I've seen
> > code/interfaces for that. How was that envisioned to work?
> 
> The idea was to have a rlimit that enables PT running as a ring buffer
> in the background.  If something crashes the ring buffer is dumped
> as part of the core dump, and then gdb can tell you how you crashed.
> This extends what gdb already does explicitly today using perf
> API calls.

Well, we could 'force' inject a VMA into the process's address space; we
do that for a few other things as well. It also makes for fewer
exceptions in the actual core dumping.

But the worry I have is the total amount of pinned memory. If you want
to inherit this on fork(), as is a reasonable expectation, then it's
possible to quickly exceed the total amount of pinnable memory.

At which point we _should_ start failing fork(), which is a somewhat
unexpected and undesirable side effect.
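
As a rough illustration of how quickly that adds up, here is a minimal
sketch of the arithmetic. The buffer size and the pinning budget are
made-up figures, and the helper names are hypothetical; none of this is
an existing kernel interface.

```c
/*
 * Hypothetical figures, for illustration only: every task that inherits
 * the trace buffer across fork() pins its own copy.
 */
#define AUX_BUF_BYTES   (4UL << 20)   /* assumed per-task pinned buffer: 4 MiB */
#define PIN_LIMIT_BYTES (64UL << 20)  /* assumed total pinnable budget: 64 MiB */

/* How many tasks can each hold a pinned buffer within the budget. */
static unsigned long max_pinned_tasks(unsigned long limit, unsigned long per_task)
{
	return per_task ? limit / per_task : 0;
}

/*
 * Returns 1 if creating task number ntasks (counted from 1) would push
 * the pinned total past the budget, i.e. the point where fork() would
 * have to fail. Division is used to avoid overflowing ntasks * per_task.
 */
static int fork_would_exceed(unsigned long ntasks, unsigned long limit,
			     unsigned long per_task)
{
	return ntasks > max_pinned_tasks(limit, per_task);
}
```

With these assumed numbers, max_pinned_tasks(PIN_LIMIT_BYTES,
AUX_BUF_BYTES) is 16, so the 17th task in an inheriting process tree is
the first one whose fork() would have to be refused.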

Ideally we'd unpin the old buffers and repin the new buffers on context
switch, but that's impossible: faulting needs scheduling, so we'd
recurse, and we lose.

I really want to see something sensible before we go do that.