linux-kernel - Re: [PATCH 1/8] perf: Allow to block process in syscall tracepoints

Message-ID: <20181213112922.GD731@krava>
Date:   Thu, 13 Dec 2018 12:29:22 +0100
From:   Jiri Olsa <jolsa@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Steven Rostedt <rostedt@...dmis.org>,
        "Dmitry V. Levin" <ldv@...linux.org>, Jiri Olsa <jolsa@...nel.org>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        "Luis Claudio R. Goncalves" <lclaudio@...g.org>,
        Eugene Syromyatnikov <esyr@...hat.com>,
        Frederic Weisbecker <fweisbec@...il.com>,
        lkml <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/8] perf: Allow to block process in syscall tracepoints

On Thu, Dec 13, 2018 at 11:01:49AM +0100, Peter Zijlstra wrote:
> On Wed, Dec 12, 2018 at 08:26:39PM -0500, Steven Rostedt wrote:
> > On Thu, 13 Dec 2018 03:39:38 +0300
> > "Dmitry V. Levin" <ldv@...linux.org> wrote:
> > 
> > > btw, I didn't ask for the implementation to be ugly.
> > > You don't have to introduce polling into the kernel if you don't want to,
> > > userspace is perfectly capable of invoking wait4(2) in a loop.
> > > Just block the tracee, notify the tracer, and let it pick up the pieces.
> > 
> > Note, there's been some discussion offlist to only have perf set a flag
> > when it dropped an event and have the ptrace code do the heavy lifting
> > of blocking the task and waking it back up. I think that would be a
> > cleaner solution and won't muck with perf as badly.
> 
> It's still really horrid -- the question is not if we can come up with
> something, anything, to make strace work. The question is if we can
> extend something in a sane and maintainable manner to allow this.
> 
> So there's a whole bunch of problems I see with all this, in no
> particular order:
> 
>  - we cannot block when writing to the actual buffer, and have to unroll
>    the callstack and bolt on the blocking manually in a few specific
>    sites. This is ugly, inconsistent and maintenance heavy.
> 
>  - it only works for some 'magic' events that got the treatment, but not
>    for many others you might expect it to work for, with no real
>    indication of which or why.
> 
>  - the wakeups side is icky; the best I can come up with is making the
>    data page R/O and single stepping on write fault, but that isn't
>    multi-threading safe.
> 
>    Another alternative would be keeping the whole page R/O and
>    using write(2) or an ioctl() to update the head pointer.
> 
> Again, if we're going to do this; it needs to be done well and
> consistent and not as a special hack to enable strace-like
> functionality. And without clean and sane solutions to the above I just
> don't see it happening.
> 
> Note that the first 2 points are equally true for ftrace; so I don't see
> how we could sanely add it there either.
> 
> 
> One, very big maybe, would be to add a new tracepoint type that includes
> a might_sleep() and we very carefully undo all the preempt_disable and
> go sleep where we should. That also gives the tracepoint crud the
> information it needs to publish the capability to userspace.

nice, I like this one.. seems like the cleanest solution

jirka
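
For reference, the userspace side Dmitry describes in the quoted text
("block the tracee, notify the tracer, and let it pick up the pieces")
could look roughly like the sketch below. This is only an illustration
under stated assumptions, not anything from the patch set: the helper
names spawn_blocked_child() and wait_loop() are made up, and a plain
raise(SIGSTOP) stands in for the kernel blocking the task when an event
would otherwise be dropped. Only the wait4(2) loop itself is the point.

```c
#define _DEFAULT_SOURCE		/* for the wait4() declaration */
#include <errno.h>
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that stops itself once and then
 * exits with status 7.  The raise(SIGSTOP) is an assumption standing
 * in for "the kernel blocked the tracee".
 */
pid_t spawn_blocked_child(void)
{
	pid_t pid = fork();

	if (pid == 0) {
		raise(SIGSTOP);		/* tracee is now blocked */
		_exit(7);
	}
	return pid;			/* -1 on fork failure */
}

/*
 * Hypothetical helper: call wait4(2) in a loop until the child is gone.
 * Every stop notification is counted in *stops and answered with
 * SIGCONT; a real tracer would drain its buffer before resuming.
 * Returns the child's exit status, or -1 on error or death by signal.
 */
int wait_loop(pid_t child, int *stops)
{
	int status;

	*stops = 0;
	for (;;) {
		if (wait4(child, &status, WUNTRACED, NULL) < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (WIFSTOPPED(status)) {
			(*stops)++;		/* tracer got notified */
			kill(child, SIGCONT);	/* wake the tracee up */
			continue;
		}
		if (WIFEXITED(status))
			return WEXITSTATUS(status);
		if (WIFSIGNALED(status))
			return -1;
	}
}
```

A caller would do something like child = spawn_blocked_child() followed
by wait_loop(child, &stops). No polling is needed on the kernel side in
this model; the tracer simply sleeps in wait4(2) until it is notified.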