linux-kernel - Re: [PATCH] tracing: Choose static tp_printk buffer by explicit nesting count

Message-ID: <CALCETrXGrtjpg4xBUaFrAxHUPObgRRRifGJPJx-uiaf6-iWEKA@mail.gmail.com>
Date:	Wed, 25 May 2016 13:17:37 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...hat.com>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH] tracing: Choose static tp_printk buffer by explicit
 nesting count

On May 25, 2016 6:16 AM, "Peter Zijlstra" <peterz@...radead.org> wrote:
>
> On Tue, May 24, 2016 at 03:52:28PM -0700, Andy Lutomirski wrote:
> > Currently, the trace_printk code chooses which static buffer to use based
> > on what type of atomic context (NMI, IRQ, etc) it's in.  Simplify the
> > code and make it more robust: simply count the nesting depth and choose
> > a buffer based on the current nesting depth.
> >
> > The new code will only drop an event if we nest more than 4 deep,
> > and the old code was guaranteed to malfunction if that happened.
> >
> > Signed-off-by: Andy Lutomirski <luto@...nel.org>
> > ---
> >  kernel/trace/trace.c | 83 +++++++++++++++-------------------------------------
> >  1 file changed, 24 insertions(+), 59 deletions(-)
> >
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index a2f0b9f33e9b..4508f3bf4a97 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -1986,83 +1986,41 @@ static void __trace_userstack(struct trace_array *tr, unsigned long flags)
> >
> >  /* created for use with alloc_percpu */
> >  struct trace_buffer_struct {
> > -     char buffer[TRACE_BUF_SIZE];
> > +     int nesting;
> > +     char buffer[4][TRACE_BUF_SIZE];
> >  };
> >
> >  static struct trace_buffer_struct *trace_percpu_buffer;
> >  /*
> > + * This allows for lockless recording.  If we're nested too deeply, then
> > + * this returns NULL.
> >   */
> >  static char *get_trace_buf(void)
> >  {
> > +     struct trace_buffer_struct *buffer = this_cpu_ptr(trace_percpu_buffer);
> >
> > +     if (!buffer || buffer->nesting >= 4)
> >               return NULL;
>
> This is buggy fwiw; you need to unconditionally increment
> buffer->nesting to match the unconditional decrement.
>
> Otherwise 5 'increments' and 5 decrements will land you at -1.

I did indeed mess up the error handling.  I'll fix it.
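
A minimal user-space sketch of the fix Peter describes, for illustration only. It assumes a single global struct stands in for the per-CPU buffer, uses a placeholder TRACE_BUF_SIZE, introduces a MAX_NESTING constant that is not in the patch, and ignores the preemption and interrupt-safety concerns the real kernel code has to handle. The point is only that the counter is bumped on every call, so the unconditional decrement always balances it:

```c
#include <assert.h>
#include <stddef.h>

#define TRACE_BUF_SIZE 1024	/* placeholder size, not the kernel's value */
#define MAX_NESTING 4		/* hypothetical name for the patch's literal 4 */

/* User-space stand-in for one CPU's trace_buffer_struct. */
struct trace_buffer_struct {
	int nesting;
	char buffer[MAX_NESTING][TRACE_BUF_SIZE];
};

static struct trace_buffer_struct cpu_buf;

/*
 * Always increment the counter, even when no buffer is available,
 * so that every get_trace_buf() has a matching put_trace_buf().
 */
static char *get_trace_buf(void)
{
	int depth = cpu_buf.nesting++;

	if (depth >= MAX_NESTING)
		return NULL;	/* nested too deeply: the event is dropped */

	return cpu_buf.buffer[depth];
}

/* Unconditional decrement; the assert catches unbalanced calls. */
static void put_trace_buf(void)
{
	assert(cpu_buf.nesting > 0);
	cpu_buf.nesting--;
}
```

With this shape, five nested calls followed by five puts leave the counter at 0 rather than -1. The caller still has to check for NULL before writing, and must call put_trace_buf() either way.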

>
> >
> > +     return &buffer->buffer[buffer->nesting++][0];
> > +}
> > +
> > +static void put_trace_buf(void)
> > +{
> > +     this_cpu_dec(trace_percpu_buffer->nesting);
> >  }
>
> So I don't know about tracing; but for perf this construct would not
> work 'properly'.
>
> The per context counter -- which is lost in this scheme -- guards
> against in-context recursion.
>
> Only if we nest from another context do we allow generation of a new
> event.

What's the purpose of this feature?

I'm guessing that the idea is to suppress events that are triggered
synchronously while another event is being processed.  So, for example,
if you take a page fault or trigger a data breakpoint while generating a
callchain, it's not terribly helpful to emit events due to that fault
or breakpoint.  In this respect, my patch is a regression: watchpoints
are synchronous events, so a plain nesting count would let them through.

If that's the goal, then the current heuristic may be fairly good after all.
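
For comparison, here is a rough user-space sketch of the per-context guard Peter describes. It is not perf's actual code: the enum, the function names ctx_guard_enter() and ctx_guard_exit(), and the explicit context argument are all hypothetical (the kernel derives the context from its own state instead of taking a parameter). It only shows the behaviour under discussion: a second event from the same context is refused, while an event from a different context is allowed:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical context labels; the kernel infers these itself. */
enum trace_ctx { CTX_TASK, CTX_SOFTIRQ, CTX_IRQ, CTX_NMI, CTX_MAX };

/* One "busy" flag per context level (per CPU in a real implementation). */
static bool ctx_in_use[CTX_MAX];

/*
 * Returns the context index on success, or -1 if this context is
 * already generating an event (in-context recursion).
 */
static int ctx_guard_enter(enum trace_ctx ctx)
{
	if (ctx_in_use[ctx])
		return -1;

	ctx_in_use[ctx] = true;
	return (int)ctx;
}

/* Releases the slot returned by a successful ctx_guard_enter(). */
static void ctx_guard_exit(int rctx)
{
	assert(rctx >= 0 && rctx < CTX_MAX && ctx_in_use[rctx]);
	ctx_in_use[rctx] = false;
}
```

Under this scheme a watchpoint that fires in task context while a task-context event is being recorded is dropped, whereas an IRQ arriving at the same moment can still record its own event.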

--Andy