linux-kernel - Re: WARNING in event_function_local

Message-ID: <20190213101644.GN32534@hirez.programming.kicks-ass.net>
Date:   Wed, 13 Feb 2019 11:16:44 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Kees Cook <keescook@...omium.org>
Cc:     Steven Rostedt <rostedt@...dmis.org>,
        Dmitry Vyukov <dvyukov@...gle.com>,
        syzbot <syzbot+352bd10e338d9a90e5e0@...kaller.appspotmail.com>,
        Abderrahmane Benbachir <abderrahmane.benbachir@...ymtl.ca>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Namhyung Kim <namhyung@...nel.org>,
        syzkaller-bugs <syzkaller-bugs@...glegroups.com>
Subject: Re: WARNING in event_function_local

On Wed, Feb 13, 2019 at 10:57:26AM +0100, Peter Zijlstra wrote:
> On Wed, Feb 13, 2019 at 10:51:58AM +0100, Peter Zijlstra wrote:
> > On Tue, Feb 12, 2019 at 07:40:12PM -0800, Kees Cook wrote:
> > 
> > > > > Is this maybe just an unlucky condition with the event loop running in
> > > > > an IRQ? Should the WARN be expected, or is running under an IRQ
> > > > > unexpected?
> > > 
> > > Is perf expected to fire during an IRQ? The task == current test seems
> > > suspicious if so...
> > 
> > So the only possible callchain here is:
> > 
> >   <PMI>
> >     ...
> >       perf_event_disable_inatomic()
> >         irq_work_queue()
> > 
> >   <irq_work-IPI>
> >     perf_pending_event()
> >       perf_event_disable_local()
> >         event_function_local()
> > 
> > 
> > The assertion states that:
> > 
> >   if the event is a task event; and the context is active, it _must_ be
> >   the same task.
> > 
> > Because: if the PMI happens during ctxsw (which has IRQs disabled), the
> > IPI will not happen until after the ctxsw, at which point we'll also
> > have switched out the perf context of that task -- IOW the context
> > should be inactive.
> > 
> > 
> > Anyway, it looks like a virt issue; I'll start caring once you can
> > reproduce on real hardware.
> 
> Hurm.. I might have spoken too soon. I still don't give a crap about
> virt, but I think I might see an actual problem.
> 
> The moment we re-enable IRQs after ctxsw, the task can already be
> running on another CPU, and _that_ would trigger failure here.
> 
> Let me think a little about that.
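The quoted invariant, and the migration case raised just above, can be sketched as a small user-space model. The structures and `model_task_ctx_ok()` are hypothetical simplifications, not the kernel's `perf_event_context` or `event_function_local()`. Only the "task context, active, must be the current task" comparison from the quoted assertion is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for kernel structures. */
struct model_task {
	int pid;
};

struct model_ctx {
	struct model_task *task;  /* NULL models a per-CPU (non-task) context */
	bool is_active;           /* context is currently scheduled in */
};

/*
 * Returns true when the quoted invariant holds: if this is a task
 * context and it is active, it must belong to the current task.
 * A false return corresponds to the WARN firing.
 */
static bool model_task_ctx_ok(const struct model_ctx *ctx,
			      const struct model_task *current_task)
{
	assert(ctx != NULL);
	if (ctx->task == NULL)
		return true;   /* not a task context: outside this check */
	if (!ctx->is_active)
		return true;   /* switched out: nothing to compare */
	return ctx->task == current_task;
}
```

In this model a context that was switched out on the CPU handling the IPI is inactive and passes. If the same task is already running on another CPU, so its context is active again, while the irq_work IPI runs on the old CPU with a different current task, the comparison fails: that is the case where the WARN would fire.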

Humm, but in that case:

  context_switch()
    prepare_task_switch()
      perf_event_task_sched_out()
        __perf_event_task_sched_out()
          perf_event_context_sched_out()
            task_ctx_sched_out()
              ctx_sched_out()
                group_sched_out()
                  event_sched_out()
                    if (event->pending_disable)

Would have already cleared the pending_disable state, so the IPI would
not have run perf_event_disable_local() in the first place.
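The ordering argument can be checked with a second minimal model. It assumes, as the callchain above states, that event_sched_out() consumes pending_disable during the context switch, and that the deferred irq_work handler only acts if the flag is still set. The enum, the struct and the `model_*` functions are hypothetical stand-ins. irq_work_queue() itself is not modelled; the handler is simply called afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event states, loosely mirroring OFF/INACTIVE/ACTIVE. */
enum model_state { MODEL_OFF, MODEL_INACTIVE, MODEL_ACTIVE };

struct model_event {
	enum model_state state;
	bool pending_disable;  /* set from the PMI, consumed later */
	int local_disables;    /* times the "disable_local" path actually ran */
};

/* PMI path: cannot disable directly, so only record the request. */
static void model_disable_inatomic(struct model_event *e)
{
	e->pending_disable = true;
}

/* Context-switch path: sched-out consumes a pending request and
 * leaves the event OFF instead of INACTIVE. */
static void model_event_sched_out(struct model_event *e)
{
	assert(e->state == MODEL_ACTIVE);
	e->state = MODEL_INACTIVE;
	if (e->pending_disable) {
		e->pending_disable = false;
		e->state = MODEL_OFF;
	}
}

/* Deferred irq_work handler: acts only if the request is still pending. */
static void model_pending_event(struct model_event *e)
{
	if (e->pending_disable) {
		e->pending_disable = false;
		e->local_disables++;
		e->state = MODEL_OFF;
	}
}
```

Because the context switch runs with IRQs disabled, a PMI that lands during it is followed by the sched-out step before the irq_work handler can run. In the model the handler then finds pending_disable already clear and never reaches the local-disable path, while the event still ends up OFF.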