linux-kernel - Re: [PATCH] perf: Fix missing SIGTRAPs due to pending

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <YzLtAG2bfRJ/vFRu@elver.google.com>
Date:   Tue, 27 Sep 2022 14:30:56 +0200
From:   Marco Elver <elver@...gle.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...nel.org>,
        Namhyung Kim <namhyung@...nel.org>,
        linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
        kasan-dev@...glegroups.com, Dmitry Vyukov <dvyukov@...gle.com>
Subject: Re: [PATCH] perf: Fix missing SIGTRAPs due to pending_disable abuse

On Tue, Sep 27, 2022 at 02:13PM +0200, Marco Elver wrote:
> Due to the implementation of how SIGTRAP are delivered if
> perf_event_attr::sigtrap is set, we've noticed 3 issues:
> 
> 	1. Missing SIGTRAP due to a race with event_sched_out() (more
> 	   details below).
> 
> 	2. Hardware PMU events being disabled due to returning 1 from
> 	   perf_event_overflow(). The only way to re-enable the event is
> 	   for user space to first "properly" disable the event and then
> 	   re-enable it.
> 
> 	3. The inability to automatically disable an event after a
> 	   specified number of overflows via PERF_EVENT_IOC_REFRESH.
> 
> The worst of the 3 issues is problem (1), which occurs when a
> pending_disable is "consumed" by a racing event_sched_out(), observed as
> follows:
> 
> 		CPU0			| 	CPU1
> 	--------------------------------+---------------------------
> 	__perf_event_overflow()		|
> 	 perf_event_disable_inatomic()	|
> 	  pending_disable = CPU0	| ...
> 	  				| _perf_event_enable()
> 					|  event_function_call()
> 					|   task_function_call()
> 					|    /* sends IPI to CPU0 */
> 	<IPI>				| ...
> 	 __perf_event_enable()		+---------------------------
> 	  ctx_resched()
> 	   task_ctx_sched_out()
> 	    ctx_sched_out()
> 	     group_sched_out()
> 	      event_sched_out()
> 	       pending_disable = -1
> 	</IPI>
> 	<IRQ-work>
> 	 perf_pending_event()
> 	  perf_pending_event_disable()
> 	   /* Fails to send SIGTRAP because no pending_disable! */
> 	</IRQ-work>
> 
> In the above case, not only is that particular SIGTRAP missed, but also
> all future SIGTRAPs because 'event_limit' is not reset back to 1.
> 
> To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction
> of a separate 'pending_sigtrap', no longer using 'event_limit' and
> 'pending_disable' for its delivery.
> 
> During testing, this also revealed several more possible races between
> reschedules and pending IRQ work; see code comments for details.
> 
> Doing so makes it possible to use 'event_limit' normally (thereby
> enabling use of PERF_EVENT_IOC_REFRESH), perf_event_overflow() no longer
> returns 1 on SIGTRAP causing disabling of hardware PMUs, and finally the
> race is no longer possible due to event_sched_out() not consuming
> 'pending_disable'.
> 
> Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events")
> Reported-by: Dmitry Vyukov <dvyukov@...gle.com>
> Debugged-by: Dmitry Vyukov <dvyukov@...gle.com>
> Signed-off-by: Marco Elver <elver@...gle.com>
> ---
>  include/linux/perf_event.h |  2 +
>  kernel/events/core.c       | 85 ++++++++++++++++++++++++++++++++------
>  2 files changed, 75 insertions(+), 12 deletions(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 907b0e3f1318..dff3430844a2 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -740,8 +740,10 @@ struct perf_event {
>  	int				pending_wakeup;
>  	int				pending_kill;
>  	int				pending_disable;
> +	int				pending_sigtrap;
>  	unsigned long			pending_addr;	/* SIGTRAP */
>  	struct irq_work			pending;
> +	struct irq_work			pending_resched;
>  
>  	atomic_t			event_limit;
>  
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 75f5705b6892..df90777262bf 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2527,6 +2527,14 @@ event_sched_in(struct perf_event *event,
>  	if (event->attr.exclusive)
>  		cpuctx->exclusive = 1;
>  
> +	if (event->pending_sigtrap) {
> +		/*
> +		 * The task and event might have been moved to another CPU:
> +		 * queue another IRQ work. See perf_pending_event_sigtrap().
> +		 */
> +		WARN_ON_ONCE(!irq_work_queue(&event->pending_resched));

One question we had is if it's possible for an event to be scheduled in,
immediately scheduled out, and then scheduled in on a 3rd CPU. I.e. we'd
still be in trouble if we can do this:

	CPU0
	sched-out
		CPU1
		sched-in
		sched-out
			CPU2
			sched-in

without any IRQ work ever running. Some naive solutions so the
pending_resched IRQ work isn't needed, like trying to send a signal
right here (or in event_sched_out()), don't work because we've seen
syzkaller produce programs where there's a pending event and then the
scheduler moves the task; because we're in the scheduler we can deadlock
if we try to send the signal here.

Thanks,
-- Marco