Message-ID: <20180718081905.GA13520@krava>
Date: Wed, 18 Jul 2018 10:19:05 +0200
From: Jiri Olsa <jolsa@...hat.com>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: linux-kernel@...r.kernel.org,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Namhyung Kim <namhyung@...nel.org>, stable@...r.kernel.org
Subject: Re: [PATCH] perf/core: fix a possible deadlock scenario
On Mon, Jul 16, 2018 at 02:51:01PM -0700, Cong Wang wrote:
> hrtimer_cancel() busy-waits for the hrtimer callback to stop,
> pretty much like del_timer_sync(). This creates a possible deadlock
> scenario where we hold a spinlock before calling hrtimer_cancel()
> while trying to acquire the same spinlock in the callback.
>
> This kind of deadlock is already known and is catchable by lockdep,
> like for del_timer_sync(), we can add lockdep annotations. However,
> it is still missing for hrtimer_cancel(). (I have a WIP patch to make
> it complete for hrtimer_cancel() but it breaks booting.)
>
> And there is such a deadlock scenario in kernel/events/core.c too,
> well actually, it is a simpler version: the hrtimer callback waits
> for itself to finish on the same CPU! It sounds stupid but it is
> not obvious at all, it hides very deeply in the perf event code:
>
> cpu_clock_event_init():
>   perf_swevent_init_hrtimer():
>     hwc->hrtimer.function = perf_swevent_hrtimer;
>
> perf_swevent_hrtimer():
>   __perf_event_overflow():
>     __perf_event_account_interrupt():
>       perf_adjust_period():
>         pmu->stop():
>           cpu_clock_event_stop():
>             perf_swevent_cancel():
>               hrtimer_cancel()
>
> Getting stuck in a timer doesn't sound very scary, however, in this
sounds scary enough for me ;-) were you able to hit it?
> case, its consequences are a disaster:
>
> perf_event_overflow() which calls __perf_event_overflow() is called
> in NMI handler too, so it is racy with hrtimer callback as disabling
> IRQ can't possibly disable NMI. This means this hrtimer callback
> once interrupted by an NMI handler could deadlock within NMI!
hum, the swevent pmu does not trigger NMI, so that timer
will never be touched in NMI context
jirka
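
For readers following the thread, the self-wait described in the quoted
call chain can be sketched in userspace C. This is only a toy model, not
kernel code: struct toy_timer and every toy_* function are hypothetical
names made up for illustration. The one thing it borrows from the kernel
is the contract that hrtimer_try_to_cancel() returns -1 while the
timer's callback is running, and that hrtimer_cancel() retries until
that stops being the case. The retry loop here is bounded so the demo
terminates instead of hanging. It says nothing about whether the path is
actually reachable in perf.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct hrtimer: tracks only callback state. */
struct toy_timer {
	bool callback_running;               /* true while the callback executes */
	void (*function)(struct toy_timer *);
	bool cancel_completed;               /* result recorded by the callback */
};

/* Same contract as hrtimer_try_to_cancel(): -1 while the callback runs. */
static int toy_try_to_cancel(const struct toy_timer *t)
{
	return t->callback_running ? -1 : 0;
}

/*
 * Models the retry loop of hrtimer_cancel(), but gives up after
 * max_spins iterations (an assumption made so the demo can finish).
 * Returns true if the cancel completed, false if it was still spinning.
 */
static bool toy_cancel_bounded(const struct toy_timer *t, int max_spins)
{
	for (int i = 0; i < max_spins; i++) {
		if (toy_try_to_cancel(t) >= 0)
			return true;
	}
	return false;
}

/* A callback that tries to cancel its own timer, like the quoted chain. */
static void toy_self_cancelling_callback(struct toy_timer *t)
{
	t->cancel_completed = toy_cancel_bounded(t, 1000);
}

/* Models the timer core invoking the callback. */
static void toy_fire(struct toy_timer *t)
{
	t->callback_running = true;
	t->function(t);
	t->callback_running = false;
}

/* Returns whether a cancel issued from inside the callback completes. */
static bool toy_cancel_from_callback_completes(void)
{
	struct toy_timer t = {
		.callback_running = false,
		.function = toy_self_cancelling_callback,
		.cancel_completed = true,
	};

	toy_fire(&t);
	return t.cancel_completed;
}

/* Returns whether a cancel issued while no callback runs completes. */
static bool toy_cancel_from_outside_completes(void)
{
	struct toy_timer t = {
		.callback_running = false,
		.function = toy_self_cancelling_callback,
		.cancel_completed = false,
	};

	return toy_cancel_bounded(&t, 1000);
}
```

Calling toy_cancel_from_outside_completes() returns true, while
toy_cancel_from_callback_completes() returns false: inside the callback
the "running" state can never clear, so an unbounded loop would spin
forever on that CPU.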