Message-ID: <alpine.DEB.2.20.1709161850010.2105@nanos>
Date:   Sat, 16 Sep 2017 19:35:10 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Fengguang Wu <fengguang.wu@...el.com>
cc:     LKP <lkp@...org>, LKML <linux-kernel@...r.kernel.org>,
        Don Zickus <dzickus@...hat.com>,
        Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: d57108d4f6 ("watchdog/core: Get rid of the thread .."): BUG:
 unable to handle kernel NULL pointer dereference at 0000000000000208
On Sat, 16 Sep 2017, Fengguang Wu wrote:
> > > [    0.038086] Performance Events: unsupported p6 CPU model 61 no PMU
> > > driver, software events only.
> 
> What's your host CPU? I can reproduce it in Nehalem, Haswell and Sandy
> Bridge machines with the attached script.
My bad. I booted the wrong config ....
> > > [    0.041031] Hierarchical SRCU implementation.
> > > [    0.046210] NMI watchdog: Perf event create on CPU 0 failed with -2
> > > [    0.046980] NMI watchdog: Perf NMI watchdog permanetely disabled
> > > 
> > > Confused
> > 
> > I still can't reproduce. Can you please apply the debug patch below and
> > provide the output?
> 
> OK. I'll try and report back tomorrow.
Don't bother. I found it already. On UP we have:

#define for_each_cpu(cpu, mask)               \
        for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)

which is a total fail: it never tests the mask, so it assumes that every
cpumask has bit 0 set. That breaks any code which uses for_each_cpu() or
any of the other variants on UP with an empty mask.
That means any code which does not have conditional code for some of the
cpumask functions is potentially broken. Sigh.
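
To make the failure mode concrete, here is a minimal userspace sketch. It
assumes a plain unsigned long as a stand-in for struct cpumask; the macro
body is copied from the UP definition quoted above under the hypothetical
name up_for_each_cpu(), and both helper functions are made up for
illustration and are not kernel API.

```c
/*
 * Copy of the UP loop quoted above, renamed for this sketch. The mask
 * is only evaluated as a (void) expression and is never tested.
 */
#define up_for_each_cpu(cpu, mask)               \
	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)

/* Hypothetical helper: count how often the loop body runs for a mask. */
int count_iterations(unsigned long mask)
{
	int cpu, n = 0;

	up_for_each_cpu(cpu, mask)
		n++;
	return n;
}

/*
 * Hypothetical mask-aware variant for comparison: the body runs only
 * if the bit for the (single) CPU is actually set in the mask.
 */
int count_iterations_checked(unsigned long mask)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < 1; cpu++)
		if (mask & (1UL << cpu))
			n++;
	return n;
}
```

With an empty mask, count_iterations(0UL) still runs the body once, while
the mask-aware variant runs it zero times. With bit 0 set, both run it once.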
The simple cure for the watchdog is below.
Thanks,
	tglx
8<------------------
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index b2931154b5f2..d4c0f75b189e 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -221,7 +221,12 @@ void hardlockup_detector_perf_cleanup(void)
 		struct perf_event *event = per_cpu(watchdog_ev, cpu);
 
 		per_cpu(watchdog_ev, cpu) = NULL;
-		perf_event_release_kernel(event);
+		/*
+		 * Check the event, because on UP for_each_cpu() assumes
+		 * idiotically that all masks handed in have bit 0 set.
+		 */
+		if (event)
+			perf_event_release_kernel(event);
 	}
 	cpumask_clear(&dead_events_mask);
 }
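
For illustration only, the effect of the check can be modelled in
userspace. This is a sketch under stated assumptions, not the kernel code:
struct perf_event_model, watchdog_ev_model[], release_model() and
cleanup_model() are hypothetical stand-ins for struct perf_event, the
per-CPU watchdog_ev, perf_event_release_kernel() and the cleanup function,
and the loop macro is the UP definition under a different name.

```c
#include <stddef.h>

/* UP-style loop: iterates once with cpu == 0 and never tests the mask. */
#define up_for_each_cpu(cpu, mask)               \
	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask)

/* Hypothetical stand-in for struct perf_event. */
struct perf_event_model {
	int refs;
};

/* Hypothetical stand-in for the per-CPU watchdog_ev; one slot on UP. */
struct perf_event_model *watchdog_ev_model[1];

/* Stand-in for the release call: it dereferences the event. */
void release_model(struct perf_event_model *event)
{
	event->refs--;
}

/* Models the patched cleanup loop; returns the number of releases. */
int cleanup_model(unsigned long dead_events_mask)
{
	int cpu, released = 0;

	up_for_each_cpu(cpu, dead_events_mask) {
		struct perf_event_model *event = watchdog_ev_model[cpu];

		watchdog_ev_model[cpu] = NULL;
		/* The body runs even for an empty mask, so check first. */
		if (event) {
			release_model(event);
			released++;
		}
	}
	return released;
}
```

Calling cleanup_model(0UL) while the slot is NULL releases nothing. Without
the if (event) check, the same call would hand NULL to release_model().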