Message-ID: <20170907160436.b4mjttqugh3o77zl@redhat.com>
Date: Thu, 7 Sep 2017 12:04:36 -0400
From: Don Zickus <dzickus@...hat.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: LKML <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Borislav Petkov <bp@...en8.de>,
Sebastian Siewior <bigeasy@...utronix.de>,
Nicholas Piggin <npiggin@...il.com>,
Chris Metcalf <cmetcalf@...lanox.com>,
Ulrich Obergfell <uobergfe@...hat.com>
Subject: Re: [patch 00/29] lockup_detector: Cure hotplug deadlocks and
replace duct tape
On Thu, Aug 31, 2017 at 09:15:58AM +0200, Thomas Gleixner wrote:
> The lockup detector is broken in several ways:
>
> - It's deadlock prone vs. CPU hotplug in various ways. Some of these
> are due to recursive cpus_read_lock() others are due to
> cpus_read_lock() from CPU hotplug callbacks which immediately lock
> the machine because cpus are write locked.
>
> - The handling of the cpu hotplug threads happens sideways to the
> smpboot thread infrastructure, which is racy and pointless
>
> - The handling of the user space sysctl interface is a complete
> trainwreck as it fiddles directly with variables which can be
> modified or evaluated by the running watchdogs.
>
> - The perf event initialization is a steaming pile of duct tape as it
> idiotically tries to create perf events over and over even if perf is
> not functional (no hardware, ....). To avoid excessive dmesg spam it
> contains magic printk ratelimiting along with either wrong or useless
> messages.
>
> - The code structure is horrible as ifdef sections are scattered all
> over the place which makes it unreadable
>
> - There is more wreckage, but see the changelogs for the ugly details.
>
> Before I get utterly grumpy, I just pretend that I don't give a sh*t!
>
> The following series sanitizes the facility and addresses the problems.
One of the goals I was trying to achieve with splitting out watchdog_hld.c
was to abstract it as another hw nmi thing, as some arches wanted to move
away from using perf as a hardlockup detector.
So watchdog_nmi_enable/disable was an attempt to be that hook, and maybe
watchdog_nmi_reconfigure as well.
I think some of your hardlockup_detector_perf_enable/disable/restart might
fit into that. The _cleanup() probably not.
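
To make the hook idea concrete, here is a rough userspace sketch, not kernel
code. The weak watchdog_nmi_enable/disable defaults forward to a perf-style
backend, and an arch with its own NMI watchdog would supply strong
definitions instead. The signatures, the NR_CPUS value, the perf_hld_armed
array and the watchdog_nmi_armed() helper are all assumptions made up for
illustration; they are not taken from the series.

```c
#include <assert.h>

#define NR_CPUS 4	/* arbitrary size for this model */

/* Per-CPU "detector armed" flags; a stand-in for real perf event state. */
static int perf_hld_armed[NR_CPUS];

/*
 * Stand-ins for the perf backend. The names follow the mail, but the
 * signatures are assumed for this sketch and may differ from the kernel's.
 */
static int hardlockup_detector_perf_enable(unsigned int cpu)
{
	if (cpu >= NR_CPUS)
		return -1;
	perf_hld_armed[cpu] = 1;
	return 0;
}

static void hardlockup_detector_perf_disable(unsigned int cpu)
{
	if (cpu < NR_CPUS)
		perf_hld_armed[cpu] = 0;
}

/*
 * Generic hooks. They are weak so that an arch with its own NMI watchdog
 * can provide strong definitions in another file and bypass perf entirely.
 */
__attribute__((weak)) int watchdog_nmi_enable(unsigned int cpu)
{
	return hardlockup_detector_perf_enable(cpu);
}

__attribute__((weak)) void watchdog_nmi_disable(unsigned int cpu)
{
	hardlockup_detector_perf_disable(cpu);
}

/* Hypothetical helper for inspecting the model's state. */
int watchdog_nmi_armed(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return perf_hld_armed[cpu];
}
```

The core watchdog code would only ever call the watchdog_nmi_* hooks, so
whether perf sits behind them becomes a link-time decision per arch. This
relies on GCC/Clang weak symbols.
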
Other than that and the compile issue, I don't really have many problems
with the bulk of the changes, and my simple tests seem to work fine.
Cheers,
Don
>
> Thanks,
>
> tglx
> ---
> arch/parisc/kernel/process.c | 2
> arch/powerpc/kernel/watchdog.c | 22 -
> arch/x86/events/intel/core.c | 11
> include/linux/nmi.h | 121 +++----
> include/linux/smpboot.h | 4
> kernel/cpu.c | 6
> kernel/smpboot.c | 22 -
> kernel/sysctl.c | 22 -
> kernel/watchdog.c | 638 ++++++++++++++---------------------------
> kernel/watchdog_hld.c | 193 ++++++------
> 10 files changed, 433 insertions(+), 608 deletions(-)
>
>