linux-kernel - Re: [patch 00/29] lockup_detector: Cure hotplug deadlocks and replace duct tape

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20170901144229.3791e5c9@roar.ozlabs.ibm.com>
Date:   Fri, 1 Sep 2017 14:42:29 +1000
From:   Nicholas Piggin <npiggin@...il.com>
To:     Don Zickus <dzickus@...hat.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Borislav Petkov <bp@...en8.de>,
        Sebastian Siewior <bigeasy@...utronix.de>,
        Chris Metcalf <cmetcalf@...lanox.com>,
        Ulrich Obergfell <uobergfe@...hat.com>,
        Michael Ellerman <mpe@...erman.id.au>
Subject: Re: [patch 00/29] lockup_detector: Cure hotplug deadlocks and
 replace duct tape

On Thu, 31 Aug 2017 18:10:14 -0400
Don Zickus <dzickus@...hat.com> wrote:

> On Thu, Aug 31, 2017 at 09:15:58AM +0200, Thomas Gleixner wrote:
> > The lockup detector is broken is several ways:
> > 
> >     - It's deadlock prone vs. CPU hotplug in various ways. Some of these
> >       are due to recursive cpus_read_lock() others are due to
> >       cpus_read_lock() from CPU hotplug callbacks which immediately lock
> >       the machine because cpus are write locked.
> > 
> >     - The handling of the cpu hotplug threads happens sideways to the
> >       smpboot thread infrastructure, which is racy and pointless
> > 
> >     - The handling of the user space sysctl interface is a complete
> >       trainwreck as it fiddles directly with variables which can be
> >       modified or evaluated by the running watchdogs.
> > 
> >     - The perf event initialization is a steaming pile of duct tape as it
> >       idiotically tries to create perf events over and over even if perf is
> >       not functional (no hardware, ....). To avoid excessive dmesg spam it
> >       contains magic printk ratelimiting along with either wrong or useless
> >       messages.
> > 
> >     - The code structure is horrible as ifdef sections are scattered all
> >       over the place which makes it unreadable
> > 
> >     - There is more wreckage, but see the changelogs for the ugly details.
> > 
> > Before I get utterly grumpy, I just pretend that I don't give a sh*t!
> > 
> > The following series sanitizes the facility and addresses the problems.  
> 
> Hi Thomas,
> 
> Thanks for the patchset.  I agree with most your issues you complained
> about, just wasn't smart enough to figure out the right way to solve them.
> Despite your aggressive comments, I will review the code to see if it covers
> the scenarios that have popped up over the years and run some testing on my
> side.  Probably need a few days to do that.

The powerpc bits look fine, there's no real changes pending there,
so just take them through your tree if you like.

I had a glance throught the series, no comments yet. The powerpc watchdog
already duplicates the proc tunables rather than using them directly, so
in theory it did not need the 2 stage reconfigure. In practice, it has a
brown paper bag bug because it does not stop the watchdog before changing
its internal variables :P 2 stage is probably safer and clearer way to go
though.

Thanks,
Nick