Date:   Tue, 25 Apr 2023 08:26:14 -0700
From:   Doug Anderson <dianders@...omium.org>
To:     Chen-Yu Tsai <wenst@...omium.org>
Cc:     Daniel Thompson <daniel.thompson@...aro.org>,
        Petr Mladek <pmladek@...e.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Lecopzer Chen <lecopzer.chen@...iatek.com>,
        Stephen Boyd <swboyd@...omium.org>,
        Chen-Yu Tsai <wens@...e.org>,
        linux-arm-kernel@...ts.infradead.org,
        kgdb-bugreport@...ts.sourceforge.net,
        Marc Zyngier <maz@...nel.org>,
        linux-perf-users@...r.kernel.org,
        Mark Rutland <mark.rutland@....com>,
        Masayoshi Mizuma <msys.mizuma@...il.com>,
        Will Deacon <will@...nel.org>, ito-yuichi@...itsu.com,
        Sumit Garg <sumit.garg@...aro.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Colin Cross <ccross@...roid.com>,
        Matthias Kaehlcke <mka@...omium.org>,
        Guenter Roeck <groeck@...omium.org>,
        Tzung-Bi Shih <tzungbi@...omium.org>,
        Alexander Potapenko <glider@...gle.com>,
        AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...labora.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Geert Uytterhoeven <geert+renesas@...der.be>,
        Ingo Molnar <mingo@...nel.org>,
        John Ogness <john.ogness@...utronix.de>,
        Josh Poimboeuf <jpoimboe@...nel.org>,
        Juergen Gross <jgross@...e.com>,
        Kees Cook <keescook@...omium.org>,
        Laurent Dufour <ldufour@...ux.ibm.com>,
        Liam Howlett <liam.howlett@...cle.com>,
        Marco Elver <elver@...gle.com>,
        Matthias Brugger <matthias.bgg@...il.com>,
        Michael Ellerman <mpe@...erman.id.au>,
        Miguel Ojeda <ojeda@...nel.org>,
        Nathan Chancellor <nathan@...nel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Randy Dunlap <rdunlap@...radead.org>,
        Rasmus Villemoes <linux@...musvillemoes.dk>,
        Sami Tolvanen <samitolvanen@...gle.com>,
        Stefano Stabellini <sstabellini@...nel.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Zhaoyang Huang <zhaoyang.huang@...soc.com>,
        Zhen Lei <thunder.leizhen@...wei.com>,
        linux-kernel@...r.kernel.org, linux-mediatek@...ts.infradead.org
Subject: Re: [PATCH] hardlockup: detect hard lockups using secondary (buddy) cpus

Hi,

On Mon, Apr 24, 2023 at 9:58 PM Chen-Yu Tsai <wenst@...omium.org> wrote:
>
> On Mon, Apr 24, 2023 at 11:42 PM Doug Anderson <dianders@...omium.org> wrote:
> >
> > Hi,
> >
> > On Mon, Apr 24, 2023 at 5:54 AM Daniel Thompson
> > <daniel.thompson@...aro.org> wrote:
> > >
> > > On Fri, Apr 21, 2023 at 03:53:30PM -0700, Douglas Anderson wrote:
> > > > From: Colin Cross <ccross@...roid.com>
> > > >
> > > > Implement a hardlockup detector that can be enabled on SMP systems
> > > > that don't have an arch provided one or one implemented atop perf by
> > > > using interrupts on other cpus. Each cpu will use its softlockup
> > > > hrtimer to check that the next cpu is processing hrtimer interrupts by
> > > > verifying that a counter is increasing.
> > > >
> > > > NOTE: unlike the other hard lockup detectors, the buddy one can't
> > > > easily provide a backtrace on the CPU that locked up. It relies on
> > > > some other mechanism in the system to get information about the locked
> > > > up CPUs. This could be support for NMI backtraces like [1], it could
> > > > be a mechanism for printing the PC of locked CPUs like [2], or it
> > > > could be something else.
> > > >
> > > > This style of hardlockup detector originated in some downstream
> > > > Android trees and has been rebased on / carried in ChromeOS trees for
> > > > quite a long time for use on arm and arm64 boards. Historically on
> > > > these boards we've leveraged mechanism [2] to get information about
> > > > hung CPUs, but we could move to [1].
> > >
> > > On the Arm platforms is this code able to leverage the existing
> > > infrastructure to extract status from stuck CPUs:
> > > https://docs.kernel.org/trace/coresight/coresight-cpu-debug.html
> >
> > Yup! I wasn't explicit about this, but that's where you end up if you
> > follow the whole bug tracker item that was linked as [2].
> > Specifically, we used to have downstream patches in ChromeOS that
> > just reached into the coresight range from a SoC specific driver and
> > printed out the CPU_DBGPCSR. When Brian was uprevving rk3399
> > Chromebooks he found that the equivalent functionality had made it
> > upstream in a generic way through the coresight framework. Brian
> > confirmed it was working on rk3399 and made all of the device tree
> > changes needed to get it all hooked up, so it should work (at least
> > for that SoC).
> >
> > [2] https://issuetracker.google.com/172213129
>
> IIRC with the coresight CPU debug driver enabled and the proper DT nodes
> added, the panic handler does dump out information from the hardware.
> I don't think it's wired up for hung tasks though.

Yes, that's correct. The coresight CPU debug driver doesn't work for
hung tasks because it can't get a real stack crawl. All it can get is
the PC of the last branch that the CPU took. This is why combining
this patch with the ability to get stack traces via pseudo-NMI is
superior. That being said, even with just the coresight CPU debug
driver, this patch is still helpful because (assuming
"hardlockup_panic" is set) we'll do a panic, which will then trigger
the coresight CPU debug driver. :-)
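
For anyone skimming the thread, here is a rough userspace sketch of the
buddy check that the commit message describes: each CPU bumps a counter
from its own timer tick and, from that same tick, verifies that the
next CPU's counter has moved since the last look. This is not the code
from the patch. NR_CPUS is fixed at 4 for illustration, and the struct,
field, and function names (cpu_state, buddy_of, timer_tick,
check_buddy) are hypothetical. The real detector would also need to
handle CPU hotplug and would presumably only check every few ticks,
which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4 /* assumption: small fixed CPU count for illustration */

/* Hypothetical per-CPU state; names are illustrative, not from the patch. */
struct cpu_state {
	unsigned long hrtimer_interrupts;       /* bumped by this CPU's own tick */
	unsigned long hrtimer_interrupts_saved; /* last value seen by its watcher */
};

static struct cpu_state cpus[NR_CPUS];

/* The buddy of a CPU is simply the next CPU, wrapping around at the end. */
static unsigned int buddy_of(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return (cpu + 1) % NR_CPUS;
}

/* Stand-in for the softlockup hrtimer firing on 'cpu': proof of life. */
static void timer_tick(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	cpus[cpu].hrtimer_interrupts++;
}

/*
 * Run from 'cpu' to watch its buddy. Returns true if the buddy's counter
 * has not advanced since the previous check, i.e. the buddy looks locked up.
 */
static bool check_buddy(unsigned int cpu)
{
	struct cpu_state *buddy = &cpus[buddy_of(cpu)];

	if (buddy->hrtimer_interrupts == buddy->hrtimer_interrupts_saved)
		return true;

	/* Buddy made progress; remember where it was for the next check. */
	buddy->hrtimer_interrupts_saved = buddy->hrtimer_interrupts;
	return false;
}
```

Note that check_buddy() only tells the watching CPU that its buddy has
stopped taking timer interrupts. Getting any information out of the
stuck CPU itself is still up to something else, as discussed above.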

-Doug
