lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 24 Apr 2023 08:41:28 -0700
From:   Doug Anderson <dianders@...omium.org>
To:     Daniel Thompson <daniel.thompson@...aro.org>
Cc:     Petr Mladek <pmladek@...e.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Lecopzer Chen <lecopzer.chen@...iatek.com>,
        Stephen Boyd <swboyd@...omium.org>,
        Chen-Yu Tsai <wens@...e.org>,
        linux-arm-kernel@...ts.infradead.org,
        kgdb-bugreport@...ts.sourceforge.net,
        Marc Zyngier <maz@...nel.org>,
        linux-perf-users@...r.kernel.org,
        Mark Rutland <mark.rutland@....com>,
        Masayoshi Mizuma <msys.mizuma@...il.com>,
        Will Deacon <will@...nel.org>, ito-yuichi@...itsu.com,
        Sumit Garg <sumit.garg@...aro.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Colin Cross <ccross@...roid.com>,
        Matthias Kaehlcke <mka@...omium.org>,
        Guenter Roeck <groeck@...omium.org>,
        Tzung-Bi Shih <tzungbi@...omium.org>,
        Alexander Potapenko <glider@...gle.com>,
        AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...labora.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Geert Uytterhoeven <geert+renesas@...der.be>,
        Ingo Molnar <mingo@...nel.org>,
        John Ogness <john.ogness@...utronix.de>,
        Josh Poimboeuf <jpoimboe@...nel.org>,
        Juergen Gross <jgross@...e.com>,
        Kees Cook <keescook@...omium.org>,
        Laurent Dufour <ldufour@...ux.ibm.com>,
        Liam Howlett <liam.howlett@...cle.com>,
        Marco Elver <elver@...gle.com>,
        Matthias Brugger <matthias.bgg@...il.com>,
        Michael Ellerman <mpe@...erman.id.au>,
        Miguel Ojeda <ojeda@...nel.org>,
        Nathan Chancellor <nathan@...nel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Randy Dunlap <rdunlap@...radead.org>,
        Rasmus Villemoes <linux@...musvillemoes.dk>,
        Sami Tolvanen <samitolvanen@...gle.com>,
        Stefano Stabellini <sstabellini@...nel.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Zhaoyang Huang <zhaoyang.huang@...soc.com>,
        Zhen Lei <thunder.leizhen@...wei.com>,
        linux-kernel@...r.kernel.org, linux-mediatek@...ts.infradead.org
Subject: Re: [PATCH] hardlockup: detect hard lockups using secondary (buddy) cpus

Hi,

On Mon, Apr 24, 2023 at 5:54 AM Daniel Thompson
<daniel.thompson@...aro.org> wrote:
>
> On Fri, Apr 21, 2023 at 03:53:30PM -0700, Douglas Anderson wrote:
> > From: Colin Cross <ccross@...roid.com>
> >
> > Implement a hardlockup detector that can be enabled on SMP systems
> > that don't have an arch provided one or one implemented atop perf by
> > using interrupts on other cpus. Each cpu will use its softlockup
> > hrtimer to check that the next cpu is processing hrtimer interrupts by
> > verifying that a counter is increasing.
> >
> > NOTE: unlike the other hard lockup detectors, the buddy one can't
> > easily provide a backtrace on the CPU that locked up. It relies on
> > some other mechanism in the system to get information about the locked
> > up CPUs. This could be support for NMI backtraces like [1], it could
> > be a mechanism for printing the PC of locked CPUs like [2], or it
> > could be something else.
> >
> > This style of hardlockup detector originated in some downstream
> > Android trees and has been rebased on / carried in ChromeOS trees for
> > quite a long time for use on arm and arm64 boards. Historically on
> > these boards we've leveraged mechanism [2] to get information about
> > hung CPUs, but we could move to [1].
>
> On the Arm platforms is this code able to leverage the existing
> infrastructure to extract status from stuck CPUs:
> https://docs.kernel.org/trace/coresight/coresight-cpu-debug.html

Yup! I wasn't explicit about this, but that's where you end up if you
follow the whole bug tracker item that was linked as [2].
Specifically, we used to have downstream patches in the ChromeOS that
just reached into the coresight range from a SoC specific driver and
printed out the CPU_DBGPCSR. When Brian was uprevving rk3399
Chromebooks he found that the equivalent functionality had made it
upstream in a generic way through the coresight framework. Brian
confirmed it was working on rk3399 and made all of the device tree
changes needed to get it all hooked up, so (at least for that SoC) it
should work on that SoC.

[2] https://issuetracker.google.com/172213129

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ