lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAD=FV=VbxM-hftVKsp2JhQpnQm9u_H86rQbnucS5-x-q8j0QQA@mail.gmail.com>
Date:   Thu, 5 Oct 2023 08:34:56 -0700
From:   Doug Anderson <dianders@...omium.org>
To:     Mark Rutland <mark.rutland@....com>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>
Cc:     Marc Zyngier <maz@...nel.org>, Stephen Boyd <swboyd@...omium.org>,
        Valentin Schneider <vschneid@...hat.com>,
        Chen-Yu Tsai <wenst@...omium.org>,
        AngeloGioacchino Del Regno 
        <angelogioacchino.delregno@...labora.com>,
        D Scott Phillips <scott@...amperecomputing.com>,
        Josh Poimboeuf <jpoimboe@...nel.org>,
        Matthias Brugger <matthias.bgg@...il.com>,
        Misono Tomohiro <misono.tomohiro@...itsu.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Sumit Garg <sumit.garg@...aro.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        linux-mediatek@...ts.infradead.org
Subject: Re: [PATCH 1/2] arm64: smp: Fix pseudo NMI issues w/ broken Mediatek FW

Hi,

On Thu, Oct 5, 2023 at 3:27 AM Mark Rutland <mark.rutland@....com> wrote:
>
> On Wed, Oct 04, 2023 at 07:04:12AM -0700, Doug Anderson wrote:
> > On Wed, Oct 4, 2023 at 3:15 AM Marc Zyngier <maz@...nel.org> wrote:
> > >
> > > On Wed, 04 Oct 2023 10:59:50 +0100,
> > > Mark Rutland <mark.rutland@....com> wrote:
> > > >
> > > > Given you haven't seen any issues, I suspect those are getting reset to fixed
> > > > values that happens to work out for us, but it is a bit worrisome more
> > > > generally (e.g. the LPI case above).
> > >
> > > It is likely that these SoCs don't even have an ITS.
> >
> > Right. That was what we decided [1] when Marc pointed this out earlier.
> >
> > Overall: we know that this firmware behavior is not good but we're
> > stuck with it. :( At the very least, any new devices coming out will
> > have this fixed. Presumably if old devices are working OK enough today
> > (as long as you don't enable pseudo-NMI) then they can be made to keep
> > working?
> >
> > So circling back: what patch should we actually land?
>
> For now I'd prefer we took the patch I sent in:
>
>   https://lore.kernel.org/linux-arm-kernel/ZRr8r7XMoyDKaitd@FVFF77S0Q05N.cambridge.arm.com/
>
> ... as that leaves us no worse than before this series, and it's pretty simple.

Sounds good to me!

Catalin / Will: Please yell if there's anything you need me to do.
Otherwise I'll assume you'll pick up Mark's patch instead of my patch
#1 and then you'll pick up my patch #2.


> > As of right now only pseudo-NMI is broken, but it would be good to make sure
> > that if the kernel later adds other features that would be broken on this
> > hardware that it gets handled properly...
>
> Going further than the above, I think there are three options here:
>
> 1) Complete fix: depend on a working firmware, and throw this workaround away.
>
>    IIUC from the above, that's not something you can commit to.

Right. We've landed the fix in the firmware branch for many of the
devices that had the issue but there's a whole process (and cost
involved) in getting this actually rolled out. Each unique board needs
to kick off a FW qual. Given that nothing is actually broken today
it's hard to justify a FW qual just for this, so we're left
piggybacking on the next reason for a FW qual (if there is one).
...even if we could kick them off all instantly, though, it's always
best not to rely on a FW fix, especially if (as in this case) it's to
keep us from crashing. There'll always be some case where someone
tries to boot a new OS with an old firmware. One such case can happen
when booting recovery images on ChromeOS where the device always boots
from the Read-only (not updatable) firmware.


> 2) Partial fix: have the kernel save/restore everything.
>
>    IIUC this is unpalatable.

Yeah, Marc already NAKed it.


> 3) Partial fix: make the ARM64_HAS_GIC_PRIO_MASKING cpucap depend on the
>    absence of a "mediatek,broken-save-restore-fw" property in the DT. I believe
>    we can check that in early_enable_pseudo_nmi() or can_use_gic_priorities().
>
>    That'll avoid potential issues if/when we change the priorities used for
>    pNMI (which is something I've been looking at).
>
> I'm happy with (3) if Marc is.

Yeah, that seems best to me as a long term solution, too.

-Doug

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ