linux-kernel - Re: [EXT] Re: [PATCH] clocksource: Add Marvell Errata-38627 workaround

Message-ID: <87fsw1dkbp.wl-maz@kernel.org>
Date:   Mon, 26 Jul 2021 19:03:06 +0100
From:   Marc Zyngier <maz@...nel.org>
To:     Bharat Bhushan <bbhushan2@...vell.com>
Cc:     Mark Rutland <mark.rutland@....com>,
        "catalin.marinas@....com" <catalin.marinas@....com>,
        "will@...nel.org" <will@...nel.org>,
        "daniel.lezcano@...aro.org" <daniel.lezcano@...aro.org>,
        "konrad.dybcio@...ainline.org" <konrad.dybcio@...ainline.org>,
        "saiprakash.ranjan@...eaurora.org" <saiprakash.ranjan@...eaurora.org>,
        "robh@...nel.org" <robh@...nel.org>,
        "marcan@...can.st" <marcan@...can.st>,
        "suzuki.poulose@....com" <suzuki.poulose@....com>,
        "broonie@...nel.org" <broonie@...nel.org>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Linu Cherian <lcherian@...vell.com>,
        Sunil Kovvuri Goutham <sgoutham@...vell.com>
Subject: Re: [EXT] Re: [PATCH] clocksource: Add Marvell Errata-38627 workaround

Hi Bharat,

On Mon, 26 Jul 2021 05:29:53 +0100,
Bharat Bhushan <bbhushan2@...vell.com> wrote:
> 
> Sorry for delayed response
> 
> Please see inline
> 
> > -----Original Message-----
> > From: Mark Rutland <mark.rutland@....com>
> > Sent: Tuesday, July 13, 2021 9:43 PM
> >
> > 1) A guest can deliberately cause information to be leaked to itself via
> >    the corrupted GPRs. I haven't seen any rationale for why that is not
> >    a problem, nor have I seen a suggested workaround.
> > 
> > 2) A guest *may* be able to trigger this while the host is running. I
> >    haven't seen anything that rules this out so far.
> > 
> > 3) Even in the absence of virtualization, it would be necessary to
> >    work around this for *every* level-triggered interrupt, which includes
> >    the timer, PMU, and GIC maintenance interrupts, in addition to any
> >    other configurable PPIs or SPIs.
> > 
> > Without a fix that covers all of those, I don't think the
> > workaround is viable.
> 
> This patch covers the workaround for the ARM arch timer in
> non-virtualized cases.
> 
> We are still considering the different scenarios which can trigger
> the issue. After discussing with the HW folks internally, we have come
> to the conclusion that there is no single workaround which will fix
> all the scenarios. The host timer interrupt workaround is different
> from the ones for virtualization and for other interrupt sources.
> 
> While we are working on the other workarounds, we want to push the
> timer workaround first, as that's the one customers are encountering
> right now and they want an upstream-accepted patch soon. The other
> workarounds will take time to test and qualify.
> 
> Wrt drivers disabling the interrupt, except for changing the driver,
> we don't see any common place where we can add a workaround. Please
> let me know your take on this.

I don't think a workaround limited to the timer is viable. It is quite
obvious that once you have worked around the most likely cause for a
crash (timer interrupts), you will need to come up with yet another
workaround for another interrupt source.

We need a solution that works for all interrupts, or at the very least
all per-CPU interrupts. For global interrupts, only you can find out
how they can be mitigated. If that means changing drivers, so be it.
I understand that this isn't what you want to read, but I'm not
confident taking this patch with the knowledge that there are still a
million ways to make it fall over.

Evidently, KVM cannot be enabled on such a system. More importantly, I
cannot see how we can support users of such a machine either. How do
we analyse a crash report if there is a remote possibility that the
CPU has decided to ignore a number of instructions?

To sum it up, I'm not prepared to approve such a patch until there is
a compelling story for all the interrupts that may trigger such
behaviour.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.