linux-kernel - RE: [EXT] Re: [PATCH] clocksource: Add Marvell Errata-38627 workaround

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CO6PR18MB4465AAE7916DCFECE9D47EA2E3E89@CO6PR18MB4465.namprd18.prod.outlook.com>
Date:   Mon, 26 Jul 2021 04:29:53 +0000
From:   Bharat Bhushan <bbhushan2@...vell.com>
To:     Mark Rutland <mark.rutland@....com>
CC:     "catalin.marinas@....com" <catalin.marinas@....com>,
        "will@...nel.org" <will@...nel.org>,
        "daniel.lezcano@...aro.org" <daniel.lezcano@...aro.org>,
        "maz@...nel.org" <maz@...nel.org>,
        "konrad.dybcio@...ainline.org" <konrad.dybcio@...ainline.org>,
        "saiprakash.ranjan@...eaurora.org" <saiprakash.ranjan@...eaurora.org>,
        "robh@...nel.org" <robh@...nel.org>,
        "marcan@...can.st" <marcan@...can.st>,
        "suzuki.poulose@....com" <suzuki.poulose@....com>,
        "broonie@...nel.org" <broonie@...nel.org>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Linu Cherian <lcherian@...vell.com>,
        Sunil Kovvuri Goutham <sgoutham@...vell.com>
Subject: RE: [EXT] Re: [PATCH] clocksource: Add Marvell Errata-38627
 workaround

Sorry for delayed response

Please see inline

> -----Original Message-----
> From: Mark Rutland <mark.rutland@....com>
> Sent: Tuesday, July 13, 2021 9:43 PM
> To: Bharat Bhushan <bbhushan2@...vell.com>
> Cc: catalin.marinas@....com; will@...nel.org; daniel.lezcano@...aro.org;
> maz@...nel.org; konrad.dybcio@...ainline.org;
> saiprakash.ranjan@...eaurora.org; robh@...nel.org; marcan@...can.st;
> suzuki.poulose@....com; broonie@...nel.org; linux-arm-
> kernel@...ts.infradead.org; linux-kernel@...r.kernel.org; Linu Cherian
> <lcherian@...vell.com>; Sunil Kovvuri Goutham <sgoutham@...vell.com>
> Subject: Re: [EXT] Re: [PATCH] clocksource: Add Marvell Errata-38627
> workaround
> 
> On Tue, Jul 13, 2021 at 02:40:22AM +0000, Bharat Bhushan wrote:
> > Hi Mark,
> >
> > > -----Original Message-----
> > > From: Mark Rutland <mark.rutland@....com>
> > > Sent: Thursday, July 8, 2021 5:12 PM
> > > To: Bharat Bhushan <bbhushan2@...vell.com>
> > > Cc: catalin.marinas@....com; will@...nel.org;
> > > daniel.lezcano@...aro.org; maz@...nel.org;
> > > konrad.dybcio@...ainline.org; saiprakash.ranjan@...eaurora.org;
> > > robh@...nel.org; marcan@...can.st; suzuki.poulose@....com;
> > > broonie@...nel.org; linux-arm- kernel@...ts.infradead.org;
> > > linux-kernel@...r.kernel.org; Linu Cherian <lcherian@...vell.com>
> > > Subject: Re: [EXT] Re: [PATCH] clocksource: Add Marvell Errata-38627
> > > workaround
> > >
> > > On Thu, Jul 08, 2021 at 10:47:42AM +0000, Bharat Bhushan wrote:
> > > > Hi Mark,
> > > >
> > > > Sorry for the delay, was gathering some details.
> > > > Pease see inline
> > > >
> > > > > -----Original Message-----
> > > > > From: Mark Rutland <mark.rutland@....com>
> > > > > Sent: Monday, July 5, 2021 2:38 PM
> > > > > To: Bharat Bhushan <bbhushan2@...vell.com>
> > > > > Cc: catalin.marinas@....com; will@...nel.org;
> > > > > daniel.lezcano@...aro.org; maz@...nel.org;
> > > > > konrad.dybcio@...ainline.org; saiprakash.ranjan@...eaurora.org;
> > > > > robh@...nel.org; marcan@...can.st; suzuki.poulose@....com;
> > > > > broonie@...nel.org; linux-arm- kernel@...ts.infradead.org;
> > > > > linux-kernel@...r.kernel.org; Linu Cherian
> > > > > <lcherian@...vell.com>
> > > > > Subject: [EXT] Re: [PATCH] clocksource: Add Marvell Errata-38627
> > > > > workaround
> > > > >
> > > > > External Email
> > > > >
> > > > > ----------------------------------------------------------------
> > > > > ----
> > > > > --
> > > > > Hi Bharat,
> > > > >
> > > > > On Mon, Jul 05, 2021 at 11:38:43AM +0530, Bharat Bhushan wrote:
> > > > > > CPU pipeline have unpredicted behavior when timer interrupt
> > > > > > appears and then disappears prior to the exception happening.
> > > > > > Time interrupt appears on timer expiry and disappears when
> > > > > > timer programming or timer disable. This typically can happen
> > > > > > when a load instruction misses in the cache,  which can take
> > > > > > few hundreds of cycles, and an interrupt appears after the
> > > > > > load instruction starts executing but disappears before the load
> instruction completes.
> > > > >
> > > > > Could you elaborate on the scenario? What sort of unpredictable
> > > > > behaviour can occur? e.g:
> > > >
> > > > This is a race condition where an instruction (except store,
> > > > system, load atomic and load exclusive) becomes "nop" if interrupt
> > > > appears and disappears before taken by CPU. For example interrupt
> > > > appears after the atomic load instruction starts executing and
> > > > disappears before the atomic load instruction completes, in that
> > > > case instruction (not all) can become "nop". As interrupt
> > > > disappears before atomic instruction completes, cpu continues to
> > > > execute and while take junk from register as other dependent got "nop".
> > >
> > > Thanks for this; I have a number of further questions below.
> > >
> > > You said this doesn't apply to:
> > >
> > > * store
> > > * system
> > > * load atomic
> > > * load exclusive
> > >
> > > ... but your example explains this happening for an atomic load,
> > > which was in that list. Was the example bad, or was the list wrong?
> >
> > The load atomic completes successfully. It doesn't become a nop. A
> > loads atomic is significant just because it's an instruction which has
> > a long time between executing an retiring. This provides a window of
> > vulnerability when an interrupt asserts and then deasserts. This
> > stimulates the bug and causes other instructions executing in
> > parallel,  which can get nop.
> 
> Thanks for clarifiying; this was not clear from your initial description.
> 
> > > It's not entirely clear to me which instructions this covers. e.g.
> > > is "system" the entire system instruction class (i.e. all opcodes
> > > 0b110101010_0xxxxxxx_xxxxxxxx_xxxxxxxx), or did you mean something
> > > more specific? Does "store" include store-exlcusive?
> > >
> > > Other than that list, can this occur for *any* instruction? e.g.
> > > MOV, SHA256*, *DIV?
> >
> > There are two general classes of instructions. Those that only change
> > a gpr or PC. These are arithmetic, floating point, branch. Loads with
> > no side effects also fall into this category. These are the
> > instructions that can erroneously be nop'd. The other category are
> > instructions that can change architectural state more than a GPR.
> > These include all stores, atomic loads, exclusive loads, loads to
> > non-cacheable space,msr,mrs,eret,tlb*,sys,brk,etc, these does not get
> > "noped"
> >
> > >
> > > Does this only apply to a single instruction at a time, or can
> > > multiple instructions "become nop"?
> >
> > Can be multiple,
> >
> > >
> > > When an instruction "becomes nop", will subsequent instructions see
> > > a consistent architectural state (e.g. GPRs as they were exactly
> > > before the instruction which "becomes nop"), or can they see
> > > something else (e.g. garbage forwarded from register renaming or other
> internal microarchitectural state)?
> >
> > > > > * Does the CPU lockup?
> > > > No
> > > >
> > > > > * Does the CPU take the exception at all?
> > > > No
> > > >
> > > > > * Does the load behave erroneously?
> > > > No,
> > > >
> > > > > * Does any CPU state (e.g. GPRs, PC, PSTATE) become corrupted?
> > > >
> > > > yes, GPRs will get corrupted, will have stale value
> > >
> > > As above, is that the prior architectural value of the GPRs, or can
> > > that be some bogus microarchitectural state (e.g. from renaming or other
> forwarding paths)?
> >
> > The instructions that become a nop doesn't write the GPR and because
> > this is an OOO machine the GPR result isn't the prior architectural
> > value but whatever crap is leftover in the physical register.
> 
> Ok, so that's a potential information leak from a different context (e.g. higher EL),
> depending on what happened to be left in that physical register.
> 
> Consider a malicious guest at EL1. What prevents it from triggering this
> deliberately, then inspecting the GPRs after taking the IRQ in order to read prior
> secrets?
> 
> > > > > Does the problem manifest when IRQs are masked by DAIF.I, or by
> > > > > CNT*_CTL_EL0.{IMASK,ENABLE} ?
> > > >
> > > > No, there are no issue if interrupts are masked.
> > >
> > > If a write to CNTV_CTL_EL0.IMASK races with the interrupt being
> > > asserted, can that trigger the problem?
> >
> > If interrupt is enabled (DAIF) - then it will be taken, and no issue
> > But if interrupts are disabled then following sequence can see the race
> >  1)	interrupt is disabled (DAIF)
> > 2)	TVAL/ENABLE/IMASK at timer h/w programming to de-assert interrupt.
> > 	Race of Irq asserted before irq de-asserted, than this short window of
> assertion will be considered as spike from timer h/w block
> > 3)	Enable DAIF
> > 	Because of propagation delay CPU sees assertion and de-assertion
> > (spike), errata hit
> >
> > Will add "isb" around interrupt enablement in next version of patch.
> 
> ... why?
> 
> That doesn't seem to follow from the abvoe, so I think I'm missing a step.
> 
> > > If a write to DAIF.I races with the interrupt being asserted, can
> > > that trigger the problem?
> >
> > No race with writing to DAIF.I with interrupt assertion, Writing
> > DAIF.I = 0 (enablement of interrupt) can race with de-assertion, which
> > can lead to hitting errata
> 
> Ok.
> 
> That *might* make it possible to bodge around the timer specifically, but as
> below I don't think we can ensure this is safe in the presence of virtualization,
> nor when considering other interrupts.
> 
> > > From your description so far, this doesn't sound like it is specific
> > > to the timer interrupt. Is it possible for a different interrupt to trigger this, e.g:
> > >
> > > * Can the same happen with another PPI, e.g. the PMU interrupt if that
> > >   gets de-asserted, or there's a race with DAIF.I?
> > >
> > > * Can the same happen with an SGI, e.g. if one CPU asserts then
> > >   de-asserts an SGI targetting another CPU, or there's a race with
> > >   DAIF.I?
> > >
> > > * Can the same happen with an SPI, e.g. if a device asserts then
> > >   de-asserts its IRQ line, or there's a race with DAIF.I?
> >
> > No issue with edge triggered, but this can happen with any level sensitive
> interrupt.
> 
> Ok.
> 
> So that'll include at least the PMU and
> 
> > > If not, *why* does this happen specifically for the timer interrupt?
> > >
> > > > > > Workaround of this is to ensure maximum 2us of time gap
> > > > > > between timer interrupt and timer programming which can de-assert
> timer interrupt.
> > > > >
> > > > > The code below seems to try to enforce a 2us *minimum*. Which is
> > > > > it supposed to be?
> > > >
> > > > Yes, it is minimum 2us.
> > > >
> > > > >
> > > > > Can you explain *why* this is supposed to help?
> > > > With the workaround interrupt assertion and de-assertion will be
> > > > minimum 2us
> > > apart.
> > >
> > > I understood that, but why is that deemed to be sufficient? e.g. is
> > > it somehow guaranteed that the CPU will complete the instruction that would
> "become nop"
> > > in that time?
> >
> > With this delay we avoid spike, either this this will becomes an
> > actual interrupt or the spike never visible to core.
> >
> > > > > I don't see how we can guarantee this in a VM, or if the CPU
> > > > > misses on an instruction fetch.
> > > >
> > > > This errata applies to VM (virtual timer) as well, maybe there is
> > > > some gap in my understanding, how it will be different in VM.
> > > > Can you help with what issue we can have VM?
> > >
> > > A VCPU can be pre-empted by the host at *any* time, for an arbitrary
> > > length of time. So e.g. you can have a scenario such as:
> > >
> > > 1. Guest reads CNTx_TVAL, sees interrupt is 4us in the future and
> > >    decides it does not need to wait
> > > 2. Host preempts guest
> > > 3. Host does some processing for ~3.9us 4. Host returns to guest,
> > > with 0.1us left until the interrupt triggers 5. Guest reprograms
> > > CNTx_TVAL, and triggers the erratum
> >
> > Yes, when timer expire just before tval written (race condition) , so
> > there is assertion-followed by de-assertion, As interrupts are enabled
> > in host, interrupt will be visible as spike to host.
> 
> Ok, so that's a recipe for the guest to attack the host.
> 
> > We will apply workaround whenever entering to guest (add a delay
> > before exiting to guest in case guest timer is going to expire).
> 
> I think this is papering over the problem.
> 
> You've said this can happen for *any* level-triggered interrupt. AFAIK, nothing
> prevents a malicious guest from deliberately asserting and de-asserting a level-
> triggered interrupt (e.g. by writing to the GIC distributor), and I also note that the
> GIC maintenance interrupt is level-triggered.
> 
> So, as above:
> 
> 1) A guest can deliberately cause information to be leaked to itself via
>    the corrupted GPRs. I haven't seen any rationale for why that is not
>    a problem, nor have I seen a suggested workaround.
> 
> 2) A guest *may* be able to trigger this while the host is running. I
>    haven't seen anything that rules this out so far.
> 
> 3) Even in the absence of virtualization, it would be necessary to
>    workaround this for *every* level-triggered interrupt, which includes
>    at the timer, PMU, and GIC maintenance interrupts, in addition to any
>    other configurable PPIs or SPIs.
> 
> Without a fix that covers all of those, I don't think the workaround is viable.

This patch covers workaround for ARM arch timer in non-virtualized cases.

While we are considering different scenarios which can trigger the issue.
After discussing with HW folks internally we have come to a conclusion that there is no 
single workaround which will fix all the scenarios. The host timer interrupt workaround is
different from virtualization and from other interrupt sources.

While we are working on other workarounds, we want to push timer workaround first
as currently that's the one customers are encountering right now and want a upstream
accepted patch soon. Other workarounds will take time to test and qualify.

Wrt drivers disabling the interrupt, except changing the driver, we don't see any common
place where we can add a workaround. Please let me your take on this.

Thanks
-Bharat

> 
> Thanks,
> Mark.