Date:   Sun, 11 Jul 2021 10:57:34 +0100
From:   Marc Zyngier <maz@...nel.org>
To:     Bharat Bhushan <bbhushan2@...vell.com>
Cc:     "catalin.marinas@....com" <catalin.marinas@....com>,
        "will@...nel.org" <will@...nel.org>,
        "daniel.lezcano@...aro.org" <daniel.lezcano@...aro.org>,
        "mark.rutland@....com" <mark.rutland@....com>,
        "konrad.dybcio@...ainline.org" <konrad.dybcio@...ainline.org>,
        "saiprakash.ranjan@...eaurora.org" <saiprakash.ranjan@...eaurora.org>,
        "robh@...nel.org" <robh@...nel.org>,
        "marcan@...can.st" <marcan@...can.st>,
        "suzuki.poulose@....com" <suzuki.poulose@....com>,
        "broonie@...nel.org" <broonie@...nel.org>,
        "linux-arm-kernel@...ts.infradead.org" 
        <linux-arm-kernel@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Linu Cherian <lcherian@...vell.com>
Subject: Re: [EXT] Re: [PATCH] clocksource: Add Marvell Errata-38627 workaround

On Thu, 08 Jul 2021 11:48:18 +0100,
Bharat Bhushan <bbhushan2@...vell.com> wrote:
> 
> Hi Marc,
> 
> Mark asked similar questions, so some of my responses may be duplicated.

Mark had a ton of very good questions, so I won't repeat them. Some
more below though:

> > -----Original Message-----
> > From: Marc Zyngier <maz@...nel.org>
> > Sent: Monday, July 5, 2021 2:57 PM
> > To: Bharat Bhushan <bbhushan2@...vell.com>
> > Cc: catalin.marinas@....com; will@...nel.org; daniel.lezcano@...aro.org;
> > mark.rutland@....com; konrad.dybcio@...ainline.org;
> > saiprakash.ranjan@...eaurora.org; robh@...nel.org; marcan@...can.st;
> > suzuki.poulose@....com; broonie@...nel.org; linux-arm-
> > kernel@...ts.infradead.org; linux-kernel@...r.kernel.org; Linu Cherian
> > <lcherian@...vell.com>
> > Subject: [EXT] Re: [PATCH] clocksource: Add Marvell Errata-38627 workaround
> > 
> > External Email
> > 
> > ----------------------------------------------------------------------
> > On Mon, 05 Jul 2021 07:08:43 +0100,
> > Bharat Bhushan <bbhushan2@...vell.com> wrote:
> > >
> > > The CPU pipeline has unpredictable behavior when a timer interrupt
> > > appears and then disappears prior to the exception happening.
> > 
> > What kind of unpredictable behaviours?  
> 
> This is a race condition where an instruction (except store, system,
> load atomic and load exclusive) becomes a "nop" if an interrupt appears
> and disappears before being taken by the CPU. This can lead to GPR
> corruption. For example, if an interrupt appears after the atomic load
> instruction starts executing and disappears before the atomic load
> instruction completes, an instruction (not all) can become a
> "nop". As the interrupt disappears before the atomic instruction
> completes, the CPU continues to execute and will take a stale value
> from a register, as the other dependent instruction became a "nop".

So here's what I understand from the above:

- Interrupts being a context synchronisation event, the CPU deals with
  them by preventing in-flight instructions from having any effect
  (what you above describe as becoming NOP).

- If the interrupt is recalled before the exception entry can take
  place, the exception doesn't occur, but the discarded instructions
  are not replayed, leaving the program in an inconsistent state.

Is this interpretation correct? If so, I have more questions:

- Does the erratum trigger when interrupts are masked in PSTATE? Can
  this erratum be triggered by masking interrupts in PSTATE?

- What makes this specific to the timer? Why can't this be triggered
  with any other interrupt? Spurious interrupts do exist, and happen
  all the time, especially with level-triggered interrupts.

- What if *another* CPU masks the interrupt at the GIC redistributor
  level?
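
For reference, the way a level-triggered timer interrupt can appear and
then disappear can be sketched in a few lines of C. This is only a
simplified software model and is not taken from the patch: the struct
and function names are hypothetical, the timer condition is reduced to
"count >= cval", and only the ENABLE and IMASK control bits are
modelled. It shows that the line simply follows the timer condition, so
reprogramming or disabling the timer withdraws a pending interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Control bits as laid out in the Arm generic timer CNT*_CTL registers. */
#define TIMER_CTL_ENABLE (1u << 0)
#define TIMER_CTL_IMASK  (1u << 1)

/* Hypothetical software model of one timer; not kernel code. */
struct timer_model {
	uint32_t ctl;   /* ENABLE / IMASK bits */
	uint64_t cval;  /* compare value */
};

/*
 * Level of the interrupt line for a given counter value.
 * Simplification (assumption): the condition is met once the counter
 * reaches the compare value; offsets and wrap-around are ignored.
 */
static bool timer_irq_level(const struct timer_model *t, uint64_t count)
{
	bool enabled = t->ctl & TIMER_CTL_ENABLE;
	bool masked = t->ctl & TIMER_CTL_IMASK;

	return enabled && !masked && count >= t->cval;
}

/* Walk through expiry, reprogramming and disabling. */
static void timer_model_demo(void)
{
	struct timer_model t = { .ctl = TIMER_CTL_ENABLE, .cval = 100 };

	assert(!timer_irq_level(&t, 99));   /* not yet expired */
	assert(timer_irq_level(&t, 100));   /* expiry: line asserted */

	t.cval = 200;                       /* reprogram into the future */
	assert(!timer_irq_level(&t, 101));  /* line withdrawn */

	t.cval = 100;
	t.ctl &= ~TIMER_CTL_ENABLE;         /* disable the timer */
	assert(!timer_irq_level(&t, 150));  /* line withdrawn again */
}
```

Nothing in this model is specific to the timer beyond the compare
logic: any level-triggered source whose condition clears before the CPU
takes the exception behaves the same way.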

> > What happens if a guest isn't aware of the erratum or actively
> > tries to trigger it?
> 
> The erratum applies to VMs (EL1 virtual timer) as well. In addition,
> extending the workaround to the timer context save/restore in KVM seems
> to work.  Can you help us see if we are missing something for VMs?

Maybe. First, I want to understand why this is specific to the timer,
and whether this can have any impact when already in an exception
context. I'm not convinced that this issue is specific to the timer
either.

Which revision of the architecture does this CPU implement? Depending
on whether the CPU runs VHE or not, we handle things slightly differently.

> > > The timer interrupt appears on timer
> > > expiry and disappears on timer programming or timer disable. This
> > > typically can happen when a load instruction misses in the cache,
> > > which can take a few hundred cycles, and an interrupt appears after
> > > the load instruction starts executing but disappears before the load
> > > instruction completes.
> > >
> > > Workaround of this is to ensure maximum 2us of time
> > 
> > maximum? I'm not sure how you can bound this. Or did you mean *minimum*?
> 
> It is minimum
> 
> > 
> > How was this value obtained? What guarantees that it is safe?
> 
> H/w team suggested same

This doesn't answer my question.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
