Message-ID: <86cycw9rd4.fsf@scott-ph-mail.amperecomputing.com>
Date: Mon, 28 Apr 2025 09:35:03 -0700
From: D Scott Phillips <scott@...amperecomputing.com>
To: Marc Zyngier <maz@...nel.org>
Cc: Catalin Marinas <catalin.marinas@....com>, James Clark
<james.clark@...aro.org>, James Morse <james.morse@....com>, Joey Gouly
<joey.gouly@....com>, Kevin Brodsky <kevin.brodsky@....com>, Mark Brown
<broonie@...nel.org>, Mark Rutland <mark.rutland@....com>, Oliver Upton
<oliver.upton@...ux.dev>, "Rob Herring (Arm)" <robh@...nel.org>, Shameer
Kolothum <shameerali.kolothum.thodi@...wei.com>, Shiqi Liu
<shiqiliu@...t.edu.cn>, Will Deacon <will@...nel.org>, Yicong Yang
<yangyicong@...ilicon.com>, kvmarm@...ts.linux.dev,
linux-arm-kernel@...ts.infradead.org, open list
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] arm64: errata: Work around AmpereOne's erratum
AC03_CPU_36

Marc Zyngier <maz@...nel.org> writes:

> On Fri, 25 Apr 2025 03:02:29 +0100,
> D Scott Phillips <scott@...amperecomputing.com> wrote:
>>
>> Marc Zyngier <maz@...nel.org> writes:
>>
>> > On Tue, 15 Apr 2025 16:47:10 +0100,
>> > D Scott Phillips <scott@...amperecomputing.com> wrote:
>> >>
>> >> AC03_CPU_36 can cause asynchronous exceptions to be routed to the wrong
>> >> exception level if an async exception coincides with an update to the
>> >> controls for the target exception level in HCR_EL2. On affected
>> >> machines, always do writes to HCR_EL2 with async exceptions blocked.
>> >
>> > From the actual errata document [1]:
>> >
>> > <quote>
>> > If an Asynchronous Exception to EL2 occurs, while EL2 software is
>> > changing the EL2 exception control bits from a configuration where
>> > asynchronous exceptions are routed to EL2 to a configuration where
>> > asynchronous exceptions are routed to EL1, the processor may exhibit
>> > the incorrect exception behavior of routing an interrupt taken at EL2
>> > to EL1. The affected system register is HCR_EL2, which contains
>> > control bits for routing and enabling of EL2 exceptions.
>> > </quote>
>> >
>> > My reading is that things can go wrong when clearing the xMO bits.
>> >
>> > I don't think we need to touch the xMO bits at all when running
>> > VHE. So my preference would be to:
>> >
>> > - simply leave the xMO bits set at all times (nothing bad can happen
>> > from that, can it?)
>> >
>> > - prevent these systems from using anything but VHE (and fail KVM init
>> > otherwise)
>>
>> Hi Marc, I started writing up this patch and then realized that the
>> issue can also not happen in nvhe mode. While xMO bits are modified
>> there, async exceptions are always masked and so the "simultaneously
>> take an async exception" part of the erratum can't happen.
>>
>> Does that sound right to you, or are there cases that I'm missing? If
>> that's right and nVHE also can't hit the erratum case, then what do you
>> think is the right thing for me to do here?
>
> That's an interesting point. We always run the nVHE/hVHE hypervisor
> code with interrupts disabled by virtue of taking an HVC exception
> into EL2, so that particular case seems OK as it literally implements
> the proposed workaround.
>
> However, there's at least one catch: the SError handling code in
> hyp/entry.S relies on clearing PSTATE.A to take a pending abort (the
> so-called VAXorcism). I take that this CPU implements FEAT_RAS, and
> that we don't need to worry about this code path either, and that the
> erratum cannot trigger on speculatively executed paths?

Yep, right on both counts: the CPU supports FEAT_RAS, and the erratum
case doesn't happen speculatively.

> If we're OK with that, then I don't think there is much to do, other
> than always setting the xMO bits at all times, for which I already
> have a patch in review (v2 coming shortly).

OK, sounds good to me.
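
For anyone following along, "always setting the xMO bits" could be
sketched roughly as below. This is a userspace illustration only, not
the patch under review: the bit positions for FMO, IMO and AMO are the
architectural HCR_EL2 ones, but the helper names hcr_with_xmo() and
hcr_update() are hypothetical and do not exist in the kernel.

```c
#include <stdint.h>

/* HCR_EL2 routing controls: FMO is bit 3, IMO is bit 4, AMO is bit 5. */
#define HCR_FMO (UINT64_C(1) << 3)
#define HCR_IMO (UINT64_C(1) << 4)
#define HCR_AMO (UINT64_C(1) << 5)
#define HCR_XMO (HCR_FMO | HCR_IMO | HCR_AMO)

/*
 * Hypothetical helper: force the xMO bits on in any value that is about
 * to be written to HCR_EL2, so no write ever moves from "routed to EL2"
 * to "routed to EL1".
 */
static inline uint64_t hcr_with_xmo(uint64_t val)
{
	return val | HCR_XMO;
}

/*
 * Hypothetical helper: apply a set/clear request to the current value,
 * ignoring any attempt to clear the xMO bits.
 */
static inline uint64_t hcr_update(uint64_t cur, uint64_t set, uint64_t clear)
{
	return hcr_with_xmo((cur & ~clear) | set);
}
```

With this shape, a caller that asks to clear the xMO bits still ends up
with them set, which is the transition the erratum text singles out.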