Message-ID: <aV6K7QnUa7jDpKw-@willie-the-truck>
Date: Wed, 7 Jan 2026 16:33:49 +0000
From: Will Deacon <will@...nel.org>
To: Marc Zyngier <maz@...nel.org>
Cc: Lucas Wei <lucaswei@...gle.com>,
Catalin Marinas <catalin.marinas@....com>,
Jonathan Corbet <corbet@....net>, sjadavani@...gle.com,
kernel test robot <lkp@...el.com>, stable@...r.kernel.org,
kernel-team@...roid.com, linux-arm-kernel@...ts.infradead.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
robin.murphy@....com, smostafa@...gle.com
Subject: Re: [PATCH v2] arm64: errata: Workaround for SI L1 downstream
coherency issue
Hey Marc,
On Thu, Jan 01, 2026 at 06:55:05PM +0000, Marc Zyngier wrote:
> On Mon, 29 Dec 2025 03:36:19 +0000,
> Lucas Wei <lucaswei@...gle.com> wrote:
> > diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
> > index 8cb3b575a031..5c0ab6bfd44a 100644
> > --- a/arch/arm64/kernel/cpu_errata.c
> > +++ b/arch/arm64/kernel/cpu_errata.c
> > @@ -141,6 +141,30 @@ has_mismatched_cache_type(const struct arm64_cpu_capabilities *entry,
> > return (ctr_real != sys) && (ctr_raw != sys);
> > }
> >
> > +#ifdef CONFIG_ARM64_ERRATUM_4311569
> > +static DEFINE_STATIC_KEY_FALSE(arm_si_l1_workaround_4311569);
> > +static int __init early_arm_si_l1_workaround_4311569_cfg(char *arg)
> > +{
> > +	static_branch_enable(&arm_si_l1_workaround_4311569);
> > +	pr_info("Enabling cache maintenance workaround for ARM SI-L1 erratum 4311569\n");
> > +
> > +	return 0;
> > +}
> > +early_param("arm_si_l1_workaround_4311569", early_arm_si_l1_workaround_4311569_cfg);
> > +
> > +/*
> > + * We have some earlier use cases to call cache maintenance operation functions, for example,
> > + * dcache_inval_poc() and dcache_clean_poc() in head.S, before making decision to turn on this
> > + * workaround. Since the scope of this workaround is limited to non-coherent DMA agents, it's
> > + * safe to have the workaround off by default.
> > + */
> > +static bool
> > +need_arm_si_l1_workaround_4311569(const struct arm64_cpu_capabilities *entry, int scope)
> > +{
> > +	return static_branch_unlikely(&arm_si_l1_workaround_4311569);
> > +}
> > +#endif
>
> But this isn't a detection mechanism. That's relying on the user
> knowing they are dealing with broken hardware. How do they find out?
Sadly, I'm not aware of a mechanism to detect this reliably at runtime
but adding Robin in case he knows of one. Linux generally doesn't need
to worry about the SLC, so we'd have to add something to DT to detect
it and even then I don't know whether it's something that is typically
exposed to non-secure...

We also need the workaround to be up early enough that drivers don't
run into issues, so that would probably involve invasive surgery in the
DT parsing code.
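
To make the ordering concern concrete, here is a rough user-space model
(not kernel code) of the opt-in flow in the quoted patch: the parameter
is seen while the command line is parsed early, and later cache
maintenance callers only consult the resulting flag. The parameter name
comes from the patch; the plain bool standing in for the static key,
the simplified token matching, parse_early_cmdline(), the enum and
dcache_clean_poc_sequence() are all assumptions made for illustration.
What the real workaround sequence does is not shown in the quoted hunk,
so it appears only as a placeholder value.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Parameter name as registered by early_param() in the quoted patch. */
#define SI_L1_PARAM "arm_si_l1_workaround_4311569"

/* Hypothetical stand-in for the static key; off by default, as in the patch. */
static bool si_l1_workaround_enabled;

/*
 * Simplified model of early command-line handling: enable the workaround
 * if the parameter appears as a whole space-separated token (optionally
 * followed by '='). The kernel's real parser is more involved.
 */
static void parse_early_cmdline(const char *cmdline)
{
	const size_t len = strlen(SI_L1_PARAM);
	const char *p = cmdline;

	assert(cmdline != NULL);

	while ((p = strstr(p, SI_L1_PARAM)) != NULL) {
		bool starts_token = (p == cmdline) || (p[-1] == ' ');
		bool ends_token = (p[len] == '\0') || (p[len] == ' ') ||
				  (p[len] == '=');

		if (starts_token && ends_token) {
			si_l1_workaround_enabled = true;
			return;
		}
		p += len;	/* skip this partial match and keep scanning */
	}
}

/* Placeholder for "which maintenance sequence would a caller use?" */
enum cmo_sequence { CMO_PLAIN, CMO_WITH_WORKAROUND };

static enum cmo_sequence dcache_clean_poc_sequence(void)
{
	return si_l1_workaround_enabled ? CMO_WITH_WORKAROUND : CMO_PLAIN;
}
```

Anything that performs a clean to PoC before parse_early_cmdline() has
run gets CMO_PLAIN in this model, which is the same window the comment
in the patch describes for the head.S callers.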
> You don't even call out what platform is actually affected...
Well, it's an Android phone :)

More generally, it's going to be anything with an Arm "SI L1" configured
to work with non-coherent DMA agents below it. Christ knows whose bright
idea it was to put "L1" in the name of the thing containing the system
cache.
> The other elephant in the room is virtualisation: how does a guest
> performing CMOs deal with this? How does it discover that the
> host is broken? I also don't see any attempt to make KVM handle the
> erratum on behalf of the guest...
A guest shouldn't have to worry about the problem, as it only affects
clean to PoC for non-coherent DMA agents that reside downstream of the
SLC in the interconnect. Since VFIO doesn't permit assigning
non-coherent devices to a guest, guests shouldn't ever need to push
writes that far (and FWB would cause bigger problems if that was
something we wanted to support).

+Mostafa to keep me honest on the VFIO front.

Will