Message-ID: <ZfnkEqglNPRzH3Zk@sequoia>
Date: Tue, 19 Mar 2024 14:14:26 -0500
From: Tyler Hicks <code@...icks.com>
To: Will Deacon <will@...nel.org>
Cc: Robin Murphy <robin.murphy@....com>, Jason Gunthorpe <jgg@...pe.ca>,
Jerry Snitselaar <jsnitsel@...hat.com>,
linux-arm-kernel@...ts.infradead.org, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org, Dexuan Cui <decui@...rosoft.com>,
Easwar Hariharan <eahariha@...ux.microsoft.com>
Subject: Re: Why is the ARM SMMU v1/v2 put into bypass mode on kexec?

On 2024-03-19 15:47:56, Will Deacon wrote:
> On Tue, Mar 19, 2024 at 12:57:52PM +0000, Robin Murphy wrote:
> > Beyond properly quiescing and resetting the system back to a boot-time
> > state, the outgoing kernel in a kexec can only really do things which affect
> > itself. Sure, we *could* configure the SMMU to block all traffic and disable
> > the interrupt to avoid getting stuck in a storm of faults on the way out,
> > but what does that mean for the incoming kexec payload? That it can have the
> > pleasure of discovering the SMMU, innocently enabling the interrupt and
> > getting stuck in an unexpected storm of faults. Or perhaps just resetting
> > the SMMU into a disabled state and thus still unwittingly allowing its
> > memory to be corrupted by the previous kernel not supporting kexec properly.
>
> Right, it's hard to win if DMA-active devices weren't quiesced properly
> by the outgoing kernel. Either the SMMU was left in abort (leading to the
> problems you list above) or the SMMU is left in bypass (leading to possible
> data corruption). Which is better?

My thoughts are that a loud and obvious failure (via unidentified stream
fault messages and/or a possible interrupt storm preventing the new
kernel from booting) is preferable to silent and subtle data corruption
of the target kernel.

> The best solution is obviously to implement those missing ->shutdown()
> callbacks.

Completely agree here, but it can be difficult to even identify a
missing ->shutdown() hook as the root cause without code changes that
put the SMMU into abort mode and sleep for a bit in the SMMU's own
->shutdown() hook.

Tyler
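
For illustration only, the bypass-versus-abort choice discussed above can
be modeled in a few lines of userspace C. This is a rough sketch, not the
driver's code: the bit positions are assumed to match the sCR0 CLIENTPD
and USFCFG fields as defined in the Linux arm-smmu driver, and the helper
names (scr0_for_bypass, scr0_for_abort, unmatched_stream) are
hypothetical. It only models streams that match no stream mapping entry.

```c
#include <stdint.h>

/* Assumed to mirror the sCR0 bit layout used by the arm-smmu driver. */
#define SCR0_CLIENTPD (UINT32_C(1) << 0)  /* client port disable: global bypass */
#define SCR0_USFCFG   (UINT32_C(1) << 10) /* fault on unidentified streams */

enum txn_result { TXN_BYPASS, TXN_FAULT };

/* Hypothetical helper: sCR0 value that leaves the SMMU in bypass. */
static uint32_t scr0_for_bypass(uint32_t scr0)
{
	return scr0 | SCR0_CLIENTPD;
}

/* Hypothetical helper: sCR0 value that aborts unidentified streams. */
static uint32_t scr0_for_abort(uint32_t scr0)
{
	return (scr0 & ~SCR0_CLIENTPD) | SCR0_USFCFG;
}

/*
 * Simplified model of what happens to DMA from a device whose stream
 * matches no stream mapping entry, e.g. a device that was never quiesced.
 */
static enum txn_result unmatched_stream(uint32_t scr0)
{
	if (scr0 & SCR0_CLIENTPD)
		return TXN_BYPASS;	/* reaches memory untranslated */

	return (scr0 & SCR0_USFCFG) ? TXN_FAULT : TXN_BYPASS;
}
```

In this model, unmatched_stream(scr0_for_abort(...)) reports a fault, the
loud failure argued for above, while unmatched_stream(scr0_for_bypass(...))
lets the stray DMA through silently. A real debugging patch would
presumably also need to invalidate the stream mapping entries and delay in
the SMMU's ->shutdown() hook so that the faults have time to show up.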