[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5b19ab13-a7a0-48e2-99a4-357a9f4aeafa@arm.com>
Date: Tue, 19 Mar 2024 18:17:39 +0000
From: Robin Murphy <robin.murphy@....com>
To: Will Deacon <will@...nel.org>
Cc: Tyler Hicks <code@...icks.com>, Jason Gunthorpe <jgg@...pe.ca>,
Jerry Snitselaar <jsnitsel@...hat.com>,
linux-arm-kernel@...ts.infradead.org, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org, Dexuan Cui <decui@...rosoft.com>,
Easwar Hariharan <eahariha@...ux.microsoft.com>
Subject: Re: Why is the ARM SMMU v1/v2 put into bypass mode on kexec?
On 2024-03-19 3:47 pm, Will Deacon wrote:
> On Tue, Mar 19, 2024 at 12:57:52PM +0000, Robin Murphy wrote:
>> On 2024-03-14 7:06 pm, Tyler Hicks wrote:
>>> On 2024-03-14 09:55:46, Tyler Hicks wrote:
>>>> Given that drivers are only optionally asked to implement the .shutdown
>>>> hook, which is required to properly quiesce devices before a kexec, why
>>>> is it that we put the ARM SMMU v1/v2 into bypass mode in the arm-smmu
>>>> driver's own .shutdown hook?
>>>>
>>>> arm_smmu_device_shutdown() -> set SMMU_sCR0.CLIENTPD bit to 1
>>>>
>>>> Driver authors often forget to even implement a .shutdown hook, which
>>>> results in some hard-to-debug memory corruption issues in the kexec'ed
>>>> target kernel due to pending DMA operations happening on untranslated
>>>> addresses. Why not leave the SMMU in translate mode but clear the stream
>>>> mapping table (or maybe even call arm_smmu_device_reset()) in the SMMU's
>>>> .shutdown hook to prevent the memory corruption from happening in the
>>>> first place?
>>>>
>>>> Fully acknowledging that the proper fix is to quiesce the devices, I
>>>> feel like resetting the SMMU and leaving it in translate mode across
>>>> kexec would be more consistent with the intent behind v5.2 commit
>>>> 954a03be033c ("iommu/arm-smmu: Break insecure users by disabling bypass
>>>> by default"). The incoming transactions of devices, that weren't
>>>> properly quiesced during a kexec, would be blocked until their drivers
>>>> have a chance to reinitialize the devices in the new kernel.
>>>>
>>>> I appreciate any help understanding why bypass mode is utilized here as
>>>> I'm sure there are nuances that I haven't considered. Thank you!
>>>
>>> I now see that Will has previously mentioned that he'd be open to such a
>>> change:
>>>
>>> One thing I would be in favour of is changing the ->shutdown() code to
>>> honour disable_bypass=1 so that we put the SMMU in an aborting state
>>> instead of passthrough. Would that help at all? It would at least
>>> avoid the memory corruption on missing shutdown callback.
>>>
>>> - https://lore.kernel.org/linux-arm-kernel/20200608113852.GA3108@willie-the-truck/
>>>
>>> Robin mentions the need to support kexec into a non-SMMU-aware OS. I
>>> hadn't considered that bit of complexity:
>>>
>>> ... consider if the first kernel *did* leave it enabled with whatever
>>> left-over translations in place, and kexec'ed into another OS that
>>> wasn't SMMU-aware...
>>>
>>> - https://lore.kernel.org/linux-arm-kernel/e072f61a-d6cf-2e34-efd5-c1db38c5c622@arm.com/
>>>
>>> Now that we're 3-4 years removed from that conversation, has anything
>>> changed? Will, is there anything we'd need to watch out for if we were
>>> to prototype this sort of change? For example, would it be wise to
>>> disable fault interrupts when putting the SMMU in an aborting state
>>> before kexec'ing?
>
> I've grown older and wiser in those four years and no longer think that's
> a good idea :) Well, older maybe, but the reality is that the code around
> the driver has evolved and 'disable_bypass' is even more of a hack now
> than it used to be.
>
>> Fundamentally, we expect the SMMU to be disabled at initial boot, so per the
>> intent of kexec we put it back in that state. That also seems the most
>> likely expectation of anything we could kexec into, given that it is the
>> natural state of an untouched SMMU after a hard reset, and thus comes out as
>> the least-worst option.
>
> Heh, that sounded too good to be true when I read it so I went and looked at
> the code:
>
> SMMUv3: arm_smmu_device_shutdown() -> clears CR0 but doesn't touch GBPA
> SMMUv2: arm_smmu_device_shutdown() -> writes CLIENTPD to CR0
>
> So it's a bit of a muddle afaict; SMMUv2 explicitly goes into bypass
> whereas SMMUv3 probably does honour disable_bypass=false! Did I miss
> something?
I think so, namely the utter madness around how and when we do actually
touch GBPA - if we found SMMUEN set at the start of probe, then we set
GBPA to abort before initially clearing SMMUEN; if the DT is broken then
we set GBPA to bypass instead of enabling SMMUEN at the end of probe,
*unless* disable_bypass was set. Thus by the time we get to shutdown,
SMMUEN may or may not already be 0 and GPBA may or may not have been
changed from its initial value to either one of bypass or abort.
> As discussed elsewhere, if we remove disable_bypass from SMMUv3, then
> we should be able to be consistent here. The question is, what's the
> right behaviour?
"Not that", at the very least ;)
In terms of the shutdown behaviour, I think it actually works out as-is.
For the normal case we haven't touched GBPA, so we are truly returning
to the boot-time condition; in the unexpected case where SMMUEN was
already enabled then we'll go into an explicit GPBA abort state, but
that seems a not-unreasonable compromise for not preserving the entire
boot-time Stream Table etc., whose presence kind of implies it wouldn't
have been bypassing everything anyway.
The more I look at the remaining aspect of disable_bypass for
controlling broken-DT behaviour the more I suspect it can't actually be
useful either way, especially not since default domains. I have no
memory of what my original reasoning might have been, so I'm inclined to
just rip that all out and let probe fail. I see no reason these days not
to expect a broken DT to leads to a broken system, especially not now
with DTSchema validation.
Then there's just the kdump warning it suppresses, of which I also have
no idea why it's there either, but apparently that one's on you :P
Cheers,
Robin.
Powered by blists - more mailing lists