[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d0f5fec8-5b37-4a83-ab4e-cd5528bf4057@amd.com>
Date: Thu, 17 Jul 2025 01:51:37 -0500
From: "Kalra, Ashish" <ashish.kalra@....com>
To: Vasant Hegde <vasant.hegde@....com>, joro@...tes.org,
suravee.suthikulpanit@....com, thomas.lendacky@....com,
Sairaj.ArunKodilkar@....com, herbert@...dor.apana.org.au
Cc: seanjc@...gle.com, pbonzini@...hat.com, will@...nel.org,
robin.murphy@....com, john.allen@....com, davem@...emloft.net, bp@...en8.de,
michael.roth@....com, iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
linux-crypto@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH v3 2/4] iommu/amd: Reuse device table for kdump
Hello Vasant and Sairaj,
On 7/17/2025 1:05 AM, Vasant Hegde wrote:
> Ashish,
>
> On 7/17/2025 3:37 AM, Kalra, Ashish wrote:
>> Hello Vasant,
>>
>> On 7/16/2025 4:42 AM, Vasant Hegde wrote:
>>>
>>>
>>> On 7/16/2025 12:57 AM, Ashish Kalra wrote:
>>>> From: Ashish Kalra <ashish.kalra@....com>
>>>>
>>>> After a panic if SNP is enabled in the previous kernel then the kdump
>>>> kernel boots with IOMMU SNP enforcement still enabled.
>>>>
>>>> IOMMU device table register is locked and exclusive to the previous
>>>> kernel. Attempts to copy old device table from the previous kernel
>>>> fails in kdump kernel as hardware ignores writes to the locked device
>>>> table base address register as per AMD IOMMU spec Section 2.12.2.1.
>>>>
>>>> This results in repeated "Completion-Wait loop timed out" errors and a
>>>> second kernel panic: "Kernel panic - not syncing: timer doesn't work
>>>> through Interrupt-remapped IO-APIC".
>>>>
>>>> Reuse device table instead of copying device table in case of kdump
>>>> boot and remove all copying device table code.
>>>>
>>>> Signed-off-by: Ashish Kalra <ashish.kalra@....com>
>>>> ---
>>>> drivers/iommu/amd/init.c | 97 ++++++++++++----------------------------
>>>> 1 file changed, 28 insertions(+), 69 deletions(-)
>>>>
>>>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>>>> index 32295f26be1b..18bd869a82d9 100644
>>>> --- a/drivers/iommu/amd/init.c
>>>> +++ b/drivers/iommu/amd/init.c
>>>> @@ -406,6 +406,9 @@ static void iommu_set_device_table(struct amd_iommu *iommu)
>>>>
>>>> BUG_ON(iommu->mmio_base == NULL);
>>>>
>>>> + if (is_kdump_kernel())
>>>
>>> This is fine.. but its becoming too many places with kdump check! I don't know
>>> what is the better way here.
>>> Is it worth to keep it like this -OR- add say iommu ops that way during init we
>>> check is_kdump_kernel() and adjust the ops ?
>>>
>>> @Joerg, any preference?
>>>
>>>
>
> .../...
>
>>>> break;
>>>> }
>>>> @@ -2917,8 +2876,8 @@ static void early_enable_iommu(struct amd_iommu *iommu)
>>>> * This function finally enables all IOMMUs found in the system after
>>>> * they have been initialized.
>>>> *
>>>> - * Or if in kdump kernel and IOMMUs are all pre-enabled, try to copy
>>>> - * the old content of device table entries. Not this case or copy failed,
>>>> + * Or if in kdump kernel and IOMMUs are all pre-enabled, try to reuse
>>>> + * the old content of device table entries. Not this case or reuse failed,
>>>> * just continue as normal kernel does.
>>>> */
>>>> static void early_enable_iommus(void)
>>>> @@ -2926,18 +2885,18 @@ static void early_enable_iommus(void)
>>>> struct amd_iommu *iommu;
>>>> struct amd_iommu_pci_seg *pci_seg;
>>>>
>>>> - if (!copy_device_table()) {
>>>> + if (!reuse_device_table()) {
>>>
>>> Hmmm. What happens if SNP enabled and reuse_device_table() couldn't setup
>>> previous DTE?
>>> In non-SNP case it works fine as we can rebuild new DTE. But in SNP case we
>>> should fail the kdump right?
>>>
>>
>> Which will happen automatically, if we can't setup previous DTE for SNP case
>> then IOMMU commands will time-out and subsequenly cause a panic as IRQ remapping
>> won't be setup.
>
> But what is the point is proceeding when we know its going to fail? I think its
> better to fail here so that at least we know where/why it failed.
>
Yes that makes sense.
As Sairaj suggested, we can add a BUG_ON() if reuse_device_table() fails in case
of SNP enabled.
Thanks,
Ashish
Powered by blists - more mailing lists