[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <84fb1f3d-1c92-4b15-8279-617046fe2b93@amd.com>
Date: Wed, 16 Jul 2025 16:55:51 -0500
From: "Kalra, Ashish" <ashish.kalra@....com>
To: Vasant Hegde <vasant.hegde@....com>, joro@...tes.org,
suravee.suthikulpanit@....com, thomas.lendacky@....com,
Sairaj.ArunKodilkar@....com, herbert@...dor.apana.org.au
Cc: seanjc@...gle.com, pbonzini@...hat.com, will@...nel.org,
robin.murphy@....com, john.allen@....com, davem@...emloft.net, bp@...en8.de,
michael.roth@....com, iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
linux-crypto@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH v3 1/4] iommu/amd: Add support to remap/unmap IOMMU
buffers for kdump
Hello Vasant,
On 7/16/2025 4:19 AM, Vasant Hegde wrote:
> Hi Ashish,
>
>
> On 7/16/2025 12:56 AM, Ashish Kalra wrote:
>> From: Ashish Kalra <ashish.kalra@....com>
>>
>> After a panic if SNP is enabled in the previous kernel then the kdump
>> kernel boots with IOMMU SNP enforcement still enabled.
>>
>> IOMMU completion wait buffers (CWBs), command buffers and event buffer
>> registers remain locked and exclusive to the previous kernel. Attempts
>> to allocate and use new buffers in the kdump kernel fail, as hardware
>> ignores writes to the locked MMIO registers as per AMD IOMMU spec
>> Section 2.12.2.1.
>>
>> This results in repeated "Completion-Wait loop timed out" errors and a
>> second kernel panic: "Kernel panic - not syncing: timer doesn't work
>> through Interrupt-remapped IO-APIC"
>>
>> The following MMIO registers are locked and ignore writes after failed
>> SNP shutdown:
>> Command Buffer Base Address Register
>> Event Log Base Address Register
>> Completion Store Base Register/Exclusion Base Register
>> Completion Store Limit Register/Exclusion Limit Register
>> As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ
>> remapping, which is required for proper operation.
>
> There are couple of other registers in locked list. Can you please rephrase
> above paras? Also you don't need to callout indivisual registers here. You can
> just add link to IOMMU spec.
Yes i will drop listing the individual registers here and just provide the link
to the IOMMU specs.
>
> Unrelated to this patch :
> I went to some of the SNP related code in IOMMU driver. One thing confused me
> is in amd_iommu_snp_disable() code why Command buffer is not marked as shared?
> any idea?
>
Yes that's interesting.
This is as per the SNP Firmware ABI specs:
from SNP_INIT_EX:
The firmware initializes the IOMMU to perform RMP enforcement. The firmware also transitions
the event log, PPR log, and completion wait buffers of the IOMMU to an RMP page state that is
read only to the hypervisor and cannot be assigned to guests
So during SNP_SHUTDOWN_EX, transitioning these same buffers back to shared state.
But will investigate deeper and check why is command buffer not marked as FW/Reclaim state
by firmware ?
>
>>
>> Reuse the pages of the previous kernel for completion wait buffers,
>> command buffers, event buffers and memremap them during kdump boot
>> and essentially work with an already enabled IOMMU configuration and
>> re-using the previous kernel’s data structures.
>>
>> Reusing of command buffers and event buffers is now done for kdump boot
>> irrespective of SNP being enabled during kdump.
>>
>> Re-use of completion wait buffers is only done when SNP is enabled as
>> the exclusion base register is used for the completion wait buffer
>> (CWB) address only when SNP is enabled.
>>
>> Signed-off-by: Ashish Kalra <ashish.kalra@....com>
>> ---
>> drivers/iommu/amd/amd_iommu_types.h | 5 +
>> drivers/iommu/amd/init.c | 163 ++++++++++++++++++++++++++--
>> drivers/iommu/amd/iommu.c | 2 +-
>> 3 files changed, 157 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
>> index 9b64cd706c96..082eb1270818 100644
>> --- a/drivers/iommu/amd/amd_iommu_types.h
>> +++ b/drivers/iommu/amd/amd_iommu_types.h
>> @@ -791,6 +791,11 @@ struct amd_iommu {
>> u32 flags;
>> volatile u64 *cmd_sem;
>> atomic64_t cmd_sem_val;
>> + /*
>> + * Track physical address to directly use it in build_completion_wait()
>> + * and avoid adding any special checks and handling for kdump.
>> + */
>> + u64 cmd_sem_paddr;
>
> With this we are tracking both physical and virtual address? Is that really
> needed? Can we just track PA and convert it into va?
>
I believe it is simpler to keep/track cmd_sem and use it directly, instead of doing
phys_to_virt() calls everytime before using it.
>>
>> #ifdef CONFIG_AMD_IOMMU_DEBUGFS
>> /* DebugFS Info */
>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>> index cadb2c735ffc..32295f26be1b 100644
>> --- a/drivers/iommu/amd/init.c
>> +++ b/drivers/iommu/amd/init.c
>> @@ -710,6 +710,23 @@ static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
>> pci_seg->alias_table = NULL;
>> }
>>
>> +static inline void *iommu_memremap(unsigned long paddr, size_t size)
>> +{
>> + phys_addr_t phys;
>> +
>> + if (!paddr)
>> + return NULL;
>> +
>> + /*
>> + * Obtain true physical address in kdump kernel when SME is enabled.
>> + * Currently, IOMMU driver does not support booting into an unencrypted
>> + * kdump kernel.
>
> You mean production kernel w/ SME and kdump kernel with non-SME is not supported?
>
Yes.
>
>> + */
>> + phys = __sme_clr(paddr);
>> +
>> + return ioremap_encrypted(phys, size);
>
> You are clearing C bit and then immediately remapping using encrypted mode. Also
> existing code checks for C bit before calling ioremap_encrypted(). So I am not
> clear why you do this.
>
>
We need to clear the C-bit to get the correct physical address for remapping.
Which existing code checks for C-bit before calling ioremap_encrypted() ?
After getting the correct physical address we call ioremap_encrypted() which
which map it with C-bit enabled if SME is enabled or else it will map it
without C-bit (so it handles both SME and non-SME cases).
Earlier we used to check for CC_ATTR_HOST_MEM_ENCRYPT flag and if set
then call ioremap_encrypted() or otherwise call memremap(), but then
as mentioned above ioremap_encrypted() works for both cases - SME or
non-SME, hence we use that approach.
>
>> +}
>> +
>> /*
>> * Allocates the command buffer. This buffer is per AMD IOMMU. We can
>> * write commands to that buffer later and the IOMMU will execute them
>> @@ -942,8 +959,105 @@ static int iommu_init_ga_log(struct amd_iommu *iommu)
>> static int __init alloc_cwwb_sem(struct amd_iommu *iommu)
>> {
>> iommu->cmd_sem = iommu_alloc_4k_pages(iommu, GFP_KERNEL, 1);
>> + if (!iommu->cmd_sem)
>> + return -ENOMEM;
>> + iommu->cmd_sem_paddr = iommu_virt_to_phys((void *)iommu->cmd_sem);
>> + return 0;
>> +}
>> +
>> +static int __init remap_event_buffer(struct amd_iommu *iommu)
>> +{
>> + u64 paddr;
>> +
>> + pr_info_once("Re-using event buffer from the previous kernel\n");
>> + /*
>> + * Read-back the event log base address register and apply
>> + * PM_ADDR_MASK to obtain the event log base address.
>> + */
>> + paddr = readq(iommu->mmio_base + MMIO_EVT_BUF_OFFSET) & PM_ADDR_MASK;
>> + iommu->evt_buf = iommu_memremap(paddr, EVT_BUFFER_SIZE);
>> +
>> + return iommu->evt_buf ? 0 : -ENOMEM;
>> +}
>> +
>> +static int __init remap_command_buffer(struct amd_iommu *iommu)
>> +{
>> + u64 paddr;
>> +
>> + pr_info_once("Re-using command buffer from the previous kernel\n");
>> + /*
>> + * Read-back the command buffer base address register and apply
>> + * PM_ADDR_MASK to obtain the command buffer base address.
>> + */
>> + paddr = readq(iommu->mmio_base + MMIO_CMD_BUF_OFFSET) & PM_ADDR_MASK;
>> + iommu->cmd_buf = iommu_memremap(paddr, CMD_BUFFER_SIZE);
>> +
>> + return iommu->cmd_buf ? 0 : -ENOMEM;
>> +}
>> +
>> +static int __init remap_cwwb_sem(struct amd_iommu *iommu)
>> +{
>> + u64 paddr;
>> +
>> + if (check_feature(FEATURE_SNP)) {
>> + /*
>> + * When SNP is enabled, the exclusion base register is used for the
>> + * completion wait buffer (CWB) address. Read and re-use it.
>> + */
>> + pr_info_once("Re-using CWB buffers from the previous kernel\n");
>> + /*
>> + * Read-back the exclusion base register and apply PM_ADDR_MASK
>> + * to obtain the exclusion range base address.
>> + */
>> + paddr = readq(iommu->mmio_base + MMIO_EXCL_BASE_OFFSET) & PM_ADDR_MASK;
>> + iommu->cmd_sem = iommu_memremap(paddr, PAGE_SIZE);
>> + if (!iommu->cmd_sem)
>> + return -ENOMEM;
>> + iommu->cmd_sem_paddr = paddr;
>> + } else {
>> + return alloc_cwwb_sem(iommu);
>
> I understand this one is different from command/event buffer. But calling
> function name as remap_*() and then allocating memory internally is bit odd.
> Also this differs from previous functions.
>
Yes i agree, but then what do we name it ?
remap_or_alloc_cwb_sem() does that sound Ok ?
Thanks,
Ashish
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int __init alloc_iommu_buffers(struct amd_iommu *iommu)
>> +{
>> + int ret;
>> +
>> + /*
>> + * IOMMU Completion Store Base MMIO, Command Buffer Base Address MMIO
>> + * registers are locked if SNP is enabled during kdump, reuse/remap
>
> Redudant explaination because implementation is going to support non-SNP
> scenario as well.
>
>
> -Vasant
>
>
Powered by blists - more mailing lists