lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <91572f4e-6607-41b8-91b6-146261393f07@amd.com>
Date: Thu, 17 Jul 2025 12:35:04 +0530
From: Vasant Hegde <vasant.hegde@....com>
To: "Kalra, Ashish" <ashish.kalra@....com>, joro@...tes.org,
 suravee.suthikulpanit@....com, thomas.lendacky@....com,
 Sairaj.ArunKodilkar@....com, herbert@...dor.apana.org.au
Cc: seanjc@...gle.com, pbonzini@...hat.com, will@...nel.org,
 robin.murphy@....com, john.allen@....com, davem@...emloft.net, bp@...en8.de,
 michael.roth@....com, iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
 linux-crypto@...r.kernel.org, kvm@...r.kernel.org
Subject: Re: [PATCH v3 1/4] iommu/amd: Add support to remap/unmap IOMMU
 buffers for kdump

Ashish,


On 7/17/2025 3:25 AM, Kalra, Ashish wrote:
> Hello Vasant,
> 
> On 7/16/2025 4:19 AM, Vasant Hegde wrote:
>> Hi Ashish,
>>
>>
>> On 7/16/2025 12:56 AM, Ashish Kalra wrote:
>>> From: Ashish Kalra <ashish.kalra@....com>
>>>
>>> After a panic if SNP is enabled in the previous kernel then the kdump
>>> kernel boots with IOMMU SNP enforcement still enabled.
>>>
>>> IOMMU completion wait buffers (CWBs), command buffers and event buffer
>>> registers remain locked and exclusive to the previous kernel. Attempts
>>> to allocate and use new buffers in the kdump kernel fail, as hardware
>>> ignores writes to the locked MMIO registers as per AMD IOMMU spec
>>> Section 2.12.2.1.
>>>
>>> This results in repeated "Completion-Wait loop timed out" errors and a
>>> second kernel panic: "Kernel panic - not syncing: timer doesn't work
>>> through Interrupt-remapped IO-APIC"
>>>
>>> The following MMIO registers are locked and ignore writes after failed
>>> SNP shutdown:
>>> Command Buffer Base Address Register
>>> Event Log Base Address Register
>>> Completion Store Base Register/Exclusion Base Register
>>> Completion Store Limit Register/Exclusion Limit Register
>>> As a result, the kdump kernel cannot initialize the IOMMU or enable IRQ
>>> remapping, which is required for proper operation.
>>
>> There are couple of other registers in locked list. Can you please rephrase
>> above paras?  Also you don't need to callout indivisual registers here. You can
>> just add link to IOMMU spec.
> 
> Yes i will drop listing the individual registers here and just provide the link
> to the IOMMU specs.

May be you can rephrase a bit so that its clear that there are many register
gets locked. This patch fixes few of them and following patches will fix
remaining ones.

> 
>>
>> Unrelated to this patch :
>>   I went to some of the SNP related code in IOMMU driver. One thing confused me
>> is in amd_iommu_snp_disable() code why Command buffer is not marked as shared?
>> any idea?
>>
> 
> Yes that's interesting. 
> 
> This is as per the SNP Firmware ABI specs: 
> 
> from SNP_INIT_EX: 
> 
> The firmware initializes the IOMMU to perform RMP enforcement. The firmware also transitions
> the event log, PPR log, and completion wait buffers of the IOMMU to an RMP page state that is 
> read only to the hypervisor and cannot be assigned to guests
> 
> So during SNP_SHUTDOWN_EX, transitioning these same buffers back to shared state.
> 
> But will investigate deeper and check why is command buffer not marked as FW/Reclaim state
> by firmware ? 

Sure.

> 
>>
>>>
>>> Reuse the pages of the previous kernel for completion wait buffers,
>>> command buffers, event buffers and memremap them during kdump boot
>>> and essentially work with an already enabled IOMMU configuration and
>>> re-using the previous kernel’s data structures.
>>>
>>> Reusing of command buffers and event buffers is now done for kdump boot
>>> irrespective of SNP being enabled during kdump.
>>>
>>> Re-use of completion wait buffers is only done when SNP is enabled as
>>> the exclusion base register is used for the completion wait buffer
>>> (CWB) address only when SNP is enabled.
>>>
>>> Signed-off-by: Ashish Kalra <ashish.kalra@....com>
>>> ---
>>>  drivers/iommu/amd/amd_iommu_types.h |   5 +
>>>  drivers/iommu/amd/init.c            | 163 ++++++++++++++++++++++++++--
>>>  drivers/iommu/amd/iommu.c           |   2 +-
>>>  3 files changed, 157 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
>>> index 9b64cd706c96..082eb1270818 100644
>>> --- a/drivers/iommu/amd/amd_iommu_types.h
>>> +++ b/drivers/iommu/amd/amd_iommu_types.h
>>> @@ -791,6 +791,11 @@ struct amd_iommu {
>>>  	u32 flags;
>>>  	volatile u64 *cmd_sem;
>>>  	atomic64_t cmd_sem_val;
>>> +	/*
>>> +	 * Track physical address to directly use it in build_completion_wait()
>>> +	 * and avoid adding any special checks and handling for kdump.
>>> +	 */
>>> +	u64 cmd_sem_paddr;
>>
>> With this we are tracking both physical and virtual address? Is that really
>> needed? Can we just track PA and convert it into va?
>>
> 
> I believe it is simpler to keep/track cmd_sem and use it directly, instead of doing
> phys_to_virt() calls everytime before using it.
>  
>>>  
>>>  #ifdef CONFIG_AMD_IOMMU_DEBUGFS
>>>  	/* DebugFS Info */
>>> diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
>>> index cadb2c735ffc..32295f26be1b 100644
>>> --- a/drivers/iommu/amd/init.c
>>> +++ b/drivers/iommu/amd/init.c
>>> @@ -710,6 +710,23 @@ static void __init free_alias_table(struct amd_iommu_pci_seg *pci_seg)
>>>  	pci_seg->alias_table = NULL;
>>>  }
>>>  
>>> +static inline void *iommu_memremap(unsigned long paddr, size_t size)
>>> +{
>>> +	phys_addr_t phys;
>>> +
>>> +	if (!paddr)
>>> +		return NULL;
>>> +
>>> +	/*
>>> +	 * Obtain true physical address in kdump kernel when SME is enabled.
>>> +	 * Currently, IOMMU driver does not support booting into an unencrypted
>>> +	 * kdump kernel.
>>
>> You mean production kernel w/ SME and kdump kernel with non-SME is not supported?
>>
> 
> Yes. 

Then can you please rephrase above comment?

> 
>>
>>> +	 */
>>> +	phys = __sme_clr(paddr);
>>> +
>>> +	return ioremap_encrypted(phys, size);
>>
>> You are clearing C bit and then immediately remapping using encrypted mode. Also
>> existing code checks for C bit before calling ioremap_encrypted(). So I am not
>> clear why you do this.
>>
>>
> 
> We need to clear the C-bit to get the correct physical address for remapping.
> 
> Which existing code checks for C-bit before calling ioremap_encrypted() ?
> 
> After getting the correct physical address we call ioremap_encrypted() which
> which map it with C-bit enabled if SME is enabled or else it will map it 
> without C-bit (so it handles both SME and non-SME cases).
>  
> Earlier we used to check for CC_ATTR_HOST_MEM_ENCRYPT flag and if set 
> then call ioremap_encrypted() or otherwise call memremap(), but then
> as mentioned above ioremap_encrypted() works for both cases - SME or
> non-SME, hence we use that approach.

If you want to keep it in current way then it needs better comment. I'd say add
CC_ATTR_HOST_MEM_ENCRYPT check so that its easy to read.


> 
>>
>>> +}
>>> +
>>>  /*
>>>   * Allocates the command buffer. This buffer is per AMD IOMMU. We can
>>>   * write commands to that buffer later and the IOMMU will execute them
>>> @@ -942,8 +959,105 @@ static int iommu_init_ga_log(struct amd_iommu *iommu)
>>>  static int __init alloc_cwwb_sem(struct amd_iommu *iommu)
>>>  {
>>>  	iommu->cmd_sem = iommu_alloc_4k_pages(iommu, GFP_KERNEL, 1);
>>> +	if (!iommu->cmd_sem)
>>> +		return -ENOMEM;
>>> +	iommu->cmd_sem_paddr = iommu_virt_to_phys((void *)iommu->cmd_sem);
>>> +	return 0;
>>> +}
>>> +
>>> +static int __init remap_event_buffer(struct amd_iommu *iommu)
>>> +{
>>> +	u64 paddr;
>>> +
>>> +	pr_info_once("Re-using event buffer from the previous kernel\n");
>>> +	/*
>>> +	 * Read-back the event log base address register and apply
>>> +	 * PM_ADDR_MASK to obtain the event log base address.
>>> +	 */
>>> +	paddr = readq(iommu->mmio_base + MMIO_EVT_BUF_OFFSET) & PM_ADDR_MASK;
>>> +	iommu->evt_buf = iommu_memremap(paddr, EVT_BUFFER_SIZE);
>>> +
>>> +	return iommu->evt_buf ? 0 : -ENOMEM;
>>> +}
>>> +
>>> +static int __init remap_command_buffer(struct amd_iommu *iommu)
>>> +{
>>> +	u64 paddr;
>>> +
>>> +	pr_info_once("Re-using command buffer from the previous kernel\n");
>>> +	/*
>>> +	 * Read-back the command buffer base address register and apply
>>> +	 * PM_ADDR_MASK to obtain the command buffer base address.
>>> +	 */
>>> +	paddr = readq(iommu->mmio_base + MMIO_CMD_BUF_OFFSET) & PM_ADDR_MASK;
>>> +	iommu->cmd_buf = iommu_memremap(paddr, CMD_BUFFER_SIZE);
>>> +
>>> +	return iommu->cmd_buf ? 0 : -ENOMEM;
>>> +}
>>> +
>>> +static int __init remap_cwwb_sem(struct amd_iommu *iommu)
>>> +{
>>> +	u64 paddr;
>>> +
>>> +	if (check_feature(FEATURE_SNP)) {
>>> +		/*
>>> +		 * When SNP is enabled, the exclusion base register is used for the
>>> +		 * completion wait buffer (CWB) address. Read and re-use it.
>>> +		 */
>>> +		pr_info_once("Re-using CWB buffers from the previous kernel\n");
>>> +		/*
>>> +		 * Read-back the exclusion base register and apply PM_ADDR_MASK
>>> +		 * to obtain the exclusion range base address.
>>> +		 */
>>> +		paddr = readq(iommu->mmio_base + MMIO_EXCL_BASE_OFFSET) & PM_ADDR_MASK;
>>> +		iommu->cmd_sem = iommu_memremap(paddr, PAGE_SIZE);
>>> +		if (!iommu->cmd_sem)
>>> +			return -ENOMEM;
>>> +		iommu->cmd_sem_paddr = paddr;
>>> +	} else {
>>> +		return alloc_cwwb_sem(iommu);
>>
>> I understand this one is different from command/event buffer. But calling
>> function name as remap_*() and then allocating memory internally is bit odd.
>> Also this differs from previous functions.
>>
> 
> Yes i agree, but then what do we name it ?
> 
> remap_or_alloc_cwb_sem() does that sound Ok ?

May be.


-Vasant



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ