lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <01c81b5e-d6a3-41fa-9758-37661e115483@amd.com>
Date: Tue, 28 Oct 2025 14:45:16 +0530
From: Vasant Hegde <vasant.hegde@....com>
To: Jörg Rödel <joro@...tes.org>,
 Dheeraj Kumar Srivastava <dheerajkumar.srivastava@....com>
Cc: will@...nel.org, robin.murphy@....com, iommu@...ts.linux.dev,
 linux-kernel@...r.kernel.org, suravee.suthikulpanit@....com,
 Santosh.Shukla@....com
Subject: Re: [PATCH] iommu/amd: Enhance "Completion-wait Time-out" error
 message

Joerg,


On 10/27/2025 6:19 PM, Jörg Rödel wrote:
> Hey Dheeraj,
> 
> On Thu, Oct 16, 2025 at 08:38:09PM +0530, Dheeraj Kumar Srivastava wrote:
>>  static int wait_on_sem(struct amd_iommu *iommu, u64 data)
>>  {
>> -	int i = 0;
>> +	struct iommu_cmd *cmd;
>> +	int i = 0, j;
>>  
>>  	while (*iommu->cmd_sem != data && i < LOOP_TIMEOUT) {
>>  		udelay(1);
>> @@ -1166,7 +1167,33 @@ static int wait_on_sem(struct amd_iommu *iommu, u64 data)
>>  	}
>>  
>>  	if (i == LOOP_TIMEOUT) {
>> -		pr_alert("Completion-Wait loop timed out\n");
>> +		int head, tail;
>> +
>> +		head = readl(iommu->mmio_base + MMIO_CMD_HEAD_OFFSET);
>> +		tail = readl(iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
>> +
>> +		pr_alert("IOMMU %04x:%02x:%02x.%01x: Completion-Wait loop timed out\n",
>> +			 iommu->pci_seg->id, PCI_BUS_NUM(iommu->devid),
>> +			 PCI_SLOT(iommu->devid), PCI_FUNC(iommu->devid));
> 
> Better use dev_err(&amd_iommu->dev->dev, ...) here.
> 
>> +		if (!amd_iommu_dump) {
>> +			/*
>> +			 * On command buffer completion timeout, step back by 2 commands
>> +			 * to locate the actual command that is causing the issue.
>> +			 */
>> +			tail = (MMIO_CMD_BUFFER_TAIL(tail) - 2) & (CMD_BUFFER_ENTRIES - 1);
>> +			cmd = (struct iommu_cmd *)(iommu->cmd_buf + tail * sizeof(*cmd));
>> +			dump_command(iommu_virt_to_phys(cmd));
>> +		} else {
>> +			/* Dump entire command buffer along with head and tail indices */
>> +			pr_alert("CMD Buffer head=%d tail=%d\n", (int)(MMIO_CMD_BUFFER_HEAD(head)),
>> +				 (int)(MMIO_CMD_BUFFER_TAIL(tail)));
>> +			for (j = 0; j < CMD_BUFFER_ENTRIES; j++) {
>> +				cmd = (struct iommu_cmd *)(iommu->cmd_buf + j * sizeof(*cmd));
>> +				pr_err("%3d: %08x %08x %08x %08x\n", j, cmd->data[0], cmd->data[1],
>> +				       cmd->data[2], cmd->data[3]);
>> +			}
>> +		}
> 
> I don't think it makes much sense to just print the command before the failed
> completion wait. In case of a timeout and amd_iommu_dump == true, just dump the
> whole pending command buffer, from head to tail.

We already have debugfs support to extract the entire command buffer. Also, in many
cases, once we hit a completion-wait timeout the buffer won't make progress, and we
will keep hitting the completion-wait timeout repeatedly. Hence in v2 he removed
printing the entire command buffer.

Do you want to log the entire buffer to dmesg once (for the first completion-wait
timeout event) if amd_iommu_dump=1?


-Vasant

