Message-ID: <01c81b5e-d6a3-41fa-9758-37661e115483@amd.com>
Date: Tue, 28 Oct 2025 14:45:16 +0530
From: Vasant Hegde <vasant.hegde@....com>
To: Jörg Rödel <joro@...tes.org>,
Dheeraj Kumar Srivastava <dheerajkumar.srivastava@....com>
Cc: will@...nel.org, robin.murphy@....com, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org, suravee.suthikulpanit@....com,
Santosh.Shukla@....com
Subject: Re: [PATCH] iommu/amd: Enhance "Completion-wait Time-out" error
message
Joerg,
On 10/27/2025 6:19 PM, Jörg Rödel wrote:
> Hey Dheeraj,
>
> On Thu, Oct 16, 2025 at 08:38:09PM +0530, Dheeraj Kumar Srivastava wrote:
>> static int wait_on_sem(struct amd_iommu *iommu, u64 data)
>> {
>> - int i = 0;
>> + struct iommu_cmd *cmd;
>> + int i = 0, j;
>>
>> while (*iommu->cmd_sem != data && i < LOOP_TIMEOUT) {
>> udelay(1);
>> @@ -1166,7 +1167,33 @@ static int wait_on_sem(struct amd_iommu *iommu, u64 data)
>> }
>>
>> if (i == LOOP_TIMEOUT) {
>> - pr_alert("Completion-Wait loop timed out\n");
>> + int head, tail;
>> +
>> + head = readl(iommu->mmio_base + MMIO_CMD_HEAD_OFFSET);
>> + tail = readl(iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
>> +
>> + pr_alert("IOMMU %04x:%02x:%02x.%01x: Completion-Wait loop timed out\n",
>> + iommu->pci_seg->id, PCI_BUS_NUM(iommu->devid),
>> + PCI_SLOT(iommu->devid), PCI_FUNC(iommu->devid));
>
> Better use dev_err(&amd_iommu->dev->dev, ...) here.
>
>> + if (!amd_iommu_dump) {
>> + /*
>> + * On command buffer completion timeout, step back by 2 commands
>> + * to locate the actual command that is causing the issue.
>> + */
>> + tail = (MMIO_CMD_BUFFER_TAIL(tail) - 2) & (CMD_BUFFER_ENTRIES - 1);
>> + cmd = (struct iommu_cmd *)(iommu->cmd_buf + tail * sizeof(*cmd));
>> + dump_command(iommu_virt_to_phys(cmd));
>> + } else {
>> + /* Dump entire command buffer along with head and tail indices */
>> + pr_alert("CMD Buffer head=%d tail=%d\n", (int)(MMIO_CMD_BUFFER_HEAD(head)),
>> + (int)(MMIO_CMD_BUFFER_TAIL(tail)));
>> + for (j = 0; j < CMD_BUFFER_ENTRIES; j++) {
>> + cmd = (struct iommu_cmd *)(iommu->cmd_buf + j * sizeof(*cmd));
>> + pr_err("%3d: %08x %08x %08x %08x\n", j, cmd->data[0], cmd->data[1],
>> + cmd->data[2], cmd->data[3]);
>> + }
>> + }
>
> I don't think it makes much sense to just print the command before the failed
> completion wait. In case of a timeout and amd_iommu_dump == true, just dump the
> whole pending command buffer, from head to tail.
We already have debugfs support for extracting the entire command buffer. Also, in
many cases the buffer stops making progress once we hit a completion-wait timeout,
so we hit the timeout repeatedly. Hence, in v2 he removed the code that prints the
entire command buffer.
Do you want to log the entire buffer to dmesg once when amd_iommu_dump=1 (i.e. on
the first completion-wait timeout event)?
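For illustration only, here is a rough userspace sketch of the behaviour under discussion: walk the ring from head to tail, and do so only on the first timeout. It is not the patch's code. The ring size of 8, the helper names dump_pending() and dump_pending_once(), and the use of printf() in place of the kernel logging calls are all assumptions made for the sketch. head and tail are taken to be entry indices already extracted from the MMIO register values.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative ring size for this sketch only; must be a power of two. */
#define CMD_BUFFER_ENTRIES 8u

/* Mirrors the four 32-bit words that the quoted patch prints per command. */
struct iommu_cmd {
	uint32_t data[4];
};

/*
 * Print every pending entry, from head up to (not including) tail,
 * wrapping at the end of the ring. head and tail are entry indices,
 * not raw register values. Returns the number of entries printed.
 */
static unsigned int dump_pending(const struct iommu_cmd *buf,
				 unsigned int head, unsigned int tail)
{
	unsigned int i, n = 0;

	assert(head < CMD_BUFFER_ENTRIES && tail < CMD_BUFFER_ENTRIES);

	for (i = head; i != tail; i = (i + 1) & (CMD_BUFFER_ENTRIES - 1)) {
		printf("%3u: %08" PRIx32 " %08" PRIx32 " %08" PRIx32 " %08" PRIx32 "\n",
		       i, buf[i].data[0], buf[i].data[1],
		       buf[i].data[2], buf[i].data[3]);
		n++;
	}

	return n;
}

/*
 * Hypothetical helper: dump the pending entries on the first call only,
 * so that repeated timeouts do not flood the log. Not thread-safe.
 */
static unsigned int dump_pending_once(const struct iommu_cmd *buf,
				      unsigned int head, unsigned int tail)
{
	static bool dumped;

	if (dumped)
		return 0;

	dumped = true;
	return dump_pending(buf, head, tail);
}
```

With head=6 and tail=1 the walk visits entries 6, 7 and 0. A second call to dump_pending_once() prints nothing. A kernel version would need its own answer for locking and for whether the "once" state is global or per IOMMU.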
-Vasant