Message-ID: <50FF20F2.9090503@amd.com>
Date:	Tue, 22 Jan 2013 17:29:54 -0600
From:	Suravee Suthikulanit <suravee.suthikulpanit@....com>
To:	Udo van den Heuvel <udovdh@...all.nl>
CC:	Boris Ostrovsky <boris.ostrovsky@....com>,
	Jacob Shin <jacob.shin@....com>,
	Borislav Petkov <bp@...en8.de>,
	Jörg Rödel <joro@...tes.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out

On 1/22/2013 10:29 AM, Udo van den Heuvel wrote:

> On 2013-01-22 17:12, Boris Ostrovsky wrote:
>> Your BIOS does not have the required erratum workaround. We will provide
>> a patch to close that hole but since the problem is not easily
>> reproducible (and the erratum is also not easy to trigger) it may be
>> difficult to say whether it really helped with your problem.

Udo,

I sent out a patch (http://marc.info/?l=linux-kernel&m=135889686523524&w=2) which should implement
the workaround for AMD processor family 15h models 10h-1Fh erratum 746 in the IOMMU driver.
In your case, the output from "setpci -s 00:00.02 F4.w" is "0050", which tells me that the BIOS doesn't
implement the workaround. After patching, you should see the following message in "dmesg":

"AMD-Vi: Applying erratum 746 for IOMMU at 0000:00:00.2"
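
For reference, the check on the setpci value can be sketched in C as below. This is only an
illustration: the bit position (bit 2 of the value read back at offset F4h, with indirect register
90h selected beforehand) is an assumption made here and is not stated in this mail, and the helper
name erratum_746_applied() is made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION: bit 2 of the value read back at config offset 0xF4
 * indicates that the erratum 746 workaround is already in place.
 * The bit position is not given in this thread.
 */
#define ERRATUM_746_BIT (1u << 2)

/* Hypothetical helper: true if the workaround bit is set in reg_value. */
static bool erratum_746_applied(uint32_t reg_value)
{
    return (reg_value & ERRATUM_746_BIT) != 0;
}
```

Under that assumption, the reported value 0x0050 has the bit clear, which matches the conclusion
that the BIOS did not apply the workaround.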

> Can we think of certain loads/actions/etc that could help trigger the issue?
> Then if reproducing is easier we can better say if stuff is actually
> fixed after the workaround.
>
> Udo

Looking at the original kernel message, it seems that the kernel timed out while waiting for the IOMMU
to finish executing the "COMPLETION_WAIT" command.  In this particular case, it was issued as part of
"__domain_flush_pages()" while trying to send the "INVALIDATE_IOMMU_PAGE" command to the IOMMU, but the command
buffer was getting full and the kernel needed to wait for the command buffer to free up.  However, the kernel
message does not tell us exactly what caused the IOMMU to lock up in the first place.
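
To make the "command buffer is getting full" case concrete, here is a small userspace model of a
head/tail command ring.  It is a sketch only, not the driver code: the ring size, the struct names
and the helpers ring_enqueue() and ring_consume() are all made up for illustration.  The producer
(the kernel) advances the tail, the consumer (the IOMMU) advances the head, and once the next tail
would catch up with the head the producer has to wait.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CMD_ENTRY_SIZE  16u   /* four 32-bit words, like data[0..3] */
#define CMD_BUF_ENTRIES 8u    /* hypothetical, kept tiny for the example */
#define CMD_BUF_SIZE    (CMD_ENTRY_SIZE * CMD_BUF_ENTRIES)

struct cmd {
    uint32_t data[4];
};

struct cmd_ring {
    uint8_t  buf[CMD_BUF_SIZE];
    uint32_t head;   /* byte offset of the next command the consumer reads */
    uint32_t tail;   /* byte offset of the next free slot for the producer */
};

/* Queue one command; returns false when the ring is full and the caller must wait. */
static bool ring_enqueue(struct cmd_ring *r, const struct cmd *c)
{
    uint32_t next_tail = (r->tail + CMD_ENTRY_SIZE) % CMD_BUF_SIZE;

    if (next_tail == r->head)
        return false;   /* one slot stays empty to tell full from empty */

    memcpy(&r->buf[r->tail], c, sizeof(*c));
    r->tail = next_tail;
    return true;
}

/* Model the consumer finishing up to n queued commands. */
static void ring_consume(struct cmd_ring *r, uint32_t n)
{
    for (uint32_t i = 0; i < n && r->head != r->tail; i++)
        r->head = (r->head + CMD_ENTRY_SIZE) % CMD_BUF_SIZE;
}
```

In this model, if the consumer never advances the head, every later ring_enqueue() keeps failing,
which is the situation where a wait on the producer side eventually times out.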

From what I have observed, a workload with high disk traffic should trigger a large number of "INVALIDATE_IOMMU_PAGE"
commands.  However, this doesn't automatically issue a "COMPLETION_WAIT" command.  The following patch slightly modifies
the code to always issue "COMPLETION_WAIT" after every command.  This should help increase the chance of reproducing
the issue.

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index c1c74e0..d05b1f9 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1016,6 +1016,7 @@ static int iommu_queue_command_sync(struct amd_iommu *iommu,
                                     struct iommu_cmd *cmd,
                                     bool sync)
  {
+#if 0
         u32 left, tail, head, next_tail;
         unsigned long flags;

@@ -1052,6 +1053,40 @@ again:

         spin_unlock_irqrestore(&iommu->lock, flags);

+#else
+       u32 tail;
+       unsigned long flags;
+
+       WARN_ON(iommu->cmd_buf_size & CMD_BUFFER_UNINITIALIZED);
+       printk (KERN_DEBUG "AMD-Vi: iommu_queue_command_sync:"
+               " data[0]:%#x data[1]:%#x data[2]:%#x data[3]:%#x\n",
+               cmd->data[0], cmd->data[1], cmd->data[2], cmd->data[3] );
+
+       spin_lock_irqsave(&iommu->lock, flags);
+
+       tail = readl(iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
+       copy_cmd_to_buffer(iommu, cmd, tail);
+
+       spin_unlock_irqrestore(&iommu->lock, flags);
+
+       // Sending completion_wait command
+       {
+               struct iommu_cmd sync_cmd;
+               volatile u64 sem = 0;
+               int ret;
+
+               spin_lock_irqsave(&iommu->lock, flags);
+
+               tail = readl(iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
+               build_completion_wait(&sync_cmd, (u64)&sem);
+               copy_cmd_to_buffer(iommu, &sync_cmd, tail);
+
+               spin_unlock_irqrestore(&iommu->lock, flags);
+
+               if ((ret = wait_on_sem(&sem)) != 0)
+                       return ret;
+       }
+#endif
         return 0;
  }
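
For the disk traffic side, something along the lines of the userspace sketch below could be used to
keep a disk busy while the debug patch is applied.  It is only an assumption that this kind of load
is sufficient; the helper name churn_file() and its parameters are made up for the example, and any
other sustained I/O load on a device behind the IOMMU should serve the same purpose.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `count` blocks of `block` bytes to `path`,
 * calling fsync() after each block so the data is pushed to the device.
 * The scratch file is removed afterwards.
 * Returns the total number of bytes written, or -1 on error.
 */
static long churn_file(const char *path, size_t block, unsigned count)
{
    long total = 0;
    char *buf = malloc(block);

    if (!buf)
        return -1;
    memset(buf, 0xA5, block);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    for (unsigned i = 0; i < count; i++) {
        ssize_t n = write(fd, buf, block);

        /* Treat a short write or a failed fsync as an error in this sketch. */
        if (n != (ssize_t)block || fsync(fd) != 0) {
            total = -1;
            break;
        }
        total += n;
    }

    close(fd);
    unlink(path);
    free(buf);
    return total;
}
```

Calling it in a loop with a large block size and count, on a filesystem backed by the affected
controller, should generate a steady stream of DMA mappings and unmappings.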
