Date:	Tue, 22 Jan 2013 17:29:54 -0600
From:	Suravee Suthikulanit <suravee.suthikulpanit@....com>
To:	Udo van den Heuvel <udovdh@...all.nl>
CC:	Boris Ostrovsky <boris.ostrovsky@....com>,
	Jacob Shin <jacob.shin@....com>,
	Borislav Petkov <bp@...en8.de>,
	Jörg Rödel <joro@...tes.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: 3.6.11  AMD-Vi: Completion-Wait loop timed out

On 1/22/2013 10:29 AM, Udo van den Heuvel wrote:

> On 2013-01-22 17:12, Boris Ostrovsky wrote:
>> Your BIOS does not have the required erratum workaround. We will provide
>> a patch to close that hole but since the problem is not easily
>> reproducible (and the erratum is also not easy to trigger) it may be
>> difficult to say whether it really helped with your problem.

Udo,

I sent out a patch (http://marc.info/?l=linux-kernel&m=135889686523524&w=2) which should implement
the workaround for AMD processor family 15h models 10h-1Fh erratum 746 in the IOMMU driver.
In your case, the output of "setpci -s 00:00.02 F4.w" is "0050", which tells me that the BIOS does not
implement the workaround. After patching, you should see the following message in "dmesg":

"AMD-Vi: Applying erratum 746 for IOMMU at 0000:00:00.2"
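
To read the setpci value by hand, here is a minimal sketch. It assumes that bit 2 of the value read at F4 is the bit the workaround sets; that detail is not spelled out in this mail, so please check it against the patch linked above. The names erratum_746_applied and ERRATUM_746_BIT are hypothetical and only used for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: bit 2 of the value read back at offset 0xF4 marks the
 * erratum 746 workaround as applied. Verify against the actual patch. */
#define ERRATUM_746_BIT (1u << 2)

/* Returns true if the assumed workaround bit is set in the register value. */
static bool erratum_746_applied(uint32_t reg_f4)
{
	return (reg_f4 & ERRATUM_746_BIT) != 0;
}
```

Under that assumption, erratum_746_applied(0x0050) is false, which matches the conclusion that the BIOS did not apply the workaround.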

> Can we think of certain loads/actions/etc that could help trigger the issue?
> Then if reproducing is easier we can better say if stuff is actually
> fixed after the workaround.
>
> Udo

Looking at the original kernel message, it seems that the kernel timed out while waiting for the IOMMU
to finish executing the "COMPLETION_WAIT" command. In this particular case, it was issued as part of
"__domain_flush_pages()" while trying to send the "INVALIDATE_IOMMU_PAGE" command to the IOMMU, but the command
buffer was getting full, so the kernel needed to wait for the command buffer to free up. However, the kernel
message does not tell us exactly what caused the IOMMU to lock up in the first place.
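
To illustrate the "command buffer is getting full" condition, here is a simplified user-space sketch of a command ring with a head and a tail offset. The struct, the function names, the 16-byte entry size and the "full" rule (one slot always left empty) are assumptions for illustration only; the driver's own bookkeeping may differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define CMD_SIZE 16u	/* assumed size of one command entry, in bytes */

/* Hypothetical, simplified view of a command ring: head is advanced by
 * the consumer (the IOMMU), tail by the producer (the driver). */
struct cmd_ring {
	uint32_t head;	/* byte offset of the next entry the consumer reads */
	uint32_t tail;	/* byte offset of the next entry the producer writes */
	uint32_t size;	/* ring size in bytes, a multiple of CMD_SIZE */
};

/* Bytes between the would-be new tail and head, modulo the ring size. */
static uint32_t ring_space_left(const struct cmd_ring *r)
{
	uint32_t next_tail = (r->tail + CMD_SIZE) % r->size;

	return (r->head + r->size - next_tail) % r->size;
}

/* True when advancing the tail would make it collide with head, so the
 * producer has to wait for the consumer to catch up first. */
static bool ring_must_wait(const struct cmd_ring *r)
{
	return ring_space_left(r) == 0;
}
```

In this model the producer only stalls when the consumer stops advancing head, which is why a timeout while waiting says that the IOMMU stopped making progress but not why.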

From what I have observed, a workload with high disk traffic should trigger a large number of "INVALIDATE_IOMMU_PAGE" commands.
However, this does not automatically issue a "COMPLETION_WAIT" command. The following patch slightly modifies
the code to always issue "COMPLETION_WAIT" after every command. This should increase the chance of reproducing
the issue.


diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index c1c74e0..d05b1f9 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1016,6 +1016,7 @@ static int iommu_queue_command_sync(struct amd_iommu *iommu,
                                     struct iommu_cmd *cmd,
                                     bool sync)
  {
+#if 0
         u32 left, tail, head, next_tail;
         unsigned long flags;
  
@@ -1052,6 +1053,40 @@ again:
  
         spin_unlock_irqrestore(&iommu->lock, flags);
  
+#else
+       u32 tail;
+       unsigned long flags;
+
+       WARN_ON(iommu->cmd_buf_size & CMD_BUFFER_UNINITIALIZED);
+       printk(KERN_DEBUG "AMD-Vi: iommu_queue_command_sync:"
+               " data[0]:%#x data[1]:%#x data[2]:%#x data[3]:%#x\n",
+               cmd->data[0], cmd->data[1], cmd->data[2], cmd->data[3] );
+
+       spin_lock_irqsave(&iommu->lock, flags);
+
+       tail = readl(iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
+       copy_cmd_to_buffer(iommu, cmd, tail);
+
+       spin_unlock_irqrestore(&iommu->lock, flags);
+
+       // Sending completion_wait command
+       {
+               struct iommu_cmd sync_cmd;
+               volatile u64 sem = 0;
+               int ret;
+
+               spin_lock_irqsave(&iommu->lock, flags);
+
+               tail = readl(iommu->mmio_base + MMIO_CMD_TAIL_OFFSET);
+               build_completion_wait(&sync_cmd, (u64)&sem);
+               copy_cmd_to_buffer(iommu, &sync_cmd, tail);
+
+               spin_unlock_irqrestore(&iommu->lock, flags);
+
+               if ((ret = wait_on_sem(&sem)) != 0)
+                       return ret;
+       }
+#endif
         return 0;
  }
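
For reference, the wait step at the end of the patch follows a bounded polling pattern. Below is a user-space sketch of that pattern only; poll_sem and LOOP_LIMIT are hypothetical names, the iteration bound is made up, and the driver's wait_on_sem() may be implemented differently (for example with a delay between polls).

```c
#include <errno.h>
#include <stdint.h>

#define LOOP_LIMIT 100000	/* assumed iteration bound, not the driver's value */

/* Poll a semaphore word until it becomes nonzero (in the real case the
 * device would write it), or give up after LOOP_LIMIT iterations. */
static int poll_sem(volatile uint64_t *sem)
{
	int i;

	for (i = 0; i < LOOP_LIMIT; i++) {
		if (*sem != 0)
			return 0;	/* completion observed */
	}

	return -EIO;			/* loop timed out */
}
```

If the semaphore is never written, the loop runs out and the caller gets an error back, which corresponds to the "Completion-Wait loop timed out" case in the subject line.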






