lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 18 May 2020 17:06:45 +0800
From:   Kai-Heng Feng <kai.heng.feng@...onical.com>
To:     jroedel@...e.de
Cc:     iommu@...ts.linux-foundation.org,
        open list <linux-kernel@...r.kernel.org>
Subject: [Regression] "iommu/amd: Relax locking in dma_ops path" makes tg3
 ethernet transmit queue timeout

Hi,

Broadcom ethernet tg3 unusable after commit 92d420ec028d ("iommu/amd: Relax locking in dma_ops path").
After a short period it stops:
[  122.717144] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x237/0x240()
[  122.717152] NETDEV WATCHDOG: enp3s0 (tg3): transmit queue 0 timed out

After testing the patch section by section, this is the part that caused the regression:

@@ -2578,19 +2580,8 @@ static dma_addr_t map_page(struct device *dev, struct page *page,
 
        dma_mask = *dev->dma_mask;
 
-       spin_lock_irqsave(&domain->lock, flags);
-
-       addr = __map_single(dev, domain->priv, paddr, size, dir, false,
+       return __map_single(dev, domain->priv, paddr, size, dir, false,
                            dma_mask);
-       if (addr == DMA_ERROR_CODE)
-               goto out;
-
-       domain_flush_complete(domain);
-
-out:
-       spin_unlock_irqrestore(&domain->lock, flags);
-
-       return addr;
 }

Particularly, as soon as the spinlock is removed, the issue can be reproduced.
Function domain_flush_complete() doesn't seem to affect the status.

However, the .map_page callback was removed by be62dbf554c5 ("iommu/amd: Convert AMD iommu driver to the dma-iommu api"), so there's no easy revert for this issue.

This is still reproducible as of today's mainline kernel, v5.7-rc6.

Kai-Heng

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ