Open Source and information security mailing list archives
 
Message-Id: <AD4B8103-C98F-4BC5-9F7F-496D220781F3@bootc.net>
Date:	Sat, 17 Sep 2011 12:57:42 +0100
From:	Chris Boot <bootc@...tc.net>
To:	"Woodhouse, David" <david.woodhouse@...el.com>
Cc:	lkml <linux-kernel@...r.kernel.org>
Subject: Re: iommu_iova leak

On 17 Sep 2011, at 11:45, Woodhouse, David wrote:
> On Fri, 2011-09-16 at 13:43 +0100, Chris Boot wrote:
>> In the very short term the number is up and down by a few hundred 
>> objects but the general trend is constantly upwards. After about 5 days' 
>> uptime I have some very serious IO slowdowns (narrowed down by a friend 
>> to SCSI command queueing) with a lot of time spent in
>> alloc_iova() and rb_prev() according to 'perf top'. Eventually these 
>> translate into softlockups and the machine becomes almost unusable.
> 
> If you're seeing it spend ages in rb_prev() that implies that the
> mappings are still *active* and in the rbtree, rather than just the
> iommu_iova data structure having been leaked.
> 
> I suppose it's vaguely possible that we're leaking them in such a way
> that they remain on the rbtree, perhaps if the deferred unmap is never
> actually happening... but I think it's a whole lot more likely that the
> PCI driver is just never bothering to unmap the pages it maps.
> 
> If you boot with 'intel_iommu=strict' that will avoid the deferred unmap
> which is the only likely culprit in the IOMMU code...
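A simplified way to see why a growing set of live mappings shows up as time in alloc_iova() and rb_prev(): if the allocator searches downward from the top of the address space for a free gap, it has to step over every live range that sits above that gap. The sketch below is a hypothetical userspace model, not the kernel's code. It keeps ranges in a sorted array instead of an rbtree (stepping to the next element stands in for rb_prev()), it ignores any caching the real allocator does, and the names alloc_range() and free_range() are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a top-down IOVA allocator. Live ranges are kept
 * in an array sorted by descending pfn_lo instead of an rbtree; moving to
 * the next element stands in for rb_prev(). No caching is modelled. */
struct range { unsigned long pfn_lo, pfn_hi; };

#define MAX_RANGES 4096
static struct range ranges[MAX_RANGES];
static size_t nr_ranges;

/* Allocate `size` (>= 1) pfns as high as possible, at or below limit_pfn.
 * pfn 0 is never handed out, so 0 means failure. *steps receives the
 * number of live ranges visited during the search. */
static unsigned long alloc_range(unsigned long size, unsigned long limit_pfn,
                                 size_t *steps)
{
    unsigned long top = limit_pfn;  /* highest pfn of the current gap */
    size_t i;

    *steps = 0;
    for (i = 0; i < nr_ranges; i++) {
        (*steps)++;
        /* the free gap is pfn_hi + 1 .. top */
        if (top > ranges[i].pfn_hi && top - ranges[i].pfn_hi >= size)
            break;
        /* keep searching below this range (never raise top) */
        if (ranges[i].pfn_lo - 1 < top)
            top = ranges[i].pfn_lo - 1;
    }
    if (i == nr_ranges && top < size)  /* lowest gap is 1 .. top */
        return 0;

    assert(nr_ranges < MAX_RANGES);
    /* insert at index i so the array stays sorted */
    memmove(&ranges[i + 1], &ranges[i], (nr_ranges - i) * sizeof(ranges[0]));
    ranges[i].pfn_hi = top;
    ranges[i].pfn_lo = top - size + 1;
    nr_ranges++;
    return ranges[i].pfn_lo;
}

/* Free the range starting at pfn_lo (no-op if it is not found). */
static void free_range(unsigned long pfn_lo)
{
    for (size_t i = 0; i < nr_ranges; i++) {
        if (ranges[i].pfn_lo == pfn_lo) {
            memmove(&ranges[i], &ranges[i + 1],
                    (nr_ranges - i - 1) * sizeof(ranges[0]));
            nr_ranges--;
            return;
        }
    }
}
```

In this model, if nothing is ever freed, the n-th allocation visits all n live ranges before it finds room, so the search cost grows linearly with the leak; once the ranges are freed, the search is short again.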


Booting with intel_iommu=on,strict still shows the iommu_iova count increasing steadily, so I don't think the deferred unmap is the culprit.
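The reasoning behind the strict-mode test can be expressed as a small counter model. In deferred mode an unmap only queues the IOVA for freeing, so the number of live IOVAs may exceed the driver's outstanding mappings by up to one batch; in strict mode each unmap frees its IOVA immediately, so the two numbers should match. If the count still climbs under intel_iommu=strict, the driver's outstanding mappings must be climbing too. This is a hypothetical sketch: struct iova_model, the model_* functions and the BATCH threshold are assumptions, and the flush here is triggered only by queue length.

```c
#include <assert.h>

#define BATCH 250  /* assumed, illustrative flush threshold */

/* Hypothetical counters for one device's IOVA usage. */
struct iova_model {
    int strict;          /* 1 models intel_iommu=strict */
    long live_iovas;     /* IOVAs still allocated (still in the rbtree) */
    long driver_mapped;  /* mappings the driver has not yet unmapped */
    long pending;        /* deferred unmaps queued but not yet freed */
};

/* Free every queued IOVA in one go. */
static void model_flush(struct iova_model *m)
{
    m->live_iovas -= m->pending;
    m->pending = 0;
}

static void model_map(struct iova_model *m)
{
    m->live_iovas++;
    m->driver_mapped++;
}

/* Invariant after each call: live_iovas == driver_mapped + pending,
 * with pending == 0 in strict mode and pending < BATCH otherwise. */
static void model_unmap(struct iova_model *m)
{
    assert(m->driver_mapped > 0);
    m->driver_mapped--;
    if (m->strict) {
        m->live_iovas--;          /* freed immediately */
    } else {
        m->pending++;             /* freed later, in a batch */
        if (m->pending >= BATCH)
            model_flush(m);
    }
}
```

Under this model a strict-mode count that keeps rising can only come from driver_mapped rising, which points at mappings that are never unmapped rather than at the deferred queue.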

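I've bodged together the following patch, which keeps a per-device count of outstanding DMA mappings and warns at every multiple of 1000 once it passes 2000, to see if it catches anything obvious. We'll see if anything useful comes of it. Sorry, my mail client mangles whitespace.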

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c621c98..aebbd56 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2724,6 +2724,7 @@ static dma_addr_t __intel_map_single(struct device *hwdev, phys_addr_t paddr,
        int ret;
        struct intel_iommu *iommu;
        unsigned long paddr_pfn = paddr >> PAGE_SHIFT;
+       int dma_map_count;
 
        BUG_ON(dir == DMA_NONE);
 
@@ -2761,6 +2762,9 @@ static dma_addr_t __intel_map_single(struct device *hwdev, phys_addr_t paddr,
        if (ret)
                goto error;
 
+       dma_map_count = atomic_inc_return(&pdev->dma_map_count);
+       WARN_ON((dma_map_count > 2000) && !(dma_map_count % 1000));
+
        /* it's a non-present to present mapping. Only flush if caching mode */
        if (cap_caching_mode(iommu->cap))
                iommu_flush_iotlb_psi(iommu, domain->id, mm_to_dma_pfn(iova->pfn_lo), size, 1);
@@ -2892,6 +2896,7 @@ static void intel_unmap_page(struct device *dev, dma_addr_t dev_addr,
 
        pr_debug("Device %s unmapping: pfn %lx-%lx\n",
                 pci_name(pdev), start_pfn, last_pfn);
+       atomic_dec(&pdev->dma_map_count);
 
        /*  clear the whole page */
        dma_pte_clear_range(domain, start_pfn, last_pfn);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index f3f94a5..cb1e86b 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1108,6 +1108,7 @@ struct pci_dev *alloc_pci_dev(void)
                return NULL;
 
        INIT_LIST_HEAD(&dev->bus_list);
+       atomic_set(&dev->dma_map_count, 0);
 
        return dev;
 }
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 8c230cb..d431f39 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -331,6 +331,7 @@ struct pci_dev {
        int rom_attr_enabled;           /* has display of the rom attribute been enabled? */
        struct bin_attribute *res_attr[DEVICE_COUNT_RESOURCE]; /* sysfs file for resources */
        struct bin_attribute *res_attr_wc[DEVICE_COUNT_RESOURCE]; /* sysfs file for WC mapping of resources */
+       atomic_t        dma_map_count;
 #ifdef CONFIG_PCI_MSI
        struct list_head msi_list;
 #endif
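The bookkeeping the patch adds can be exercised outside the kernel with C11 atomics. This is a minimal sketch, assuming a hypothetical struct fake_pci_dev in place of struct pci_dev and hypothetical on_map()/on_unmap() hooks in place of the two intel-iommu functions. atomic_fetch_add() returns the old value, so adding 1 reproduces atomic_inc_return(); the warning predicate is copied from the WARN_ON() in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct pci_dev carrying the debug counter. */
struct fake_pci_dev {
    atomic_int dma_map_count;
};

static void fake_dev_init(struct fake_pci_dev *pdev)
{
    atomic_init(&pdev->dma_map_count, 0);   /* like atomic_set(..., 0) */
}

/* Same test as the WARN_ON(): past 2000, then only on multiples of 1000. */
static bool should_warn(int dma_map_count)
{
    return dma_map_count > 2000 && dma_map_count % 1000 == 0;
}

/* Called after a successful map; returns true where the patch would WARN. */
static bool on_map(struct fake_pci_dev *pdev)
{
    /* old value + 1 == new value, as atomic_inc_return() gives */
    int count = atomic_fetch_add(&pdev->dma_map_count, 1) + 1;
    return should_warn(count);
}

static void on_unmap(struct fake_pci_dev *pdev)
{
    int old = atomic_fetch_sub(&pdev->dma_map_count, 1);  /* like atomic_dec() */
    assert(old > 0);  /* extra sanity check, not in the patch */
}
```

The condition fires at 3000, 4000, 5000 and so on, not at 2000 itself. Because the patch hooks only __intel_map_single() and intel_unmap_page(), mappings created or released through any other path would not be reflected in the count.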

-- 
Chris Boot
bootc@...tc.net
