lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Sun,  3 Sep 2023 22:40:50 +0800
From:   Jingqi Liu <Jingqi.liu@...el.com>
To:     iommu@...ts.linux.dev, Lu Baolu <baolu.lu@...ux.intel.com>,
        Tian Kevin <kevin.tian@...el.com>,
        Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
        Robin Murphy <robin.murphy@....com>
Cc:     linux-kernel@...r.kernel.org, Jingqi Liu <Jingqi.liu@...el.com>
Subject: [PATCH] iommu/vt-d: debugfs: Fix race with iommu unmap when traversing

When traversing page table, it may race with iommu unmap.

For the page table page pointed to by a PDPE/PDE, there are three
scenarios in the iommu unmap path.

  1) The page has been freed.

     If the page has a refcount of zero, it has been freed. The
     debugfs should avoid to traverse it.

     In the debugfs, the refcount of a page table page is checked
     before traversing it. If its refcount is zero, the page will not
     be traversed. If the refcount is not zero, increment its refcount
     before traversal and decrement its refcount after traversal.

  2) The page is not freed and the PDPE/PDE is not cleared.

     When the page is unlinked from its parent, it's not freed
     immediately. It will be freed as soon as the IOTLB is flushed.

     Before the page is freed, the contents in the page and all the
     pages under its level are stale. The debugfs should avoid to
     traverse these pages.

     According the VT-d specification, bit 3 of the PDPE/PTE pointing
     to a page directory/table is ignored by the hardware. The driver
     sets it to 0 by default. In the unmap path, set bit 3 to 1 to
     indicate that the page pointed to by the PDPE/PDE is stale.

     Check bit 3 of the PDPE/PDE before traversing the page pointed to
     by the PDPE/PDE in the debugfs. Traversal stops if bit 3 is 1.

  3) The page is not freed and the PDPE/PDE has been cleared.

     Check the preseent bit of the PDPE/PDE in the debugfs. The
     traversal stops if the present bit is cleared.

Suggested-by: Kevin Tian <kevin.tian@...el.com>
Signed-off-by: Jingqi Liu <Jingqi.liu@...el.com>
---
 drivers/iommu/intel/debugfs.c | 47 +++++++++++++++++++++++++++++++++--
 drivers/iommu/intel/iommu.c   | 16 ++++++++++++
 2 files changed, 61 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/debugfs.c b/drivers/iommu/intel/debugfs.c
index 1f925285104e..e12f1ed88e65 100644
--- a/drivers/iommu/intel/debugfs.c
+++ b/drivers/iommu/intel/debugfs.c
@@ -333,9 +333,52 @@ static void pgtable_walk_level(struct seq_file *m, struct dma_pte *pde,
 		path[level] = pde->val;
 		if (dma_pte_superpage(pde) || level == 1)
 			dump_page_info(m, start, path);
-		else
-			pgtable_walk_level(m, phys_to_virt(dma_pte_addr(pde)),
+		else {
+			/*
+			 * The entry references a Page-Directory Table
+			 * or a Page Table.
+			 */
+			struct page *pg;
+			u64 pte_addr;
+
+retry:
+			pte_addr = dma_pte_addr(pde);
+			pg = pfn_to_page(PFN_DOWN(pte_addr));
+			if (!get_page_unless_zero(pg))
+				/*
+				 * If this page has a refcount of zero,
+				 * it has been freed, or will be freed.
+				 */
+				continue;
+
+			/*
+			 * Check if the page that pointed to by the PDE is
+			 * stale. Bit 3 of the PDE is ignored by hardware.
+			 * If the page is stale, bit 3 is already set to 1
+			 * in the unmap path.
+			 */
+			if (pde->val & BIT_ULL(3)) {
+				put_page(pg);
+				continue;
+			}
+
+			/* Check if the value of this entry is changed. */
+			if (pde->val != path[level]) {
+				put_page(pg);
+
+				if (!dma_pte_present(pde))
+					/* This entry is invalid. Skip it. */
+					continue;
+
+				/* The entry has been updated. */
+				path[level] = pde->val;
+				goto retry;
+			}
+
+			pgtable_walk_level(m, phys_to_virt(pte_addr),
 					   level - 1, start, path);
+			put_page(pg);
+		}
 		path[level] = 0;
 	}
 }
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index b871a6afd803..6775b6bdfa3d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1103,6 +1103,22 @@ static void dma_pte_list_pagetables(struct dmar_domain *domain,
 	pg = pfn_to_page(dma_pte_addr(pte) >> PAGE_SHIFT);
 	list_add_tail(&pg->lru, freelist);
 
+	/*
+	 * When the page pointed to by a PDPE/PDE is unlinked from its parent,
+	 * it's not freed immediately. It will be freed as soon as the IOTLB
+	 * is flushed. Before the page is freed, another process context (e.g.
+	 * debugfs) may still traverse it. But the content of the page is no
+	 * longer valid.
+	 * According the VT-d specification, bit 3 of the PDPE/PTE pointing to
+	 * a page directory/table is ignored by the hardware. The driver sets
+	 * it to 0 by default. To avoid the unnecessary page walks, set bit 3
+	 * to 1 to indicate that the page pointed to by the PDPE/PDE is stale.
+	 * Then in another process context, bit 3 of the PDPE/PDE is checked
+	 * before traversing the page pointed to by the PDPE/PDE. Traversal
+	 * stops if bit 3 is 1.
+	 */
+	pte->val |= BIT_ULL(3);
+
 	if (level == 1)
 		return;
 
-- 
2.21.3

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ