Date:   Mon, 3 Jul 2023 07:00:25 +0000
From:   "Tian, Kevin" <kevin.tian@...el.com>
To:     "Liu, Jingqi" <jingqi.liu@...el.com>,
        "iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
        Lu Baolu <baolu.lu@...ux.intel.com>,
        David Woodhouse <dwmw2@...radead.org>,
        Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
        Robin Murphy <robin.murphy@....com>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] iommu/vt-d: debugfs: Increment the reference count of
 page table page

> From: Liu, Jingqi <jingqi.liu@...el.com>
> Sent: Sunday, June 25, 2023 10:28 AM
> 
> There may be a race with the iommu_unmap() interface when traversing a
> page table.
> 
> When debugfs traverses an IOMMU page table, iommu_unmap() may clear
> entries and free the page table pages pointed to by the entries.
> So debugfs may read invalid or freed pages.
> 
> To avoid this, increment the refcount of a page table page before
> traversing the page, and decrement its refcount after traversing it.

I'm not sure how this race can be fully avoided w/o cooperation in the
unmap path...

> 
> Signed-off-by: Jingqi Liu <Jingqi.liu@...el.com>
> ---
>  drivers/iommu/intel/debugfs.c | 36 ++++++++++++++++++++++++++++++++++--
>  1 file changed, 34 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/intel/debugfs.c b/drivers/iommu/intel/debugfs.c
> index 1f925285104e..d228e1580aec 100644
> --- a/drivers/iommu/intel/debugfs.c
> +++ b/drivers/iommu/intel/debugfs.c
> @@ -333,9 +333,41 @@ static void pgtable_walk_level(struct seq_file *m, struct dma_pte *pde,
>  		path[level] = pde->val;
>  		if (dma_pte_superpage(pde) || level == 1)
>  			dump_page_info(m, start, path);
> -		else
> -			pgtable_walk_level(m, phys_to_virt(dma_pte_addr(pde)),
> +		else {
> +			struct page *pg;
> +			u64 pte_addr;
> +
> +			/*
> +			 * The entry references a Page-Directory Table
> +			 * or a Page Table.
> +			 */
> +retry:
> +			pte_addr = dma_pte_addr(pde);
> +			pg = pfn_to_page(pte_addr >> PAGE_SHIFT);
> +			if (!get_page_unless_zero(pg))
> +				/*
> +				 * If this page has a refcount of zero,
> +				 * it has been freed, or will be freed.
> +				 */
> +				continue;
> +
> +			/* Check if the value of the entry is changed. */
> +			if (pde->val != path[level]) {
> +				put_page(pg);
> +
> +				if (!dma_pte_present(pde))
> +					/* The entry is invalid. Skip it. */
> +					continue;
> +
> +				/* The entry has been updated. */
> +				path[level] = pde->val;
> +				goto retry;
> +			}
> +
> +			pgtable_walk_level(m, phys_to_virt(pte_addr),
>  					   level - 1, start, path);

What about pde->val getting cleared after phys_to_virt(pte_addr), leading
to all the levels below 'pg' being freed? In that case this code still walks
the stale 'pg' content, whose entries then all point to invalid pages.

It's probably simpler to just check the format of each PTE (e.g. whether
any reserved bit is set) and, if it looks abnormal, break out of the
current level of the walk.

> +			put_page(pg);
> +		}
>  		path[level] = 0;
>  	}
>  }
> --
> 2.21.3
