Message-Id: <55A4B205-BC38-4F16-9EB9-54026C326E60@gmail.com>
Date:   Fri, 26 Mar 2021 21:36:46 -0700
From:   Nadav Amit <nadav.amit@...il.com>
To:     Lu Baolu <baolu.lu@...ux.intel.com>
Cc:     "Longpeng (Mike, Cloud Infrastructure Service Product Dept.)" 
        <longpeng2@...wei.com>, chenjiashang <chenjiashang@...wei.com>,
        "Tian, Kevin" <kevin.tian@...el.com>,
        "will@...nel.org" <will@...nel.org>,
        "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        "alex.williamson@...hat.com" <alex.williamson@...hat.com>,
        "Gonglei (Arei)" <arei.gonglei@...wei.com>,
        David Woodhouse <dwmw2@...radead.org>
Subject: Re: A problem of Intel IOMMU hardware ?



> On Mar 26, 2021, at 7:31 PM, Lu Baolu <baolu.lu@...ux.intel.com> wrote:
> 
> Hi Nadav,
> 
> On 3/19/21 12:46 AM, Nadav Amit wrote:
>> So here is my guess:
>> Intel probably based its IOTLB on an implementation of some
>> other (regular) TLB design.
>> Intel SDM says regarding TLBs (4.10.4.2 “Recommended Invalidation”):
>> “Software wishing to prevent this uncertainty should not write to
>> a paging-structure entry in a way that would change, for any linear
>> address, both the page size and either the page frame, access rights,
>> or other attributes.”
>> Now the aforementioned uncertainty is a bit different (multiple
>> *valid* translations of a single address). Still, perhaps this is
>> yet another thing that might happen.
>> From a brief look at the handling of MMU (not IOMMU) hugepages
>> in Linux, indeed the PMD is first cleared and flushed before a
>> new valid PMD is set. This is possible for MMUs since they
>> allow the software to handle spurious page-faults gracefully.
>> This is not the case for the IOMMU though (without PRI).
>> Not sure this explains everything though. If that is the problem,
>> then during a mapping that changes page-sizes, a TLB flush is
>> needed, similar to the one Longpeng did manually.
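
For reference, the CPU-side ordering described above looks roughly
like this (a simplified sketch, not a literal excerpt from mm/):

        old = pmdp_huge_clear_flush(vma, addr, pmdp); /* clear PMD + flush TLB */
        /* a racing CPU access now takes a spurious fault and is retried */
        set_pmd_at(mm, addr, pmdp, new_pmd);          /* publish the new PMD */
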
> 
> I have been working with Longpeng on this issue for the past few
> days. It turned out that your guess is right. The PMD is first cleared
> but not flushed before a new valid one is set. The previous entry might
> still be cached in the paging-structure caches, which leads to disaster.
> 
> In __domain_mapping():
> 
>                 /*
>                  * Ensure that old small page tables are
>                  * removed to make room for superpage(s).
>                  * We're adding new large pages, so make sure
>                  * we don't remove their parent tables.
>                  */
>                 dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
>                                        largepage_lvl + 1);
> 
> I guess adding a cache flush operation after PMD switching should solve
> the problem.
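
One possible shape for that flush, right after the old tables are
freed (a sketch only; I am assuming the driver's existing
iommu_flush_iotlb_psi() helper also evicts the paging-structure
caches, and the exact call shape here is illustrative):

        dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
                               largepage_lvl + 1);
        /* evict stale entries before the superpage PTE is installed */
        for_each_domain_iommu(i, domain)
                iommu_flush_iotlb_psi(g_iommus[i], domain, iov_pfn,
                                      end_pfn - iov_pfn + 1, 0, 0);
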
> 
> I am still not clear about this comment:
> 
> "
> This is possible for MMUs since they allow the software to handle
> spurious page-faults gracefully. This is not the case for the IOMMU
> though (without PRI).
> "
> 
> Can you please shed more light on this?

I was looking at the code in more detail, and apparently my concern
is incorrect.

I was under the assumption that the IOMMU map/unmap can merge/split
(specifically split) huge-pages. For instance, if you map 2MB and
then unmap 4KB out of the 2MB, then you would split the hugepage
and keep the rest of the mappings alive. This is the way the MMU is
usually managed. In my defense, I also saw such partial unmappings
in Longpeng’s first scenario.

If this was possible, then you would have a case in which out of 2MB
(for instance), 4KB were unmapped, and you need to split the 2MB
hugepage into 4KB pages. If you try to clear the PMD, flush, and then
set the PMD to point to a table with live 4KB PTEs, you can have
an interim state in which the PMD is not present. DMAs that arrive
at this stage might fault, and without PRI (and device support)
you do not have a way of restarting the DMA after the hugepage split
is completed.
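
To sketch the window I had in mind (the helper names here are made
up, just to show the interim state):

        dma_pte_clear(pmd);                 /* the 2MB mapping disappears  */
        flush_iotlb(domain, iova, SZ_2M);   /* stale 2MB entry is evicted  */
        /*
         * Window: DMA to the still-live 4KB portions faults here, and
         * without PRI the device has no way to restart the access.
         */
        dma_pte_set(pmd, pte_table);        /* remaining 4KB PTEs reappear */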

Anyhow, this concern is apparently not relevant. I guess I was
naive in assuming that IOMMU management is similar to the MMU’s. I now
see that there is a comment in intel_iommu_unmap() saying:

        /* Cope with horrid API which requires us to unmap more than the
           size argument if it happens to be a large-page mapping. */
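
So a caller that unmaps 4KB out of a 2MB superpage may simply lose
the whole 2MB mapping; roughly (sizes illustrative):

        /* iova was mapped with a single 2MB superpage */
        size_t unmapped = iommu_unmap(domain, iova, SZ_4K);
        /* "unmapped" may come back as SZ_2M: the whole large page is
           torn down, so a present->split interim state never occurs */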

Regards,
Nadav
