Open Source and information security mailing list archives
Message-ID: <692186fd-42b8-4054-ead2-f6c6b1bf5b2d@linux.intel.com>
Date:   Wed, 17 Mar 2021 13:16:58 +0800
From:   Lu Baolu <baolu.lu@...ux.intel.com>
To:     "Longpeng (Mike, Cloud Infrastructure Service Product Dept.)" 
        <longpeng2@...wei.com>, dwmw2@...radead.org, joro@...tes.org,
        will@...nel.org, alex.williamson@...hat.com
Cc:     baolu.lu@...ux.intel.com, iommu@...ts.linux-foundation.org,
        LKML <linux-kernel@...r.kernel.org>,
        "Gonglei (Arei)" <arei.gonglei@...wei.com>, chenjiashang@...wei.com
Subject: Re: A problem of Intel IOMMU hardware ?

Hi Longpeng,

On 3/17/21 11:16 AM, Longpeng (Mike, Cloud Infrastructure Service 
Product Dept.) wrote:
> Hi guys,
> 
> We have found that the Intel IOMMU cache (i.e. the IOTLB) may misbehave in a
> specific situation, causing DMA failures or reads of wrong data.
> 
> The reproducer (based on Alex's vfio test suite[1]) is attached; it can
> reproduce the problem with high probability (~50%).
> 
> The machine we used is:
> processor	: 47
> vendor_id	: GenuineIntel
> cpu family	: 6
> model		: 85
> model name	: Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz
> stepping	: 4
> microcode	: 0x2000069
> 
> And the iommu capability reported is:
> ver 1:0 cap 8d2078c106f0466 ecap f020df
> (caching mode = 0 , page-selective invalidation = 1)
> 
> (The problem is also on 'Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz' and
> 'Intel(R) Xeon(R) Platinum 8378A CPU @ 3.00GHz')
> 
> We run the reproducer on Linux 4.18 and it works as follows:
> 
> Step 1. alloc 4G *2M-hugetlb* memory (N.B. no problem with 4K-page mapping)

I don't understand what 2M-hugetlb means here exactly. The IOMMU hardware
supports both 2M and 1G super pages. The mapped physical memory is 4G.
Why couldn't it use 1G super pages?

> Step 2. DMA Map 4G memory
> Step 3.
>      while (1) {
>          {UNMAP, 0x0, 0xa0000}, ------------------------------------ (a)
>          {UNMAP, 0xc0000, 0xbff40000},

Have these two ranges been mapped before? Does the IOMMU driver
complain when you try to unmap a range which has never been
mapped? The IOMMU driver implicitly assumes that mapping and
unmapping are paired.

>          {MAP,   0x0, 0xc0000000}, --------------------------------- (b)
>                  use GDB to pause here, and then DMA read IOVA=0,

IOVA 0 seems to be a special one. Have you verified with addresses
other than IOVA 0?

>                  sometimes the DMA succeeds (as expected),
>                  but sometimes it fails (reports not-present).
>          {UNMAP, 0x0, 0xc0000000}, --------------------------------- (c)
>          {MAP,   0x0, 0xa0000},
>          {MAP,   0xc0000, 0xbff40000},
>      }
> 
> The DMA read operations should succeed between (b) and (c); at the very
> least, they should NOT report not-present!
> 
> After analyzing the problem, we think it may be caused by the Intel IOMMU IOTLB.
> It seems the DMA remapping hardware still uses the IOTLB or other caches from (a).
> 
> When we do the DMA unmap at (a), the IOTLB will be flushed:
>      intel_iommu_unmap
>          domain_unmap
>              iommu_flush_iotlb_psi
> 
> When we do the DMA map at (b), there is no need to flush the IOTLB according
> to the capability of this IOMMU:
>      intel_iommu_map
>          domain_pfn_mapping
>              domain_mapping
>                  __mapping_notify_one
>                      if (cap_caching_mode(iommu->cap)) // FALSE
>                          iommu_flush_iotlb_psi

That's true. The IOTLB flushing is not needed when a PTE changes
from non-present to present, unless caching mode is enabled.

> But the problem disappears if we FORCE a flush here. So we suspect the IOMMU
> hardware.
> 
> Do you have any suggestion ?

Best regards,
baolu
