Date:   Wed, 17 Mar 2021 11:16:58 +0800
From:   "Longpeng (Mike, Cloud Infrastructure Service Product Dept.)" 
        <longpeng2@...wei.com>
To:     <dwmw2@...radead.org>, <baolu.lu@...ux.intel.com>,
        <joro@...tes.org>, <will@...nel.org>, <alex.williamson@...hat.com>
CC:     <iommu@...ts.linux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        "Gonglei (Arei)" <arei.gonglei@...wei.com>,
        <chenjiashang@...wei.com>, <longpeng2@...wei.com>
Subject: A problem with Intel IOMMU hardware?

Hi guys,

We have found that the Intel IOMMU cache (i.e. the IOTLB) can misbehave in a
specific situation, causing DMA to fail or return wrong data.

The reproducer (based on Alex's vfio testsuite[1]) is attached; it triggers
the problem with high probability (~50%).

The machine we used is:
processor	: 47
vendor_id	: GenuineIntel
cpu family	: 6
model		: 85
model name	: Intel(R) Xeon(R) Gold 6146 CPU @ 3.20GHz
stepping	: 4
microcode	: 0x2000069

And the IOMMU capability reported is:
ver 1:0 cap 8d2078c106f0466 ecap f020df
(caching mode = 0, page-selective invalidation = 1)
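
For reference, the two bits above are decoded from the cap value with the
macros in the kernel's include/linux/intel-iommu.h (a sketch from our reading
of that header, in case we mis-read it):

    /* cap = 0x8d2078c106f0466 */
    cap_caching_mode(cap) = (cap >> 7)  & 1 = 0   /* CM: map need not flush */
    cap_pgsel_inv(cap)    = (cap >> 39) & 1 = 1   /* PSI supported */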

(The problem also occurs on 'Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz' and
'Intel(R) Xeon(R) Platinum 8378A CPU @ 3.00GHz'.)

We run the reproducer on Linux 4.18 and it proceeds as follows:

Step 1. Allocate 4G of *2M-hugetlb* memory (N.B. there is no problem with 4K-page mappings)
Step 2. DMA-map the 4G memory
Step 3.
    while (1) {
        {UNMAP, 0x0, 0xa0000}, ------------------------------------ (a)
        {UNMAP, 0xc0000, 0xbff40000},
        {MAP,   0x0, 0xc0000000}, --------------------------------- (b)
                use GDB to pause here, then trigger a DMA read at IOVA=0;
                sometimes the DMA succeeds (as expected),
                but sometimes it fails (the device reports not-present).
        {UNMAP, 0x0, 0xc0000000}, --------------------------------- (c)
        {MAP,   0x0, 0xa0000},
        {MAP,   0xc0000, 0xbff40000},
    }
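
In raw VFIO ioctl terms, one iteration of the loop above is roughly the
following (a minimal sketch, not our exact reproducer; it assumes 'container'
is a fully configured VFIO container fd and 'base' is the hugetlb buffer from
Step 1, with all error handling omitted):

    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    static void dma_map(int container, void *vaddr, __u64 iova, __u64 size)
    {
        struct vfio_iommu_type1_dma_map map = {
            .argsz = sizeof(map),
            .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
            .vaddr = (__u64)(unsigned long)vaddr,
            .iova  = iova,
            .size  = size,
        };
        ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
    }

    static void dma_unmap(int container, __u64 iova, __u64 size)
    {
        struct vfio_iommu_type1_dma_unmap unmap = {
            .argsz = sizeof(unmap),
            .iova  = iova,
            .size  = size,
        };
        ioctl(container, VFIO_IOMMU_UNMAP_DMA, &unmap);
    }

    /* one iteration of the Step 3 loop */
    static void one_iteration(int container, void *base)
    {
        dma_unmap(container, 0x0, 0xa0000);                      /* (a) */
        dma_unmap(container, 0xc0000, 0xbff40000);
        dma_map(container, base, 0x0, 0xc0000000);               /* (b) */
        /* pause here and trigger a device DMA read at IOVA 0 */
        dma_unmap(container, 0x0, 0xc0000000);                   /* (c) */
        dma_map(container, base, 0x0, 0xa0000);
        dma_map(container, (char *)base + 0xc0000, 0xc0000, 0xbff40000);
    }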

The DMA reads between (b) and (c) should succeed; at the very least, they
should NOT report not-present!

After analyzing the problem, we think it may be caused by the Intel IOMMU
IOTLB: it seems the DMA remapping hardware still uses the IOTLB entries (or
other caches) that should have been invalidated at (a).

On DMA unmap at (a), the IOTLB is flushed:
    intel_iommu_unmap
        domain_unmap
            iommu_flush_iotlb_psi

On DMA map at (b), there is no need to flush the IOTLB according to this
IOMMU's capabilities:
    intel_iommu_map
        domain_pfn_mapping
            domain_mapping
                __mapping_notify_one
                    if (cap_caching_mode(iommu->cap)) // FALSE
                        iommu_flush_iotlb_psi
But the problem disappears if we FORCE a flush here (see the sketch below),
so we suspect the IOMMU hardware.
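
For reference, the forced flush we tested looks roughly like this (a sketch
against our 4.18-based tree; the exact argument list of iommu_flush_iotlb_psi()
is from our tree and may differ elsewhere):

    --- a/drivers/iommu/intel-iommu.c
    +++ b/drivers/iommu/intel-iommu.c
    @@ static void __mapping_notify_one(...)
    -	if (cap_caching_mode(iommu->cap))
    -		iommu_flush_iotlb_psi(iommu, domain, pfn, pages, 0, 1);
    +	/* experiment: always flush, even though CM = 0 */
    +	iommu_flush_iotlb_psi(iommu, domain, pfn, pages, 0, 1);

With this hack applied, the reproducer no longer hits the not-present error.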

Do you have any suggestions?

View attachment "vfiotest.c" of type "text/plain" (19319 bytes)
