linux-kernel - Re: [PATCH] iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <8ab16dfb-8fd1-2035-c191-1b22f72eb30e@linux.intel.com>
Date:   Tue, 7 Feb 2023 14:42:06 +0800
From:   Baolu Lu <baolu.lu@...ux.intel.com>
To:     Robin Murphy <robin.murphy@....com>,
        Jacob Pan <jacob.jun.pan@...ux.intel.com>,
        LKML <linux-kernel@...r.kernel.org>, iommu@...ts.linux.dev,
        Joerg Roedel <joro@...tes.org>
Cc:     baolu.lu@...ux.intel.com, David Woodhouse <dwmw2@...radead.org>,
        Raj Ashok <ashok.raj@...el.com>,
        "Tian, Kevin" <kevin.tian@...el.com>, Yi Liu <yi.l.liu@...el.com>,
        stable@...r.kernel.org, Sanjay Kumar <sanjay.k.kumar@...el.com>
Subject: Re: [PATCH] iommu/vt-d: Avoid superfluous IOTLB tracking in lazy mode

On 2023/2/6 19:20, Robin Murphy wrote:
> On 2023-02-04 06:32, Baolu Lu wrote:
>> On 2023/2/4 7:04, Jacob Pan wrote:
>>> Intel IOMMU driver implements IOTLB flush queue with domain selective
>>> or PASID selective invalidations. In this case there's no need to track
>>> IOVA page range and sync IOTLBs, which may cause significant performance
>>> hit.
>>
>> [Add cc Robin]
>>
>> If I understand this patch correctly, this might be caused by below
>> helper:
>>
>> /**
>>   * iommu_iotlb_gather_add_page - Gather for page-based TLB invalidation
>>   * @domain: IOMMU domain to be invalidated
>>   * @gather: TLB gather data
>>   * @iova: start of page to invalidate
>>   * @size: size of page to invalidate
>>   *
>>   * Helper for IOMMU drivers to build invalidation commands based on 
>> individual
>>   * pages, or with page size/table level hints which cannot be 
>> gathered if they
>>   * differ.
>>   */
>> static inline void iommu_iotlb_gather_add_page(struct iommu_domain 
>> *domain,
>>                                                 struct 
>> iommu_iotlb_gather *gather,
>>                                                 unsigned long iova, 
>> size_t size)
>> {
>>          /*
>>           * If the new page is disjoint from the current range or is 
>> mapped at
>>           * a different granularity, then sync the TLB so that the gather
>>           * structure can be rewritten.
>>           */
>>          if ((gather->pgsize && gather->pgsize != size) ||
>>              iommu_iotlb_gather_is_disjoint(gather, iova, size))
>>                  iommu_iotlb_sync(domain, gather);
>>
>>          gather->pgsize = size;
>>          iommu_iotlb_gather_add_range(gather, iova, size);
>> }
>>
>> As the comments for iommu_iotlb_gather_is_disjoint() says,
>>
>> "...For many IOMMUs, flushing the IOMMU in this case is better
>>   than merging the two, which might lead to unnecessary invalidations.
>>   ..."
>>
>> So, perhaps the right fix for this performance issue is to add
>>
>>      if (!gather->queued)
>>
>> in iommu_iotlb_gather_add_page() or iommu_iotlb_gather_is_disjoint()?
>> It should benefit other arch's as well.
> 
> The iotlb_gather helpers are really just that - little tools to help 
> drivers with various common iotlb_gather accounting patterns. The 
> decision whether to bother with that accounting at all should really 
> come beforehand, and whether a driver supports flush queues is 
> orthogonal to whether it uses any particular gather helper(s) or not, so 
> I think the patch as-is is correct.

Okay, that's fine. Thanks for the explanation.

> 
>>> This patch adds a check to avoid IOVA gather page and IOTLB sync for
>>> the lazy path.
>>>
>>> The performance difference on Sapphire Rapids 100Gb NIC is improved by
>>> the following (as measured by iperf send):
>>
>> Which test case have you done? Post the real data if you have any.
>>
>>>
>>> w/o this fix~48 Gbits/s. with this fix ~54 Gbits/s
>>>
>>> Cc: <stable@...r.kernel.org>
>>
>> Again, add a Fixes tag so that people know how far this fix should be
>> back ported.
> 
> Note that the overall issue probably dates back to the initial iommu-dma 
> conversion, but if you think it's important enough to go back beyond 
> 5.15 when gather->queued was introduced, that'll need a different fix.

So perhaps

Cc: stable@...r.kernel.org # 5.15+

is enough?

Best regards,
baolu