Message-ID: <CAOLeGd3a63_za6cYs3HyzFn1A=j7gaEcWurT9yuXknMspa80fA@mail.gmail.com>
Date: Wed, 2 Jul 2025 19:45:12 +0300
From: Ioanna Alifieraki <ioanna-maria.alifieraki@...onical.com>
To: Baolu Lu <baolu.lu@...ux.intel.com>
Cc: kevin.tian@...el.com, jroedel@...e.de, robin.murphy@....com,
will@...nel.org, joro@...tes.org, dwmw2@...radead.org, iommu@...ts.linux.dev,
linux-kernel@...r.kernel.org, regressions@...ts.linux.dev,
stable@...r.kernel.org
Subject: Re: [REGRESSION][BISECTED] Performance Regression in IOMMU/VT-d Since
Kernel 6.10
On Wed, Jul 2, 2025 at 12:00 PM Baolu Lu <baolu.lu@...ux.intel.com> wrote:
>
> On 7/2/2025 1:14 PM, Baolu Lu wrote:
> > On 7/2/25 01:11, Ioanna Alifieraki wrote:
> >> #regzbot introduced: 129dab6e1286
> >>
> >> Hello everyone,
> >>
> >> We've identified a performance regression that starts with Linux
> >> kernel 6.10 and persists through 6.16 (tested at commit e540341508ce).
> >> Bisection pointed to commit 129dab6e1286 ("iommu/vt-d: Use
> >> cache_tag_flush_range_np() in iotlb_sync_map").
> >>
> >> The issue occurs when running fio against two NVMe devices located
> >> under the same PCIe bridge (dual-port NVMe configuration). Performance
> >> drops compared to configurations where the devices are on different
> >> bridges.
> >>
> >> Observed Performance:
> >> - Before the commit: ~6150 MiB/s, regardless of NVMe device placement.
> >> - After the commit:
> >> -- Same PCIe bridge: ~4985 MiB/s
> >> -- Different PCIe bridges: ~6150 MiB/s
> >>
> >>
> >> Currently we can only reproduce the issue on a Z3 metal instance on
> >> GCP. I suspect the issue would be reproducible on any machine with a
> >> dual-port NVMe.
> >> At [1] there's a more detailed description of the issue and details
> >> of the reproducer.
> >
> > This test was running on bare-metal hardware rather than in a
> > virtualization guest, right? If that's the case,
> > cache_tag_flush_range_np() is almost a no-op.
> >
> > Can you please show me the capability register of the IOMMU by:
> >
> > #cat /sys/bus/pci/devices/[pci_dev_name]/iommu/intel-iommu/cap
>
> Also, could you please check whether the changes below make any difference?
> I've also attached a patch file to this email so you can apply the
> change more easily.
Thanks for the patch, Baolu. I've tested it and can confirm we get ~6150 MiB/s
for NVMe pairs both under the same bridge and under different bridges.
The output of
cat /sys/bus/pci/devices/[pci_dev_name]/iommu/intel-iommu/cap
is 19ed008c40780c66 for all NVMes.
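
For reference, here is a minimal user-space sketch that decodes the two
capability bits the patch below keys on from that value. It assumes the
VT-d CAP_REG bit layout used by the kernel's cap_rwbf()/cap_caching_mode()
helpers, i.e. RWBF at bit 4 and CM at bit 7:

#include <stdint.h>
#include <stdio.h>

/*
 * Bit positions per the Intel VT-d Capability Register (CAP_REG):
 * RWBF (Required Write-Buffer Flushing) is bit 4, CM (Caching Mode)
 * is bit 7 -- the same bits the kernel's cap_rwbf() and
 * cap_caching_mode() helpers test.
 */
int main(void)
{
	uint64_t cap = 0x19ed008c40780c66ULL;	/* value reported above */

	printf("RWBF = %u\n", (unsigned int)((cap >> 4) & 1));
	printf("CM   = %u\n", (unsigned int)((cap >> 7) & 1));
	return 0;
}

Both bits come out as 0 for this value, so (assuming the rwbf quirk does
not apply on this platform) domain_need_iotlb_sync_map() in the patch
below would return false and the per-map flush would be skipped, which is
consistent with the restored ~6150 MiB/s.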
I got confirmation that there's no virtualization happening on this instance
at all.
FWIW, I had run perf when initially investigating the issue, and it showed a
significant amount of time spent in cache_tag_flush_range_np().
Thanks again!
Jo
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 7aa3932251b2..f60201ee4be0 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1796,6 +1796,18 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
>                                          (pgd_t *)pgd, flags, old);
>  }
>
> +static bool domain_need_iotlb_sync_map(struct dmar_domain *domain,
> +                                       struct intel_iommu *iommu)
> +{
> +       if (cap_caching_mode(iommu->cap) && !domain->use_first_level)
> +               return true;
> +
> +       if (rwbf_quirk || cap_rwbf(iommu->cap))
> +               return true;
> +
> +       return false;
> +}
> +
>  static int dmar_domain_attach_device(struct dmar_domain *domain,
>                                       struct device *dev)
>  {
> @@ -1833,6 +1845,8 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
>         if (ret)
>                 goto out_block_translation;
>
> +       domain->iotlb_sync_map |= domain_need_iotlb_sync_map(domain, iommu);
> +
>         return 0;
>
>  out_block_translation:
> @@ -3945,7 +3959,10 @@ static bool risky_device(struct pci_dev *pdev)
>  static int intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
>                                        unsigned long iova, size_t size)
>  {
> -       cache_tag_flush_range_np(to_dmar_domain(domain), iova, iova + size - 1);
> +       struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +
> +       if (dmar_domain->iotlb_sync_map)
> +               cache_tag_flush_range_np(dmar_domain, iova, iova + size - 1);
>
>         return 0;
>  }
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 3ddbcc603de2..7ab2c34a5ecc 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -614,6 +614,9 @@ struct dmar_domain {
>         u8 has_mappings:1;              /* Has mappings configured through
>                                          * iommu_map() interface.
>                                          */
> +       u8 iotlb_sync_map:1;            /* Need to flush IOTLB cache or write
> +                                        * buffer when creating mappings.
> +                                        */
>
>         spinlock_t lock;                /* Protect device tracking lists */
>         struct list_head devices;       /* all devices' list */
> --
> 2.43.0
>
> Thanks,
> baolu