Message-ID: <CAG48ez1q_Sgk5nr7Bngyt0UB3FkYb6e0cHv18wqD=sLEdrZkmw@mail.gmail.com>
Date: Tue, 2 Sep 2025 18:05:42 +0200
From: Jann Horn <jannh@...gle.com>
To: Giovanni Cabiddu <giovanni.cabiddu@...el.com>
Cc: Rik van Riel <riel@...riel.com>, x86@...nel.org, linux-kernel@...r.kernel.org, 
	bp@...en8.de, peterz@...radead.org, dave.hansen@...ux.intel.com, 
	zhengqi.arch@...edance.com, nadav.amit@...il.com, thomas.lendacky@....com, 
	kernel-team@...a.com, linux-mm@...ck.org, akpm@...ux-foundation.org, 
	jackmanb@...gle.com, mhklinux@...look.com, andrew.cooper3@...rix.com, 
	Manali.Shukla@....com, mingo@...nel.org, Dave Hansen <dave.hansen@...el.com>, 
	baolu.lu@...el.com, david.guckian@...el.com, damian.muszynski@...el.com
Subject: Re: [BUG] x86/mm: regression after 4a02ed8e1cc3

On Tue, Sep 2, 2025 at 5:44 PM Giovanni Cabiddu
<giovanni.cabiddu@...el.com> wrote:
> On Tue, Feb 25, 2025 at 10:00:36PM -0500, Rik van Riel wrote:
> > Reduce code duplication by consolidating the decision point
> > for whether to do individual invalidations or a full flush
> > inside get_flush_tlb_info.
> >
> > Signed-off-by: Rik van Riel <riel@...riel.com>
> > Suggested-by: Dave Hansen <dave.hansen@...el.com>
> > Tested-by: Michael Kelley <mhklinux@...look.com>
> > Acked-by: Dave Hansen <dave.hansen@...el.com>
> > Reviewed-by: Borislav Petkov (AMD) <bp@...en8.de>
> > ---
> After 4a02ed8e1cc3 ("x86/mm: Consolidate full flush threshold
> decision"), we've seen data corruption in DMA'd buffers when testing SVA.
>
> From our preliminary analysis, it appears that get_flush_tlb_info()
> modifies the start and end parameters for full TLB flushes (setting
> start=0, end=TLB_FLUSH_ALL). However, the MMU notifier call at the end
> of the function still uses the original parameters instead of the
> updated info->start and info->end.
>
> The change below appears to solve the problem; however, we are not sure
> if this is the right way to fix it.
>
> ----8<----
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 39f80111e6f1..e66c7662c254 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -1459,7 +1459,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>
>         put_flush_tlb_info();
>         put_cpu();
> -       mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
> +       mmu_notifier_arch_invalidate_secondary_tlbs(mm, info->start, info->end);
>  }

I don't see why the IOMMU flush should be broadened just because the
CPU flush got broadened.
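
For readers following along, here is a minimal userspace sketch of the
behaviour being discussed. It is not kernel code: every model_* name is
hypothetical, and the page shift and 33-page ceiling are assumptions made
for illustration. It only models the report above, where the CPU-side
range is widened to start=0, end=TLB_FLUSH_ALL while the secondary TLB
notifier still receives the caller's original range.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define MODEL_PAGE_SHIFT     12
#define MODEL_TLB_FLUSH_ALL  (~0UL)
#define MODEL_FLUSH_CEILING  33UL  /* assumed full-flush threshold, in pages */

struct model_flush_info {
	unsigned long start;
	unsigned long end;
};

struct model_ranges {
	struct model_flush_info cpu;        /* range used for the CPU TLB flush */
	struct model_flush_info secondary;  /* range handed to the notifier */
};

/* Models the consolidated decision: widen to a full flush past the ceiling. */
static struct model_flush_info model_get_flush_info(unsigned long start,
						    unsigned long end)
{
	struct model_flush_info info = { start, end };
	bool full = end == MODEL_TLB_FLUSH_ALL ||
		    ((end - start) >> MODEL_PAGE_SHIFT) > MODEL_FLUSH_CEILING;

	if (full) {
		info.start = 0;
		info.end = MODEL_TLB_FLUSH_ALL;
	}
	return info;
}

/* Models the call site: the notifier sees the caller's range, not info's. */
static struct model_ranges model_flush_mm_range(unsigned long start,
						unsigned long end)
{
	struct model_ranges r;

	r.cpu = model_get_flush_info(start, end);
	r.secondary.start = start;
	r.secondary.end = end;
	return r;
}
```

In this model, a 64-page range yields a full CPU flush while the
secondary range stays at the 64 pages the caller asked for, which still
covers every address the caller changed.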

On x86, IOMMU flushes happen from arch_tlbbatch_add_pending() and
flush_tlb_mm_range(); the IOMMU folks might know better, but as far as
I know, there is nothing that elides IOMMU flushes depending on the
state of x86-internal flush generation tracking or anything similar.

To me, 4a02ed8e1cc3 looks like a change that is correct but makes it
easier to hit IOMMU flushing issues in other places.

Are you encountering these issues on an Intel system or an AMD system?
