Message-ID: <CAJD7tkbk6tLMSKKc1XChJvpOi=J_T0WXXgwfscN0n8CK+CDoYQ@mail.gmail.com>
Date: Tue, 7 Jan 2025 17:36:15 -0800
From: Yosry Ahmed <yosryahmed@...gle.com>
To: Rik van Riel <riel@...riel.com>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org, kernel-team@...a.com, 
	dave.hansen@...ux.intel.com, luto@...nel.org, peterz@...radead.org, 
	tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, hpa@...or.com, 
	akpm@...ux-foundation.org, nadav.amit@...il.com, zhengqi.arch@...edance.com, 
	linux-mm@...ck.org, Reiji Watanabe <reijiw@...gle.com>, 
	Brendan Jackman <jackmanb@...gle.com>
Subject: Re: [PATCH v3 00/12] AMD broadcast TLB invalidation

On Mon, Jan 6, 2025 at 7:25 PM Rik van Riel <riel@...riel.com> wrote:
>
> On Mon, 2025-01-06 at 14:49 -0800, Yosry Ahmed wrote:
> >
> > We briefly looked at using INVLPGB/TLBSYNC as part of the ASI work to
> > optimize away the async freeing logic which sends TLB flush IPIs.
> >
> > I have a high-level question about INVLPGB/TLBSYNC that I could not
> > immediately find the answer to in the AMD manual. Sorry if I missed
> > the answer or if I missed something obvious.
> >
> > Do we know what the underlying mechanism for delivering the TLB
> > flushes is? If a CPU has interrupts disabled, does it still receive
> > the broadcast TLB flush request and handle it?
>
> I assume TLB invalidation is probably handled similarly
> to how cache coherency is handled between CPUs.
>
> However, it probably does not need to be quite as fast,
> since cache coherency traffic is probably 2-6 orders of
> magnitude more common than TLB invalidation traffic.
>
> >
> > My main concern is that TLBSYNC is a single instruction that seems
> > like it will wait for an arbitrary amount of time, and IIUC interrupts
> > (and NMIs) will not be delivered to the running CPU until after the
> > instruction completes execution (only at an instruction boundary).
> >
> > Are there any guarantees about other CPUs handling the broadcast TLB
> > flush in a timely manner, or an explanation of how CPUs handle the
> > incoming requests in general?
>
> The performance numbers I got with the tlb_flush2_threads
> microbenchmark strongly suggest that INVLPGB flushes are
> handled by the receiving CPUs even while interrupts are
> disabled.
>
> CPU time spent in flush_tlb_mm_range goes down with
> INVLPGB, compared with IPI based TLB flushing, even when
> the IPIs only go to a subset of CPUs.
>
> I have no idea whether the invalidation is handled by
> something like microcode in the CPU, by the (more
> external?) logic that handles cache coherency, or
> something else entirely.
>
> I suspect AMD wouldn't tell us exactly ;)

Well, ideally they would just tell us the conditions under which CPUs
respond to the broadcast TLB flush or the expectations around latency.
I am also wondering if a CPU can respond to an INVLPGB while running
TLBSYNC, specifically if it's possible for two CPUs to send broadcasts
to one another and then execute TLBSYNC to wait for each other. Could
this lead to a deadlock? I think the answer is no, but we understand too
little about what's going on under the hood to know for sure (or at
least I do).
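
To make the deadlock question concrete, here is a toy model in Python. It
is only a sketch of the two possibilities being asked about, not a
description of how AMD hardware behaves. The names `Cpu`, `simulate`, and
`hw_handles_while_waiting` are hypothetical and exist only for this
illustration. Two CPUs broadcast an invalidation to each other and then
wait. Whether they ever finish depends entirely on the assumption of
whether a CPU can service incoming invalidations while it is itself
waiting.

```python
from dataclasses import dataclass, field


@dataclass
class Cpu:
    name: str
    # Invalidation requests received from other CPUs, not yet serviced.
    inbox: list = field(default_factory=list)
    # Number of this CPU's own broadcasts that have not been acknowledged.
    outstanding: int = 0
    # True while the CPU is blocked in the (modelled) TLBSYNC wait.
    in_tlbsync: bool = False


def simulate(hw_handles_while_waiting: bool, max_steps: int = 10) -> bool:
    """Return True if both CPUs leave the wait, False if they stay stuck.

    hw_handles_while_waiting is the ASSUMPTION under test: can a CPU
    service incoming invalidations while it is blocked waiting itself?
    """
    a, b = Cpu("A"), Cpu("B")

    # Each CPU broadcasts to the other, then starts waiting.
    for sender, receiver in ((a, b), (b, a)):
        receiver.inbox.append(sender)
        sender.outstanding += 1
        sender.in_tlbsync = True

    for _ in range(max_steps):
        for cpu in (a, b):
            can_service = hw_handles_while_waiting or not cpu.in_tlbsync
            if can_service:
                # Service every pending request and acknowledge the sender.
                while cpu.inbox:
                    sender = cpu.inbox.pop()
                    sender.outstanding -= 1
        for cpu in (a, b):
            # The wait completes once all of this CPU's broadcasts are acked.
            if cpu.in_tlbsync and cpu.outstanding == 0:
                cpu.in_tlbsync = False
        if not (a.in_tlbsync or b.in_tlbsync):
            return True
    return False
```

In this model, `simulate(True)` completes and `simulate(False)` never
does. That is the crux of the question: the no-deadlock answer holds only
if invalidations are serviced independently of what the receiving CPU is
executing, which is exactly what the manual does not spell out.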

>
> --
> All Rights Reversed.
