Message-ID: <20240304023934.GA13332@system.software.com>
Date: Mon, 4 Mar 2024 11:39:34 +0900
From: Byungchul Park <byungchul@...com>
To: David Hildenbrand <david@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
kernel_team@...ynix.com, akpm@...ux-foundation.org,
ying.huang@...el.com, vernhao@...cent.com,
mgorman@...hsingularity.net, hughd@...gle.com, willy@...radead.org,
peterz@...radead.org, luto@...nel.org, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
rjgolo@...il.com
Subject: Re: [RESEND PATCH v8 0/8] Reduce TLB flushes by 94% by improving
folio migration

On Thu, Feb 29, 2024 at 10:33:44AM +0100, David Hildenbrand wrote:
> On 29.02.24 10:28, Byungchul Park wrote:
> > On Mon, Feb 26, 2024 at 12:06:05PM +0900, Byungchul Park wrote:
> > > Hi everyone,
> > >
> > > While I'm working with a tiered memory system e.g. CXL memory, I have
> > > been facing migration overhead esp. TLB shootdown on promotion or
> > > demotion between different tiers. Yeah.. most TLB shootdowns on
> > > migration through hinting fault can be avoided thanks to Huang Ying's
> > > work, commit 4d4b6d66db ("mm,unmap: avoid flushing TLB in batch if PTE
> > > is inaccessible"). See the following link:
> > >
> > > https://lore.kernel.org/lkml/20231115025755.GA29979@system.software.com/
> > >
> > > However, it's only for ones using hinting fault. I thought it'd be much
> > > better if we have a general mechanism to reduce the number of TLB
> > > flushes and TLB misses, that we can ultimately apply to any type of
> > > migration, I tried it only for tiering for now tho.
> > >
> > > I'm suggesting a mechanism called MIGRC that stands for 'Migration Read
> > > Copy', to reduce TLB flushes by keeping source and destination of folios
> > > participated in the migrations until all TLB flushes required are done,
> > > only if those folios are not mapped with write permission PTE entries.
> > >
> > > To achieve that:
> > >
> > > 1. For the folios that map only to non-writable TLB entries, prevent
> > > TLB flush at migration by keeping both source and destination
> > > folios, which will be handled later at a better time.
> > >
> > > 2. When any non-writable TLB entry changes to writable e.g. through
> > > fault handler, give up migrc mechanism so as to perform TLB flush
> > > required right away.
> > >
> > > I observed a big improvement of TLB flushes # and TLB misses # at the
> > > following evaluation using XSBench like:
> > >
> > > 1. itlb flush was reduced by 93.9%.
> > > 2. dtlb thread was reduced by 43.5%.
> > > 3. stlb flush was reduced by 24.9%.
> >
> > Hi guys,
>
> Hi,
>
> >
> > The TLB flush reduction is 25% ~ 94%, IMO, it's unbelievable.
>
> Can't we find at least one benchmark that shows an actual improvement on
> some system?

XSBench is closer to a real workload: it is used for performance
analysis on high performance computing architectures, and is not a
microbenchmark meant only for exercising the TLB.

  XSBench: https://github.com/ANL-CESAR/XSBench

TLB numbers aside, the performance improvement is small but clearly
positive, as you can see in the results I shared.

	Byungchul

> Staring at the number of TLB flushes is nice, but if it does not affect actual
> performance of at least one benchmark, why do we even care?
>
> "12 files changed, 597 insertions(+), 59 deletions(-)"
>
> is not negligible and needs proper review.
>
> That review needs motivation. The current numbers do not seem to be
> motivating enough :)
>
> --
> Cheers,
>
> David / dhildenb