Date: Fri, 19 Apr 2024 15:02:34 +0900
From: Byungchul Park <byungchul@...com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	kernel_team@...ynix.com, ying.huang@...el.com, vernhao@...cent.com,
	mgorman@...hsingularity.net, hughd@...gle.com, willy@...radead.org,
	david@...hat.com, peterz@...radead.org, luto@...nel.org,
	tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
	dave.hansen@...ux.intel.com, rjgolo@...il.com
Subject: Re: [PATCH v9 rebase on mm-unstable 0/8] Reduce tlb and interrupt
 numbers over 90% by improving folio migration

On Thu, Apr 18, 2024 at 01:17:57PM -0700, Andrew Morton wrote:
> On Thu, 18 Apr 2024 15:15:28 +0900 Byungchul Park <byungchul@...com> wrote:
> 
> >    $ time XSBench -t 16 -p 50000000
> > 
> >    BEFORE
> >    ------
> >    Threads:     16
> >    Runtime:     968.783 seconds
> >    Lookups:     1,700,000,000
> >    Lookups/s:   1,754,778
> > 
> >    15208.91s user 141.44s system 1564% cpu 16:20.98 total
> > 
> >    AFTER
> >    -----
> >    Threads:     16
> >    Runtime:     913.210 seconds
> >    Lookups:     1,700,000,000
> >    Lookups/s:   1,861,565
> > 
> >    14351.69s user 138.23s system 1565% cpu 15:25.47 total
> 
> Well that's nice.  What exactly is XSBench doing in this situation? 

As far as I know, it's multiple threads frequently and continuously
accessing anon areas within a 6GB address range.  Thus it triggers a
lot of promotions through the hinting faults of numa balancing tiering,
and a lot of demotions by kswapd as well, resulting in a ton of tlb
flushes.
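
FYI, the access pattern I'm describing is roughly the following toy
sketch of mine (not XSBench code; the 6GB area and 16 threads just
mirror the numbers quoted above; build with gcc -O2 -pthread):

   #include <pthread.h>
   #include <stdlib.h>
   #include <sys/mman.h>

   #define NR_THREADS 16
   #define AREA_SIZE  (6ULL << 30)           /* 6GB anon area */
   #define NR_TOUCHES (100UL * 1000 * 1000)  /* per thread */

   static char *area;

   static void *toucher(void *arg)
   {
       unsigned int seed = (unsigned int)(unsigned long)arg;
       unsigned long n;

       for (n = 0; n < NR_TOUCHES; n++) {
           /* two rand_r() calls so the index covers the whole area */
           size_t off = (((size_t)rand_r(&seed) << 16) ^ rand_r(&seed))
                        % AREA_SIZE;

           area[off]++;  /* fault in / re-reference a random page */
       }
       return NULL;
   }

   int main(void)
   {
       pthread_t tid[NR_THREADS];
       long i;

       area = mmap(NULL, AREA_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
       if (area == MAP_FAILED)
           return 1;

       for (i = 0; i < NR_THREADS; i++)
           pthread_create(&tid[i], NULL, toucher, (void *)i);
       for (i = 0; i < NR_THREADS; i++)
           pthread_join(tid[i], NULL);
       return 0;
   }

With the working set spread over 6GB, the random touches keep both the
promotion and demotion paths busy, much like XSBench does.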

All I need is a system suffering from memory reclaim or any other type
of folio migration, since the migrc mechanism is for mitigating the
overhead of folio migration.  It doesn't have to be XSBench; any
workload suffering from reclaim will show the benefits of migrc.
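
For instance, one easy way to put a workload under reclaim is to cap
its memory with memcg.  A hypothetical helper, assuming cgroup v2 is
mounted at /sys/fs/cgroup and a group "reclaim-test" has already been
created with the workload's tasks in it (both of these are my
assumptions, nothing migrc-specific):

   #include <stdio.h>

   int main(void)
   {
       /* 4GB cap: below the ~6GB working set above, so kswapd and
        * direct reclaim stay busy while the workload runs. */
       FILE *f = fopen("/sys/fs/cgroup/reclaim-test/memory.max", "w");

       if (!f) {
           perror("memory.max");
           return 1;
       }
       fprintf(f, "%llu\n", 4ULL << 30);
       fclose(f);
       return 0;
   }

Echoing a value into memory.max by hand works just as well, of course.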

> What sort of improvements can we expect to see in useful workloads?

Increased throughput (i.e. a reduced runtime for each task in the
system), for two reasons (see the sketch after this list for one way
to observe the first one):

   1. Because migrc skips a lot of tlb shootdowns, it removes the CPU
      time that would've been spent in the IPI handlers.

   2. Because migrc skips a lot of tlb flushes, it reduces tlb misses
      and makes better use of the tlb cache.
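
FWIW, here's how I'd observe the first effect on x86 with a userspace
sketch (mine, not part of the series): /proc/interrupts has a "TLB:"
row with per-CPU "TLB shootdowns" counts, so sample the sum before and
after a run and compare the deltas:

   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>

   int main(void)
   {
       FILE *f = fopen("/proc/interrupts", "r");
       char line[8192];
       unsigned long long total = 0;

       if (!f) {
           perror("/proc/interrupts");
           return 1;
       }
       while (fgets(line, sizeof(line), f)) {
           char *p = line;

           while (*p == ' ')
               p++;
           if (strncmp(p, "TLB:", 4))
               continue;
           /* sum every per-CPU column; stops at the trailing text */
           for (p += 4;;) {
               char *end;
               unsigned long long v = strtoull(p, &end, 10);

               if (end == p)
                   break;
               total += v;
               p = end;
           }
       }
       fclose(f);
       printf("TLB shootdown IPIs: %llu\n", total);
       return 0;
   }

Running the same workload once on a vanilla kernel and once with migrc
should show the difference in shootdown IPIs directly.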

Besides, I expect overall scheduler latencies to improve as well,
though the worst-case latencies measured with some ftrace tracers
showed no change.

> I see it no longer consumes an additional page flag, good.
> 
> The patches show no evidence of review activity and I'm not seeing much
> on the mailing list (patchset title was changed.  Previous title
> "Reduce TLB flushes under some specific conditions").  Perhaps a better

I changed the title because migrc was originally supposed to work only
with numa balancing tiering, i.e. promotion and demotion, but it now
works with any type of folio migration.  Thus, migrc demonstrates its
benefits on any system undergoing reclaim and folio migration.

	Byungchul

> description of the overall benefit to our users would help to motivate
> reviewers.
