Message-ID: <87v7yks0kd.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Wed, 25 Sep 2024 14:35:14 +0800
From: "Huang, Ying" <ying.huang@...el.com>
To: Kanchana P Sridhar <kanchana.p.sridhar@...el.com>
Cc: <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
<hannes@...xchg.org>, <yosryahmed@...gle.com>, <nphamcs@...il.com>,
<chengming.zhou@...ux.dev>, <usamaarif642@...il.com>,
<shakeel.butt@...ux.dev>, <ryan.roberts@....com>, <21cnbao@...il.com>,
<akpm@...ux-foundation.org>, <nanhai.zou@...el.com>,
<wajdi.k.feghali@...el.com>, <vinodh.gopal@...el.com>
Subject: Re: [PATCH v7 0/8] mm: ZSWAP swap-out of mTHP folios
Kanchana P Sridhar <kanchana.p.sridhar@...el.com> writes:
[snip]
>
> Case 1: Comparing zswap 4K vs. zswap mTHP
> =========================================
>
> In this scenario, the "before" is CONFIG_THP_SWAP set to off, which results
> in 64K/2M (m)THP folios being split into 4K folios that get processed by
> zswap.
>
> The "after" is CONFIG_THP_SWAP set to on, plus this patch-series, which
> results in 64K/2M (m)THP folios not being split, and being processed by
> zswap.
>
> 64KB mTHP (cgroup memory.high set to 40G):
> ==========================================
>
> -------------------------------------------------------------------------------
>                         mm-unstable 9-23-2024     zswap-mTHP           Change wrt
>                         CONFIG_THP_SWAP=N         CONFIG_THP_SWAP=Y    Baseline
>                         Baseline
> -------------------------------------------------------------------------------
> ZSWAP compressor        zstd     deflate-iaa    zstd     deflate-iaa  zstd deflate-iaa
> -------------------------------------------------------------------------------
> Throughput (KB/s)        143,323     125,485     153,550     129,609   7%   3%
> elapsed time (sec)         24.97       25.42       23.90       25.19   4%   1%
> sys time (sec)            822.72      750.96      757.70      731.13   8%   3%
> memcg_high               132,743     169,825     148,075     192,744
> memcg_swap_fail          639,067     841,553       2,204       2,215
> pswpin                         0           0           0           0
> pswpout                        0           0           0           0
> zswpin                       795         873         760         902
> zswpout               10,011,266  13,195,137  10,010,017  13,193,554
> thp_swpout                     0           0           0           0
> thp_swpout_fallback            0           0           0           0
> 64kB-mthp_               639,065     841,553       2,204       2,215
> swpout_fallback
> pgmajfault                 2,861       2,924       3,054       3,259
> ZSWPOUT-64kB                 n/a         n/a     623,451     822,268
> SWPOUT-64kB                    0           0           0           0
> -------------------------------------------------------------------------------
>
IIUC, the throughput is the sum of the throughput of all usemem processes?
One possible issue with the usemem test case is "imbalance".  That is,
some usemem processes may swap out/in less, so their scores are very
high, while other processes may swap out/in more, so their scores are
very low.  Sometimes the total score decreases but the per-process
scores become more balanced, in which case the performance should be
considered better.  And, in general, we should make the usemem scores
balanced among processes, e.g., via a longer test time.  Can you check
this in your test results?
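As a minimal sketch of the check suggested above (assuming the
per-process usemem throughputs have been collected into a list; the
`imbalance` helper and the sample numbers are hypothetical, not from the
posted results), one could compare the coefficient of variation across
processes between the two kernels:

```python
# Quantify imbalance across usemem processes as the coefficient of
# variation (stddev / mean) of their per-process throughput scores.
# A value closer to 0 means the processes' scores are more balanced.
import statistics

def imbalance(scores):
    """Coefficient of variation of per-process throughput (KB/s)."""
    return statistics.stdev(scores) / statistics.mean(scores)

# Hypothetical per-process throughputs, for illustration only.
balanced   = [150_000, 151_000, 149_500, 150_500]
imbalanced = [280_000, 60_000, 270_000, 55_000]

print(f"balanced CV:   {imbalance(balanced):.3f}")
print(f"imbalanced CV: {imbalance(imbalanced):.3f}")
```

A run whose total throughput is lower but whose CV is much smaller may
still be the better-performing configuration by this criterion.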
[snip]
--
Best Regards,
Huang, Ying