Message-ID: <87ikuqvfkl.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Fri, 20 Sep 2024 17:29:14 +0800
From: "Huang, Ying" <ying.huang@...el.com>
To: "Sridhar, Kanchana P" <kanchana.p.sridhar@...el.com>
Cc: Yosry Ahmed <yosryahmed@...gle.com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-mm@...ck.org"
<linux-mm@...ck.org>, "hannes@...xchg.org" <hannes@...xchg.org>,
"nphamcs@...il.com" <nphamcs@...il.com>, "chengming.zhou@...ux.dev"
<chengming.zhou@...ux.dev>, "usamaarif642@...il.com"
<usamaarif642@...il.com>, "ryan.roberts@....com" <ryan.roberts@....com>,
"21cnbao@...il.com" <21cnbao@...il.com>, "akpm@...ux-foundation.org"
<akpm@...ux-foundation.org>, "Zou, Nanhai" <nanhai.zou@...el.com>,
"Feghali, Wajdi K" <wajdi.k.feghali@...el.com>, "Gopal, Vinodh"
<vinodh.gopal@...el.com>
Subject: Re: [PATCH v6 0/3] mm: ZSWAP swap-out of mTHP folios
"Sridhar, Kanchana P" <kanchana.p.sridhar@...el.com> writes:
[snip]
>
> Thanks, these are good points. I ran this experiment with mm-unstable 9-17-2024,
> commit 248ba8004e76eb335d7e6079724c3ee89a011389.
>
> Data is based on average of 3 runs of the vm-scalability "usemem" test.
>
> 4G SSD backing zswap, each process sleeps before exiting
> ========================================================
>
> 64KB mTHP (cgroup memory.high set to 60G, no swap limit):
> =========================================================
> CONFIG_THP_SWAP=Y
> Sapphire Rapids server with 503 GiB RAM and 4G SSD swap backing device
> for zswap.
>
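> For reference, a minimal sketch of how such a configuration can be set
> up (the device path and cgroup name below are illustrative, not the
> exact ones used):
>
>   # Enable zswap and select the compressor (zstd or deflate-iaa):
>   echo Y > /sys/module/zswap/parameters/enabled
>   echo zstd > /sys/module/zswap/parameters/compressor
>
>   # Use a 4G SSD partition as the swap backing device:
>   mkswap /dev/nvme0n1p2 && swapon /dev/nvme0n1p2
>
>   # Allow 64KB mTHP allocations:
>   echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
>
>   # cgroup v2 limits: memory.high = 60G, no swap limit:
>   echo 60G > /sys/fs/cgroup/test/memory.high
>   echo max > /sys/fs/cgroup/test/memory.swap.max
>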
> Experiment 1: Each process sleeps for 0 sec after allocating memory
> (usemem --init-time -w -O --sleep 0 -n 70 1g):
>
> -------------------------------------------------------------------------------
>                      mm-unstable 9-17-2024      zswap-mTHP v6        Change wrt
>                      Baseline "before"          "after"              Baseline
>                                                                      (sleep 0)
> -------------------------------------------------------------------------------
> ZSWAP compressor          zstd   deflate-         zstd   deflate-   zstd  deflate-
>                                  iaa                     iaa              iaa
> -------------------------------------------------------------------------------
> Throughput (KB/s)      296,684    274,207      359,722    390,162    21%     42%
> sys time (sec)           92.67      93.33       251.06     237.56  -171%   -155%
> memcg_high               3,503      3,769       44,425     27,154
> memcg_swap_fail              0          0      115,814    141,936
> pswpin                      17          0            0          0
> pswpout                370,853    393,232            0          0
> zswpin                     693        123          666        667
> zswpout                  1,484        123    1,366,680  1,199,645
> thp_swpout                   0          0            0          0
> thp_swpout_fallback          0          0            0          0
> pgmajfault               3,384      2,951        3,656      3,468
> ZSWPOUT-64kB               n/a        n/a       82,940     73,121
> SWPOUT-64kB             23,178     24,577            0          0
> -------------------------------------------------------------------------------
>
>
> Experiment 2: Each process sleeps for 10 sec after allocating memory
> (usemem --init-time -w -O --sleep 10 -n 70 1g):
>
> -------------------------------------------------------------------------------
>                      mm-unstable 9-17-2024      zswap-mTHP v6        Change wrt
>                      Baseline "before"          "after"              Baseline
>                                                                      (sleep 10)
> -------------------------------------------------------------------------------
> ZSWAP compressor          zstd   deflate-         zstd   deflate-   zstd  deflate-
>                                  iaa                     iaa              iaa
> -------------------------------------------------------------------------------
> Throughput (KB/s)       86,744     93,730      157,528    113,110    82%     21%
> sys time (sec)          308.87     315.29       477.55     629.98   -55%   -100%

What is the elapsed time for all cases?

> memcg_high             169,450    188,700      143,691    177,887
> memcg_swap_fail     10,131,859  9,740,646   18,738,715 19,528,110
> pswpin                      17         16            0          0
> pswpout              1,154,779  1,210,485            0          0
> zswpin                     711        659        1,016        736
> zswpout                 70,212     50,128    1,235,560  1,275,917
> thp_swpout                   0          0            0          0
> thp_swpout_fallback          0          0            0          0
> pgmajfault               6,120      6,291        8,789      6,474
> ZSWPOUT-64kB               n/a        n/a       67,587     68,912
> SWPOUT-64kB             72,174     75,655            0          0
> -------------------------------------------------------------------------------
>
>
> Conclusions from the experiments:
> =================================
> 1) zswap-mTHP improves throughput as compared to the baseline, for both
>    zstd and deflate-iaa.
>
> 2) Yosry's theory is proven correct in the 4G constrained-swap setup.
>    When the processes sleep for 10 sec after allocating memory, thereby
>    keeping the memory allocated longer, the "before" baseline that stores
>    mTHPs on SSD degrades by 71% in throughput (zstd: 296,684 -> 86,744
>    KB/s) and by 238% in sys time (deflate-iaa: 93.33 -> 315.29 sec), as
>    compared to the "sleep 0" baseline, which benefits from the
>    serialization of disk IO not allowing all processes to allocate
>    memory at the same time.

Could the higher sys time come from compression on the CPU vs. disk
writing?
>
> 3) In the 4G SSD "sleep 0" case, zswap-mTHP shows an increase in sys
>    time due to the cgroup charging and, consequently, more memcg.high
>    breaches and swapout activity.
>
>    However, sys time degrades less in the "sleep 10" case, and the
>    memcg.high breaches and swapout activity are almost similar between
>    before/after (confirming Yosry's hypothesis). Further, the
>    memcg_swap_fail activity in the "after" scenario is almost 2X that of
>    the "before". This indicates failures to obtain swap offsets, which
>    result in the folio remaining active in memory.
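>
>    (A note on where these counters come from: memcg_high and
>    memcg_swap_fail presumably correspond to the "high" field of the
>    cgroup v2 memory.events file and the "fail" field of
>    memory.swap.events, respectively, with the swap counters coming
>    from /proc/vmstat; the cgroup path below is illustrative:
>
>      grep high /sys/fs/cgroup/test/memory.events
>      grep fail /sys/fs/cgroup/test/memory.swap.events
>      grep -E 'pswp(in|out)|zswp(in|out)' /proc/vmstat
>    )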
>
> I tried to better understand this through the 64k mTHP swpout_fallback
> stats in the "sleep 10" zstd experiments:
>
> --------------------------------------------------------------
>                                         "before"      "after"
> --------------------------------------------------------------
> 64k mTHP swpout_fallback                 627,308      897,407
> 64k folio swapouts                        72,174       67,587
> [p|z]swpout events due to 64k mTHP     1,154,779    1,081,397
> 4k folio swapouts                         70,212      154,163
> --------------------------------------------------------------
>
> The data indicates a higher number of 64k folio swpout_fallback events
> with zswap-mTHP, which correlates with the higher memcg_swap_fail counts
> and 4k folio swapouts seen with zswap-mTHP. Could the root cause be
> fragmentation of the swap space due to zswap swapout being faster than
> SSD swapout?
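>
> (For reference, the per-size mTHP counters above can be read from
> sysfs, e.g.:
>
>   grep . /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpout*
>
> which reports both swpout and swpout_fallback. The "[p|z]swpout events
> due to 64k mTHP" row appears to be the total (p|z)swpout minus the 4k
> folio swapouts, e.g. for "after": 1,235,560 - 154,163 = 1,081,397.)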
>
[snip]
--
Best Regards,
Huang, Ying