Message-ID: <CAGsJ_4wgC+yaCYinv8FYm9RHJfT5wiFxHMn_WTGysdpiH0HS7g@mail.gmail.com>
Date: Thu, 22 Aug 2024 05:13:06 +0800
From: Barry Song <21cnbao@...il.com>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: hanchuanhua@...o.com, akpm@...ux-foundation.org, linux-mm@...ck.org,
baolin.wang@...ux.alibaba.com, chrisl@...nel.org, david@...hat.com,
hannes@...xchg.org, hughd@...gle.com, kaleshsingh@...gle.com,
kasong@...cent.com, linux-kernel@...r.kernel.org, mhocko@...e.com,
minchan@...nel.org, nphamcs@...il.com, ryan.roberts@....com,
senozhatsky@...omium.org, shy828301@...il.com, surenb@...gle.com,
v-songbaohua@...o.com, willy@...radead.org, xiang@...nel.org,
ying.huang@...el.com, yosryahmed@...gle.com, hch@...radead.org,
ryncsn@...il.com
Subject: Re: [PATCH v7 2/2] mm: support large folios swap-in for sync io devices
On Thu, Aug 22, 2024 at 1:31 AM Shakeel Butt <shakeel.butt@...ux.dev> wrote:
>
> On Wed, Aug 21, 2024 at 03:45:40PM GMT, hanchuanhua@...o.com wrote:
> > From: Chuanhua Han <hanchuanhua@...o.com>
> >
> >
> > 3. With both mTHP swap-out and swap-in supported, we offer the option to enable
> > zsmalloc compression/decompression with larger granularity[2]. The upcoming
> > optimization in zsmalloc will significantly increase swap speed and improve
> > compression efficiency. Tested by running 100 iterations of swapping 100MiB
> > of anon memory, the swap speed improved dramatically:
> >           time consumption of swapin(ms)  time consumption of swapout(ms)
> > lz4 4k    45274                           90540
> > lz4 64k   22942                           55667
> > zstdn 4k  85035                           186585
> > zstdn 64k 46558                           118533
>
> Are the above numbers with the patch series at [2] or without? Also can
> you explain your experiment setup or how can someone reproduce these?
Hi Shakeel,
The data was recorded after applying both this patch (swap-in mTHP) and
patch [2] (compressing/decompressing mTHP instead of page). However,
without the swap-in series, patch [2] becomes useless because:
If we have a large object, such as 16 pages in zsmalloc:
do_swap_page() will be called 16 times, and each call will:
1. decompress the whole large object and copy one page;
2. decompress the whole large object and copy one page;
3. decompress the whole large object and copy one page;
....
16. decompress the whole large object and copy one page;
So, patchset [2] will actually degrade performance rather than
enhance it if we don't have this swap-in series. This swap-in
series is a prerequisite for the zsmalloc/zram series.
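To make the amplification above concrete, here is a toy cost model in C. It is only an illustration, not kernel code: struct swapin_cost and model_swapin() are hypothetical names, and the model assumes that one zsmalloc object holds nr_pages pages compressed as a single unit, so every fault must decompress the whole object.

```c
#include <assert.h>

/* Hypothetical cost model: one zsmalloc object stores `nr_pages`
 * pages compressed together as a single unit. */
struct swapin_cost {
	unsigned int faults;              /* do_swap_page() invocations */
	unsigned long pages_decompressed; /* pages' worth of data decompressed */
};

struct swapin_cost model_swapin(unsigned int nr_pages, int large_folio_swapin)
{
	struct swapin_cost c;

	assert(nr_pages > 0);
	if (large_folio_swapin) {
		/* One fault maps the whole large folio: decompress once. */
		c.faults = 1;
		c.pages_decompressed = nr_pages;
	} else {
		/* One fault per small page, and each fault decompresses the
		 * whole object only to copy out a single page. */
		c.faults = nr_pages;
		c.pages_decompressed = (unsigned long)nr_pages * nr_pages;
	}
	return c;
}
```

For a 16-page object, model_swapin(16, 0) gives 16 faults and 16 * 16 = 256 pages decompressed, while model_swapin(16, 1) gives 1 fault and 16 pages decompressed: 16 times less decompression work in this model.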
We reproduced the data through the following simple steps:
1. Collected anonymous pages from a running phone and saved them to a file.
2. Used a small program to open and read the file into a mapped anonymous
memory region.
3. Performed the following steps in the small program:
   swapout_start_time
   madv_pageout()
   swapout_end_time
   swapin_start_time
   read_data()
   swapin_end_time
We derive the swap-out and swap-in throughput from the elapsed time of each
phase (end_time - start_time). Additionally, we record the memory usage of
zram after the swap-out is complete.
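The measurement steps above can be approximated in C roughly as follows. This is a minimal sketch, not the program that produced the numbers in this thread. The helper names (alloc_anon(), fill_from_file(), measure_swap()) are hypothetical. It also assumes three things: that madv_pageout() corresponds to madvise(MADV_PAGEOUT), which requires Linux 5.4 or later; that read_data() reads every byte; and that a swap device such as zram is configured. Without swap, anonymous pages cannot be paged out, so the two timings would only cover the madvise() call and a plain memory read.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21 /* Linux >= 5.4; fallback for older headers */
#endif

struct swap_timing {
	double swapout_ms;      /* wall time spent in madvise(MADV_PAGEOUT) */
	double swapin_ms;       /* wall time spent reading the data back */
	int pageout_ok;         /* 1 if the kernel accepted MADV_PAGEOUT */
	unsigned long checksum; /* sum of all bytes, so the reads are not skipped */
};

static double now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1e3 + ts.tv_nsec / 1e6;
}

/* Map a private anonymous region of `len` bytes (NULL on failure). */
unsigned char *alloc_anon(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : (unsigned char *)p;
}

/* Copy up to `len` bytes of a saved memory dump at `path` into `buf`.
 * Returns the number of bytes copied, or -1 on error. */
ssize_t fill_from_file(unsigned char *buf, size_t len, const char *path)
{
	int fd = open(path, O_RDONLY);
	size_t done = 0;

	if (fd < 0)
		return -1;
	while (done < len) {
		ssize_t n = read(fd, buf + done, len - done);

		if (n < 0) {
			close(fd);
			return -1;
		}
		if (n == 0)
			break; /* dump is shorter than the mapping */
		done += (size_t)n;
	}
	close(fd);
	return (ssize_t)done;
}

/* Time one swap-out (MADV_PAGEOUT) and one swap-in (read every byte). */
struct swap_timing measure_swap(unsigned char *buf, size_t len)
{
	struct swap_timing t = { 0 };
	unsigned long sum = 0;
	double start;

	assert(buf != NULL && len > 0);

	start = now_ms();
	t.pageout_ok = madvise(buf, len, MADV_PAGEOUT) == 0;
	t.swapout_ms = now_ms() - start;

	start = now_ms();
	for (size_t off = 0; off < len; off++)
		sum += buf[off]; /* touching a paged-out page faults it back in */
	t.swapin_ms = now_ms() - start;
	t.checksum = sum;
	return t;
}
```

A caller would map, for example, 100MiB with alloc_anon(), fill it from the saved dump, and call measure_swap() once per iteration. zram memory usage after the swap-out can be read separately, for instance from the device's mm_stat file in sysfs.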
>
> > [2] https://lore.kernel.org/all/20240327214816.31191-1-21cnbao@gmail.com/
>
Thanks
Barry