Message-ID: <i6jki2zocqzsjcjgraf6lyl7m3cjzv5lnsuluq5xnvznw7bsge@4easx2ucpxml>
Date: Fri, 23 Aug 2024 10:56:42 -0700
From: Shakeel Butt <shakeel.butt@...ux.dev>
To: Barry Song <21cnbao@...il.com>
Cc: hanchuanhua@...o.com, akpm@...ux-foundation.org, linux-mm@...ck.org,
baolin.wang@...ux.alibaba.com, chrisl@...nel.org, david@...hat.com, hannes@...xchg.org,
hughd@...gle.com, kaleshsingh@...gle.com, kasong@...cent.com,
linux-kernel@...r.kernel.org, mhocko@...e.com, minchan@...nel.org, nphamcs@...il.com,
ryan.roberts@....com, senozhatsky@...omium.org, shy828301@...il.com, surenb@...gle.com,
v-songbaohua@...o.com, willy@...radead.org, xiang@...nel.org, ying.huang@...el.com,
yosryahmed@...gle.com, hch@...radead.org, ryncsn@...il.com
Subject: Re: [PATCH v7 2/2] mm: support large folios swap-in for sync io
devices
Hi Barry,
On Thu, Aug 22, 2024 at 05:13:06AM GMT, Barry Song wrote:
> On Thu, Aug 22, 2024 at 1:31 AM Shakeel Butt <shakeel.butt@...ux.dev> wrote:
> >
> > On Wed, Aug 21, 2024 at 03:45:40PM GMT, hanchuanhua@...o.com wrote:
> > > From: Chuanhua Han <hanchuanhua@...o.com>
> > >
> > >
> > > 3. With both mTHP swap-out and swap-in supported, we offer the option to enable
> > > zsmalloc compression/decompression with larger granularity[2]. The upcoming
> > > optimization in zsmalloc will significantly increase swap speed and improve
> > > compression efficiency. Tested by running 100 iterations of swapping 100MiB
> > > of anon memory, the swap speed improved dramatically:
> > >              time consumption of swapin(ms)   time consumption of swapout(ms)
> > > lz4 4k       45274                            90540
> > > lz4 64k      22942                            55667
> > > zstdn 4k     85035                            186585
> > > zstdn 64k    46558                            118533
> >
> > Are the above numbers with the patch series at [2] or without? Also, can
> > you explain your experiment setup or how someone can reproduce these?
>
> Hi Shakeel,
>
> The data was recorded after applying both this patch (mTHP swap-in) and
> patch [2] (compressing/decompressing whole mTHP folios instead of single
> pages). However, without this swap-in series, patch [2] becomes useless
> because:
>
> If we have a large object in zsmalloc, for example one spanning 16 pages,
> do_swap_page() will happen 16 times:
> 1. decompress the whole large object and copy one page;
> 2. decompress the whole large object and copy one page;
> 3. decompress the whole large object and copy one page;
> ....
> 16. decompress the whole large object and copy one page;
>
> So, patchset [2] will actually degrade performance rather than
> enhance it if we don't have this swap-in series. This swap-in
> series is a prerequisite for the zsmalloc/zram series.
Thanks for the explanation.
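Just to restate that cost model in code form, here is a tiny user-space
sketch of my own (purely illustrative, not kernel code; OBJ_PAGES and
decompress_object() are made-up stand-ins, not zsmalloc interfaces):

#include <stdio.h>

#define OBJ_PAGES 16	/* pages per compressed zsmalloc object (assumed) */

/* stand-in: decompressing the whole large object costs OBJ_PAGES units */
static unsigned long decompress_object(void)
{
	return OBJ_PAGES;
}

int main(void)
{
	unsigned long per_page_faults = 0, mthp_swapin;
	int i;

	/* without mTHP swap-in: each of the 16 faults decompresses everything */
	for (i = 0; i < OBJ_PAGES; i++)
		per_page_faults += decompress_object();

	/* with mTHP swap-in: a single fault decompresses the object once */
	mthp_swapin = decompress_object();

	printf("per-page swap-in: %lu page decompressions\n", per_page_faults);
	printf("mTHP swap-in:     %lu page decompressions\n", mthp_swapin);
	return 0;
}

which prints 256 vs 16 page decompressions, i.e. the 16x redundant work
you describe.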
>
> We reproduced the data through the following simple steps:
> 1. Collected anonymous pages from a running phone and saved them to a file.
> 2. Used a small program to open and read the file into an anonymous
> memory mapping.
> 3. Do the following in the small program:
> swapout_start_time
> madv_pageout()
> swapout_end_time
>
> swapin_start_time
> read_data()
> swapin_end_time
>
> We calculate the throughput of swapout and swapin using the difference between
> end_time and start_time. Additionally, we record the memory usage of zram after
> the swapout is complete.
>
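For anyone else wanting to reproduce this, a minimal harness along the
lines of the steps above might look like the sketch below. This is my own
illustration: it fills the mapping with a constant pattern instead of the
captured phone data, assumes zram is already configured as the swap
device, needs a kernel/glibc with MADV_PAGEOUT, and drops error handling:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/time.h>

#define MEM_SIZE (100UL << 20)	/* 100 MiB of anon memory, as in the test */
#define PAGE_SIZE 4096UL

static double now_ms(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

int main(void)
{
	char *buf = mmap(NULL, MEM_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	volatile char sink = 0;
	double t0, t1, t2;
	size_t i;

	if (buf == MAP_FAILED)
		return 1;

	/* steps 1+2 stand-in: the real test loads the saved phone data here */
	memset(buf, 0x5a, MEM_SIZE);

	/* step 3: time the swap-out */
	t0 = now_ms();
	madvise(buf, MEM_SIZE, MADV_PAGEOUT);
	t1 = now_ms();

	/* ... and the swap-in, by touching every page (read_data()) */
	for (i = 0; i < MEM_SIZE; i += PAGE_SIZE)
		sink += buf[i];
	t2 = now_ms();

	printf("swapout: %.1f ms, swapin: %.1f ms\n", t1 - t0, t2 - t1);
	return 0;
}

The zram memory usage you mention could then be read afterwards, e.g.
from /sys/block/zram0/mm_stat.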
Please correct me if I am wrong, but you are saying that in your
experiment, 100 MiB took 90540 ms to compress/swapout and 45274 ms to
decompress/swapin when backed by 4k pages, but 55667 ms and 22942 ms
when backed by 64k pages. Basically, the table shows the total time to
compress or decompress 100 MiB of memory, right?
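(If those are totals over the 100 iterations, that would be roughly
905 ms swapout and 453 ms swapin per 100 MiB pass for lz4 4k, i.e. on
the order of 110 MiB/s and 220 MiB/s respectively.)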
> >
> > > [2] https://lore.kernel.org/all/20240327214816.31191-1-21cnbao@gmail.com/
> >
>
> Thanks
> Barry