Message-ID: <i6jki2zocqzsjcjgraf6lyl7m3cjzv5lnsuluq5xnvznw7bsge@4easx2ucpxml>
Date: Fri, 23 Aug 2024 10:56:42 -0700
From: Shakeel Butt <shakeel.butt@...ux.dev>
To: Barry Song <21cnbao@...il.com>
Cc: hanchuanhua@...o.com, akpm@...ux-foundation.org, linux-mm@...ck.org,
baolin.wang@...ux.alibaba.com, chrisl@...nel.org, david@...hat.com, hannes@...xchg.org,
hughd@...gle.com, kaleshsingh@...gle.com, kasong@...cent.com,
linux-kernel@...r.kernel.org, mhocko@...e.com, minchan@...nel.org, nphamcs@...il.com,
ryan.roberts@....com, senozhatsky@...omium.org, shy828301@...il.com, surenb@...gle.com,
v-songbaohua@...o.com, willy@...radead.org, xiang@...nel.org, ying.huang@...el.com,
yosryahmed@...gle.com, hch@...radead.org, ryncsn@...il.com
Subject: Re: [PATCH v7 2/2] mm: support large folios swap-in for sync io
devices
Hi Barry,
On Thu, Aug 22, 2024 at 05:13:06AM GMT, Barry Song wrote:
> On Thu, Aug 22, 2024 at 1:31 AM Shakeel Butt <shakeel.butt@...ux.dev> wrote:
> >
> > On Wed, Aug 21, 2024 at 03:45:40PM GMT, hanchuanhua@...o.com wrote:
> > > From: Chuanhua Han <hanchuanhua@...o.com>
> > >
> > >
> > > 3. With both mTHP swap-out and swap-in supported, we offer the option to enable
> > > zsmalloc compression/decompression with larger granularity[2]. The upcoming
> > > optimization in zsmalloc will significantly increase swap speed and improve
> > > compression efficiency. Tested by running 100 iterations of swapping 100MiB
> > > of anon memory, the swap speed improved dramatically:
> > >              time consumption of swapin(ms)   time consumption of swapout(ms)
> > > lz4 4k       45274                            90540
> > > lz4 64k      22942                            55667
> > > zstdn 4k     85035                            186585
> > > zstdn 64k    46558                            118533
> >
> > Are the above numbers with the patch series at [2] or without? Also, can
> > you explain your experiment setup or how someone can reproduce these?
>
> Hi Shakeel,
>
> The data was recorded after applying both this patch (mTHP swap-in) and
> patch [2] (compressing/decompressing whole mTHP folios instead of single
> pages). However, without this swap-in series, patch [2] becomes useless
> because:
>
> If we have a large object in zsmalloc, for example one spanning 16 pages,
> do_swap_page() will happen 16 times:
> 1. decompress the whole large object and copy one page;
> 2. decompress the whole large object and copy one page;
> 3. decompress the whole large object and copy one page;
> ....
> 16. decompress the whole large object and copy one page;
>
> So, patchset [2] will actually degrade performance rather than
> enhance it if we don't have this swap-in series. This swap-in
> series is a prerequisite for the zsmalloc/zram series.
Thanks for the explanation.
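Just to restate that cost model in code form, here is a tiny user-space
sketch of my own (purely illustrative, not kernel code; OBJ_PAGES and
decompress_object() are made-up stand-ins, not zsmalloc interfaces):

#include <stdio.h>

#define OBJ_PAGES 16	/* pages per compressed zsmalloc object (assumed) */

/* stand-in: decompressing the whole large object costs OBJ_PAGES units */
static unsigned long decompress_object(void)
{
	return OBJ_PAGES;
}

int main(void)
{
	unsigned long per_page_faults = 0, mthp_swapin;
	int i;

	/* without mTHP swap-in: each of the 16 faults decompresses everything */
	for (i = 0; i < OBJ_PAGES; i++)
		per_page_faults += decompress_object();

	/* with mTHP swap-in: a single fault decompresses the object once */
	mthp_swapin = decompress_object();

	printf("per-page swap-in: %lu page decompressions\n", per_page_faults);
	printf("mTHP swap-in:     %lu page decompressions\n", mthp_swapin);
	return 0;
}

which prints 256 vs 16 page decompressions, i.e. the 16x redundant work
you describe.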
>
> We reproduced the data through the following simple steps:
> 1. Collected anonymous pages from a running phone and saved them to a file.
> 2. Used a small program to open and read the file into an anonymous
> memory mapping.
> 3. Do the following in the small program:
> swapout_start_time
> madv_pageout()
> swapout_end_time
>
> swapin_start_time
> read_data()
> swapin_end_time
>
> We calculate the throughput of swapout and swapin using the difference between
> end_time and start_time. Additionally, we record the memory usage of zram after
> the swapout is complete.
>
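For anyone else wanting to reproduce this, a minimal harness along the
lines of the steps above might look like the sketch below. This is my own
illustration: it fills the mapping with a constant pattern instead of the
captured phone data, assumes zram is already configured as the swap
device, needs a kernel/glibc with MADV_PAGEOUT, and drops error handling:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/time.h>

#define MEM_SIZE (100UL << 20)	/* 100 MiB of anon memory, as in the test */
#define PAGE_SIZE 4096UL

static double now_ms(void)
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
}

int main(void)
{
	char *buf = mmap(NULL, MEM_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	volatile char sink = 0;
	double t0, t1, t2;
	size_t i;

	if (buf == MAP_FAILED)
		return 1;

	/* steps 1+2 stand-in: the real test loads the saved phone data here */
	memset(buf, 0x5a, MEM_SIZE);

	/* step 3: time the swap-out */
	t0 = now_ms();
	madvise(buf, MEM_SIZE, MADV_PAGEOUT);
	t1 = now_ms();

	/* ... and the swap-in, by touching every page (read_data()) */
	for (i = 0; i < MEM_SIZE; i += PAGE_SIZE)
		sink += buf[i];
	t2 = now_ms();

	printf("swapout: %.1f ms, swapin: %.1f ms\n", t1 - t0, t2 - t1);
	return 0;
}

The zram memory usage you mention could then be read afterwards, e.g.
from /sys/block/zram0/mm_stat.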
Please correct me if I am wrong, but you are saying that in your
experiment, 100 MiB took 90540 ms to compress/swapout and 45274 ms to
decompress/swapin when backed by 4k pages, but 55667 ms and 22942 ms
when backed by 64k pages. Basically, the table shows the total time to
compress or decompress 100 MiB of memory, right?
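(If those are totals over the 100 iterations, that would be roughly
905 ms swapout and 453 ms swapin per 100 MiB pass for lz4 4k, i.e. on
the order of 110 MiB/s and 220 MiB/s respectively.)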
> >
> > > [2] https://lore.kernel.org/all/20240327214816.31191-1-21cnbao@gmail.com/
> >
>
> Thanks
> Barry