Message-ID: <32d951629ab18bcb2cb59b0c0baab65de915dbea.camel@mediatek.com>
Date: Tue, 11 Mar 2025 14:12:55 +0000
From: Qun-wei Lin (林群崴) <Qun-wei.Lin@...iatek.com>
To: "21cnbao@...il.com" <21cnbao@...il.com>, "senozhatsky@...omium.org"
	<senozhatsky@...omium.org>
CC: Chinwen Chang (張錦文)
	<chinwen.chang@...iatek.com>, Andrew Yang (楊智強)
	<Andrew.Yang@...iatek.com>, Casper Li (李中榮)
	<casper.li@...iatek.com>, "nphamcs@...il.com" <nphamcs@...il.com>,
	"chrisl@...nel.org" <chrisl@...nel.org>,
	James Hsu (徐慶薰) <James.Hsu@...iatek.com>,
	AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mediatek@...ts.infradead.org" <linux-mediatek@...ts.infradead.org>,
	"ira.weiny@...el.com" <ira.weiny@...el.com>, "linux-mm@...ck.org"
	<linux-mm@...ck.org>, "dave.jiang@...el.com" <dave.jiang@...el.com>,
	"vishal.l.verma@...el.com" <vishal.l.verma@...el.com>,
	"schatzberg.dan@...il.com" <schatzberg.dan@...il.com>,
	"viro@...iv.linux.org.uk" <viro@...iv.linux.org.uk>, "ryan.roberts@....com"
	<ryan.roberts@....com>, "minchan@...nel.org" <minchan@...nel.org>,
	"axboe@...nel.dk" <axboe@...nel.dk>, "linux-block@...r.kernel.org"
	<linux-block@...r.kernel.org>, "kasong@...cent.com" <kasong@...cent.com>,
	"nvdimm@...ts.linux.dev" <nvdimm@...ts.linux.dev>,
	"linux-arm-kernel@...ts.infradead.org"
	<linux-arm-kernel@...ts.infradead.org>, "matthias.bgg@...il.com"
	<matthias.bgg@...il.com>, "ying.huang@...el.com" <ying.huang@...el.com>,
	"dan.j.williams@...el.com" <dan.j.williams@...el.com>
Subject: Re: [PATCH 0/2] Improve Zram by separating compression context from
 kswapd

On Tue, 2025-03-11 at 22:33 +1300, Barry Song wrote:
> 
> On Tue, Mar 11, 2025 at 5:58 PM Sergey Senozhatsky
> <senozhatsky@...omium.org> wrote:
> > 
> > On (25/03/08 18:41), Barry Song wrote:
> > > On Sat, Mar 8, 2025 at 12:03 PM Nhat Pham <nphamcs@...il.com>
> > > wrote:
> > > > 
> > > > On Fri, Mar 7, 2025 at 4:02 AM Qun-Wei Lin
> > > > <qun-wei.lin@...iatek.com> wrote:
> > > > > 
> > > > > This patch series introduces a new mechanism called
> > > > > kcompressd to
> > > > > improve the efficiency of memory reclaiming in the operating
> > > > > system. The
> > > > > main goal is to separate the tasks of page scanning and page
> > > > > compression
> > > > > into distinct processes or threads, thereby reducing the load
> > > > > on the
> > > > > kswapd thread and enhancing overall system performance under
> > > > > high memory
> > > > > pressure conditions.
> > > > 
> > > > Please excuse my ignorance, but from your cover letter I still
> > > > don't quite get what the problem is here. And how would
> > > > decoupling compression and scanning help?
> > > 
> > > My understanding is as follows:
> > > 
> > > When kswapd attempts to reclaim M anonymous folios and N file
> > > folios,
> > > the process involves the following steps:
> > > 
> > > * t1: Time to scan and unmap anonymous folios
> > > * t2: Time to compress anonymous folios
> > > * t3: Time to reclaim file folios
> > > 
> > > Currently, these steps are executed sequentially, meaning the
> > > total time
> > > required to reclaim M + N folios is t1 + t2 + t3.
> > > 
> > > However, Qun-Wei's patch enables t1 + t3 and t2 to run in
> > > parallel,
> > > reducing the total time to max(t1 + t3, t2). This likely improves
> > > the
> > > reclamation speed, potentially reducing allocation stalls.
> > 
> > Only if the compression kthreads can run (have CPUs to be scheduled
> > on), though.  This looks a bit like a bottleneck.  Is there anything
> > that guarantees forward progress?  Also, if compression kthreads
> > constantly preempt kswapd, then it might not be worth having
> > compression kthreads at all, I assume?
> 
> Thanks for your critical insights, all of which are valuable.
> 
> Qun-Wei is likely working on an Android case where the CPU is
> relatively idle in many scenarios (though there are certainly cases
> where all CPUs are busy), but free memory is quite limited.
> We may soon see benefits for these types of use cases. I expect
> Android might have the opportunity to adopt it before it's fully
> ready upstream.
> 
> If the workload keeps all CPUs busy, I suppose this async thread
> won’t help, but at least we might find a way to mitigate the regression.
> 
> We likely need to collect more data on various scenarios—when
> CPUs are relatively idle and when all CPUs are busy—and
> determine the proper approach based on the data, which we
> currently lack :-)
> 

Thanks for the explanation!
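
To make the intended overlap concrete: purely as an example, if t1 = 40 ms,
t2 = 100 ms and t3 = 60 ms, the sequential path costs t1 + t2 + t3 = 200 ms,
while the decoupled path costs max(t1 + t3, t2) = max(100, 100) = 100 ms.

Below is only a toy userspace model of the handoff (pthread-based, with
made-up names, not the patch code). The fallback to inline compression
when the queue is full is just one possible answer to the forward-progress
question above, not necessarily what the series does:

/*
 * Toy userspace model (NOT the actual patch): "kswapd" hands anonymous
 * folios to a separate "kcompressd" thread through a bounded queue, so
 * scanning/unmapping (t1 + t3) and compression (t2) can overlap.  When
 * the queue is full, the producer compresses inline, which keeps
 * forward progress even if the compressor cannot keep up.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define QUEUE_DEPTH 64

struct folio { int id; };

static struct folio *queue[QUEUE_DEPTH];
static int q_head, q_tail, q_len;
static bool done;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

static void compress_one(struct folio *f)
{
	usleep(200);			/* stand-in for zram compression (t2) */
	printf("compressed folio %d\n", f->id);
}

/* "kswapd" side: queue for async compression, or fall back to inline. */
static void pageout_anon(struct folio *f)
{
	pthread_mutex_lock(&lock);
	if (q_len < QUEUE_DEPTH) {
		queue[q_tail] = f;
		q_tail = (q_tail + 1) % QUEUE_DEPTH;
		q_len++;
		pthread_cond_signal(&not_empty);
		pthread_mutex_unlock(&lock);
		return;
	}
	pthread_mutex_unlock(&lock);
	compress_one(f);		/* queue full: compress inline */
}

static void *kcompressd(void *arg)
{
	(void)arg;
	for (;;) {
		struct folio *f = NULL;

		pthread_mutex_lock(&lock);
		while (!q_len && !done)
			pthread_cond_wait(&not_empty, &lock);
		if (q_len) {
			f = queue[q_head];
			q_head = (q_head + 1) % QUEUE_DEPTH;
			q_len--;
		}
		pthread_mutex_unlock(&lock);

		if (!f)
			return NULL;	/* done and queue drained */
		compress_one(f);
	}
}

int main(void)
{
	static struct folio folios[256];
	pthread_t tid;

	pthread_create(&tid, NULL, kcompressd, NULL);
	for (int i = 0; i < 256; i++) {
		folios[i].id = i;
		usleep(50);		/* stand-in for scan/unmap (t1 + t3) */
		pageout_anon(&folios[i]);
	}
	pthread_mutex_lock(&lock);
	done = true;
	pthread_cond_broadcast(&not_empty);
	pthread_mutex_unlock(&lock);
	pthread_join(tid, NULL);
	return 0;
}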

> > 
> > If we have a pagefault and need to map a page that is still in
> > the compression queue (not compressed and stored in zram yet, e.g.
> > due to scheduling latency + slow compression algorithm) then what
> > happens?
> 
> Isn't this already happening now, even without the patch?  Right now
> we have 4 steps:
> 1. add_to_swap: The folio is added to the swapcache.
> 2. try_to_unmap: PTEs are converted to swap entries.
> 3. pageout: The folio is written back.
> 4. Swapcache is cleared.
> 
> If a swap-in occurs between 2 and 4, doesn't that mean
> we've already encountered the case where we hit
> the swapcache for a folio undergoing compression?
> 
> It seems we might have an opportunity to cancel compression if the
> request is still in the queue and compression hasn't started for the
> folio yet?  That seems quite difficult to do, though.

As Barry explained, the folios being compressed are in the swapcache.
If a refault occurs during compression, correctness is already
guaranteed by the swap subsystem (just as with other asynchronous swap
devices).
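
To spell that out with a deliberately simplified userspace model (made-up
names, not the real do_swap_page()/swapcache code): the folio remains
reachable through the swapcache until writeback completes, so a fault at
any point between steps 2 and 4 resolves against the original folio
rather than a half-written compressed copy.

/*
 * Toy userspace model (hypothetical names, not kernel code) of why a
 * refault during asynchronous compression is safe: the folio stays in
 * the "swapcache" until writeback completes, so a concurrent fault
 * finds the original page there instead of reading a half-written
 * compressed copy from the device.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

struct folio {
	bool in_swapcache;	/* reachable via swapcache lookup     */
	bool written_back;	/* compressed copy exists in "zram"   */
	pthread_mutex_t lock;
};

static struct folio f = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Reclaim side: 1. add_to_swap  2. try_to_unmap  3. pageout  4. clear */
static void *swap_out(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&f.lock);
	f.in_swapcache = true;			/* 1. add_to_swap()        */
	pthread_mutex_unlock(&f.lock);		/* 2. PTEs -> swap entries */

	usleep(100000);				/* 3. slow async compression */

	pthread_mutex_lock(&f.lock);
	f.written_back = true;
	f.in_swapcache = false;			/* 4. swapcache cleared    */
	pthread_mutex_unlock(&f.lock);
	return NULL;
}

/* Fault side: a refault between steps 2 and 4 hits the swapcache. */
static void refault(void)
{
	pthread_mutex_lock(&f.lock);
	if (f.in_swapcache)
		puts("refault: folio found in swapcache, mapped back directly");
	else if (f.written_back)
		puts("refault: read and decompress from zram");
	else
		puts("refault: page still mapped, nothing to do");
	pthread_mutex_unlock(&f.lock);
}

int main(void)
{
	pthread_t tid;

	pthread_create(&tid, NULL, swap_out, NULL);
	usleep(10000);		/* fault while compression is still pending */
	refault();
	pthread_join(tid, NULL);
	refault();		/* fault after writeback has completed */
	return 0;
}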

Indeed, cancelling compression for a folio that is already queued and
waiting is a challenging task.  Would this require modifications to the
current architecture of the swap subsystem?
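
If we did try it, one conceivable shape (purely hypothetical, reusing the
toy queue from the first sketch above) would be to let the fault path pull
a still-queued folio back out before the compressor picks it up; anything
already in flight would have to finish as it does today:

/*
 * Purely hypothetical sketch of "cancel if still queued", reusing the
 * toy queue (queue[], q_head, q_tail, q_len, lock, QUEUE_DEPTH) and
 * struct folio from the earlier sketch: on refault, drop the folio
 * from the pending queue if the compressor has not picked it up yet.
 */
static bool try_cancel_queued(struct folio *victim)
{
	bool cancelled = false;

	pthread_mutex_lock(&lock);
	for (int i = 0; i < q_len; i++) {
		int idx = (q_head + i) % QUEUE_DEPTH;

		if (queue[idx] != victim)
			continue;
		/* close the gap by shifting later entries down by one */
		for (int j = i; j < q_len - 1; j++) {
			int a = (q_head + j) % QUEUE_DEPTH;
			int b = (q_head + j + 1) % QUEUE_DEPTH;

			queue[a] = queue[b];
		}
		q_tail = (q_tail + QUEUE_DEPTH - 1) % QUEUE_DEPTH;
		q_len--;
		cancelled = true;
		break;
	}
	pthread_mutex_unlock(&lock);
	return cancelled;	/* false: already compressed or in flight */
}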

> 
> Thanks
> Barry

Best Regards,
Qun-wei
