Message-ID: <luzn25fgin43cnbmvmxwps7isqeq2pt5kfn26jqzly6hbnedlp@ojpw52ldzmuw>
Date: Thu, 8 Jan 2026 12:39:35 +0900
From: Sergey Senozhatsky <senozhatsky@...omium.org>
To: zhangdongdong <zhangdongdong925@...a.com>,
Jens Axboe <axboe@...nel.dk>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
Andrew Morton <akpm@...ux-foundation.org>, Richard Chang <richardycc@...gle.com>,
Minchan Kim <minchan@...nel.org>, Brian Geffon <bgeffon@...gle.com>,
David Stevens <stevensd@...gle.com>, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-block@...r.kernel.org, Minchan Kim <minchan@...gle.com>
Subject: Re: [PATCHv2 1/7] zram: introduce compressed data writeback
Hi,
On (26/01/08 10:57), zhangdongdong wrote:
> > Do you use any strategies for writeback? Compressed writeback
> > is supposed to be used for apps for which latency is not critical
> > or sensitive, because of on-demand decompression costs.
> >
>
> Hi Sergey,
>
> Sorry for the delayed reply — I had some urgent matters come up and only
> got back to this now ;)
No worries, you replied within a perfectly reasonable time frame.
> Yes, we do use writeback strategies on our side. The current implementation
> focuses on batched writeback of compressed data from
> zram, managed on a per-app / per-memcg basis. We track and control how
> much data from each app is written back to the backing storage, with the
> same assumption you mentioned: compressed writeback is primarily
> intended for workloads where latency is not critical.
>
> Accurate prefetching on swap-in is still an open problem for us. As you
> pointed out, both the I/O itself and on-demand decompression introduce
> additional latency on the readback path, and minimizing their impact
> remains challenging.
>
> Regarding the workqueue choice: initially we used system_dfl_wq for the
> read/decompression path. Later, based on observed scheduling latency
> under memory pressure, we switched to a dedicated workqueue created with
> WQ_HIGHPRI | WQ_UNBOUND. This change helped reduce scheduling
> interference, but it also reinforced our concern that deferring
> decompression to a worker still adds an extra scheduling hop on the
> swap-in path.
How bad (and how frequent) is your memory pressure situation? I just
wonder if your case is an outlier, so to speak.
Just thinking aloud:
I really don't see a path back to atomic zram read/write. Those
were very painful and problematic, and I am not considering
re-introducing them, especially if the reason is an optional
feature (which comp-wb is). If latency becomes unbearable, we need
to find a way to improve it without going back to atomic read/write.
But at the same time, under memory pressure everything becomes janky
at some point, so I don't know if comp-wb latency is the biggest
problem in that case.
Dunno, *maybe* we can explore the possibility of grabbing both the
entry-lock and a per-CPU compression stream before we queue the async
bio, so that in the bio completion we already *sort of* have everything
we need.
However, that comes with a bunch of issues:
- the number of per-CPU compression streams is limited, naturally,
  to the number of CPUs. So if we have a bunch of comp-wb reads, they
  can block all other activities: normal zram reads/writes, which
  compete for the same per-CPU compression streams.
- this still puts atomicity requirements on the compressors. I haven't
  looked into, for instance, the zstd *de*-compression code, but I know
  for sure that the zstd compression code allocates memory internally
  when configured to use pre-trained CD-dictionaries, effectively making
  zstd use GFP_ATOMIC allocations internally, if called from atomic
  context. Do we have anything like that in decompression? I don't know.
  But in general we cannot be sure that all compressors work in atomic
  context in the same way as they do in non-atomic context.
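
To illustrate the first issue, here is a minimal userspace sketch, not
zram code. It assumes one stream per CPU and models a comp-wb read as
holding its stream until the bio completes. All names (NR_CPUS as a
fixed constant, struct stream_pool, stream_get(), stream_put(),
queue_comp_wb_reads()) are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical CPU count; one stream per CPU */

/* Toy stand-in for the per-CPU compression streams: one slot per CPU. */
struct stream_pool {
	bool busy[NR_CPUS];
};

/* Take a free stream, or return -1 if every stream is already held. */
int stream_get(struct stream_pool *pool)
{
	for (int i = 0; i < NR_CPUS; i++) {
		if (!pool->busy[i]) {
			pool->busy[i] = true;
			return i;
		}
	}
	return -1;
}

/* Release a stream, as the (simulated) bio completion would do. */
void stream_put(struct stream_pool *pool, int id)
{
	assert(id >= 0 && id < NR_CPUS && pool->busy[id]);
	pool->busy[id] = false;
}

/*
 * Queue up to nr_reads comp-wb reads. Each read grabs a stream up front
 * and keeps it while its I/O is in flight. The ids are stored in held[],
 * which must have room for NR_CPUS entries. Returns the number of reads
 * that managed to get a stream.
 */
int queue_comp_wb_reads(struct stream_pool *pool, int nr_reads, int *held)
{
	int queued = 0;

	for (int i = 0; i < nr_reads; i++) {
		int id = stream_get(pool);

		if (id < 0)
			break; /* pool exhausted: everyone else now waits */
		held[queued++] = id;
	}
	return queued;
}
```

With NR_CPUS in-flight comp-wb reads the pool is empty, so in this model
a regular zram read or write that calls stream_get() gets nothing until
one of the bios completes and calls stream_put().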
I don't know if solving it on the zram side alone is possible. Maybe we
can get some help from the block layer: some sort of two-stage bio
submission. First stage: submit chained bio-s. Second stage: iterate
over all submitted and completed bio-s and decompress the data. Again,
just thinking out loud.
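
A rough userspace sketch of what the two stages could look like, purely
to show the control flow. Nothing here is a block layer or zram API:
struct wb_read, stage_one_submit(), stage_two_decompress() and the toy
run-length decoder are all hypothetical, and the "I/O" is simulated
with a memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BUF 64

/* Hypothetical read request: stands in for one bio in a chained batch. */
struct wb_read {
	const unsigned char *backing; /* compressed bytes on the backing device */
	size_t comp_len;
	unsigned char comp[MAX_BUF];  /* filled in by stage one */
	unsigned char out[MAX_BUF];   /* filled in by stage two */
	size_t out_len;
	int completed;                /* set by the simulated completion */
};

/* Toy run-length decoder over (count, byte) pairs; not a real compressor. */
size_t toy_decompress(const unsigned char *src, size_t len,
		      unsigned char *dst, size_t cap)
{
	size_t n = 0;

	for (size_t i = 0; i + 1 < len; i += 2) {
		for (unsigned char k = 0; k < src[i]; k++) {
			if (n == cap)
				return n; /* truncate rather than overflow */
			dst[n++] = src[i + 1];
		}
	}
	return n;
}

/* Stage one: submit every read. Completion only copies the compressed data. */
void stage_one_submit(struct wb_read *reads, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		assert(reads[i].comp_len <= MAX_BUF);
		memcpy(reads[i].comp, reads[i].backing, reads[i].comp_len);
		reads[i].completed = 1;
	}
}

/*
 * Stage two: walk the completed reads and decompress them, outside of
 * any completion handler. Returns the number of reads decompressed.
 */
size_t stage_two_decompress(struct wb_read *reads, size_t nr)
{
	size_t done = 0;

	for (size_t i = 0; i < nr; i++) {
		if (!reads[i].completed)
			continue;
		reads[i].out_len = toy_decompress(reads[i].comp, reads[i].comp_len,
						  reads[i].out, MAX_BUF);
		done++;
	}
	return done;
}
```

The point of the split is that stage one never touches a compressor, so
the completion side has no atomicity requirements beyond the data copy,
and stage two can run in a context that is allowed to sleep.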