Message-ID: <731f6e5b-f678-49ef-ad8e-fe6ff85d5422@sina.com>
Date: Thu, 8 Jan 2026 18:36:04 +0800
From: zhangdongdong <zhangdongdong925@...a.com>
To: Sergey Senozhatsky <senozhatsky@...omium.org>,
 Jens Axboe <axboe@...nel.dk>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
 Richard Chang <richardycc@...gle.com>, Minchan Kim <minchan@...nel.org>,
 Brian Geffon <bgeffon@...gle.com>, David Stevens <stevensd@...gle.com>,
 linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 linux-block@...r.kernel.org, Minchan Kim <minchan@...gle.com>,
 xiongping1@...omi.com, huangjianan@...omi.com, wanghui33@...omi.com
Subject: Re: [PATCHv2 1/7] zram: introduce compressed data writeback


On 1/8/26 11:39, Sergey Senozhatsky wrote:
> Hi,
> 
> On (26/01/08 10:57), zhangdongdong wrote:
>>> Do you use any strategies for writeback?  Compressed writeback
>>> is supposed to be used for apps for which latency is not critical
>>> or sensitive, because of on-demand decompression costs.
>>>
>>
>> Hi Sergey,
>>
>> Sorry for the delayed reply — I had some urgent matters come up and only
>> got back to this now ;)
> 
> No worries, you replied in a perfectly reasonable time frame.
> 
>> Yes, we do use writeback strategies on our side. The current implementation
>> focuses on batched writeback of compressed data from
>> zram, managed on a per-app / per-memcg basis. We track and control how
>> much data from each app is written back to the backing storage, with the
>> same assumption you mentioned: compressed writeback is primarily
>> intended for workloads where latency is not critical.
>>
>> Accurate prefetching on swap-in is still an open problem for us. As you
>> pointed out, both the I/O itself and on-demand decompression introduce
>> additional latency on the readback path, and minimizing their impact
>> remains challenging.
>>
>> Regarding the workqueue choice: initially we used system_dfl_wq for the
>> read/decompression path. Later, based on observed scheduling latency
>> under memory pressure, we switched to a dedicated workqueue created with
>> WQ_HIGHPRI | WQ_UNBOUND. This change helped reduce scheduling
>> interference, but it also reinforced our concern that deferring
>> decompression to a worker still adds an extra scheduling hop on the
>> swap-in path.
> 
> How bad (and often) is your memory pressure situation?  I just wonder
> if your case is an outlier, so to speak.
> 
> 
> Just thinking aloud:
> 
> I really don't see a path back to atomic zram read/write.  Those
> were very painful and problematic, and I'm not considering
> re-introducing them, especially if the reason is an optional
> feature (which comp-wb is).  If we want to improve latency, we need
> to find a way to do it without going back to atomic read/write,
> assuming that latency becomes unbearable.  But at the same time under
> memory pressure everything becomes janky at some point, so I don't
> know if comp-wb latency is the biggest problem in that case.
> 
> Dunno, *maybe* we can explore a possibility of grabbing both entry-lock
> and per-CPU compression stream before we queue async bio, so that in
> the bio completion we already *sort of* have everything we need.
> However, that comes with a bunch of issues:
> 
> - the number of per-CPU compression streams is limited, naturally,
>    to the number of CPUs.  So if we have a bunch of comp-wb reads we
>    can block all other activities: normal zram reads/writes, which
>    compete for the same per-CPU compression streams.
> 
> - this still puts atomicity requirements on the compressors.  I haven't
>    looked into, for instance, zstd *de*-compression code, but I know for
>    sure that zstd compression code allocates memory internally when
>    configured to use pre-trained CD-dictionaries, effectively making zstd
>    use GFP_ATOMIC allocations internally, if called from atomic context.
>    Do we have anything like that in decompression - I don't know.  But in
>    general we cannot be sure that all compressors work in atomic context
>    in the same way as they do in non-atomic context.
> 
> I don't know if solving it on zram side alone is possible.  Maybe we
> can get some help from the block layer: some sort of two-stage bio
> submission.  First stage: submit chained bio-s, second stage: iterate
> over all submitted and completed bio-s and decompress the data.  Again,
> just thinking out loud.
> 

Hi Sergey,

My thinking is largely aligned with yours. I agree that relying on zram
alone is unlikely to fully solve this problem, especially without going
back to atomic read/write.
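
Just to make sure I read the two-stage bio idea correctly, the first
stage could look roughly like the sketch below. All names here
(cwb_batch, cwb_read_endio) are made up purely for illustration:

struct cwb_batch {
        atomic_t          pending;
        struct completion done;
};

/* Stage 1 completion handler: only count the finished reads here,
 * no decompression in bio completion / atomic context. */
static void cwb_read_endio(struct bio *bio)
{
        struct cwb_batch *batch = bio->bi_private;

        bio_put(bio);
        if (atomic_dec_and_test(&batch->pending))
                complete(&batch->done);
}

The submitter would then wait_for_completion(&batch->done) and do the
decompression for all completed pages in its own, fully sleepable,
context as stage 2, so the per-CPU compression stream is never taken
from bio completion.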

Our current mitigation approach is to introduce a hook at the swap layer
and move decompression there. By doing so, decompression happens in a
fully sleepable context, which avoids the atomic-context constraints
you outlined. This helps us sidestep the core issue rather than trying
to force decompression back into zram completion paths.
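
In very rough terms the shape of it is something like the sketch
below; both helpers named here (swap_read_compressed() and
zram_decompress_to_folio()) are hypothetical placeholders, not the
actual functions from the change:

/* Hypothetical swap-in path hook: read first, decompress second,
 * both in the faulting task's (sleepable) context. */
static int swap_in_read_and_decompress(struct folio *folio,
                                       swp_entry_t entry)
{
        void *src;
        unsigned int comp_len;
        int err;

        /* Read the compressed object from the backing device;
         * may block on I/O. */
        err = swap_read_compressed(entry, &src, &comp_len);
        if (err)
                return err;

        /* Decompress in process context: may sleep, may allocate,
         * so none of the atomic-context constraints apply. */
        return zram_decompress_to_folio(folio, src, comp_len);
}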

For reference, this is the change we are experimenting with:
https://android-review.googlesource.com/c/kernel/common/+/3724447

I also noticed that Richard proposed a similar optimization hook recently:
https://android-review.googlesource.com/c/kernel/common/+/3730147

Regarding your question about memory pressure: our current test case
runs on an 8 GB device, with around 50 apps being launched sequentially.
This creates fairly heavy memory pressure. In earlier tests using an
async kworker-based approach, we observed an average latency of about
1.3 ms, but with tail latencies occasionally reaching 30–100 ms.
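
For completeness, the dedicated workqueue mentioned earlier is created
roughly like this ("zram_rd" and the init helper are illustrative
names only):

static struct workqueue_struct *zram_rd_wq;

static int __init zram_rd_wq_init(void)
{
        /* WQ_HIGHPRI lowers the scheduling latency of the worker under
         * memory pressure, WQ_UNBOUND lets the work run on any CPU
         * instead of being bound to the submitting one. */
        zram_rd_wq = alloc_workqueue("zram_rd", WQ_HIGHPRI | WQ_UNBOUND, 0);
        return zram_rd_wq ? 0 : -ENOMEM;
}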

If I recall correctly, this issue first became noticeable after a block
layer change was merged; I can try to dig that up and share more details
later.

Best regards,
dongdong

