Message-ID: <20251020152345.00003d61@huawei.com>
Date: Mon, 20 Oct 2025 15:23:45 +0100
From: Jonathan Cameron <jonathan.cameron@...wei.com>
To: Yiannis Nikolakopoulos <yiannis.nikolakop@...il.com>
CC: Wei Xu <weixugc@...gle.com>, David Rientjes <rientjes@...gle.com>, Gregory
Price <gourry@...rry.net>, Matthew Wilcox <willy@...radead.org>, Bharata B
Rao <bharata@....com>, <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
<dave.hansen@...el.com>, <hannes@...xchg.org>, <mgorman@...hsingularity.net>,
<mingo@...hat.com>, <peterz@...radead.org>, <raghavendra.kt@....com>,
<riel@...riel.com>, <sj@...nel.org>, <ying.huang@...ux.alibaba.com>,
<ziy@...dia.com>, <dave@...olabs.net>, <nifan.cxl@...il.com>,
<xuezhengchu@...wei.com>, <akpm@...ux-foundation.org>, <david@...hat.com>,
<byungchul@...com>, <kinseyho@...gle.com>, <joshua.hahnjy@...il.com>,
<yuanchu@...gle.com>, <balbirs@...dia.com>, <alok.rathore@...sung.com>,
<yiannis@...corp.com>, Adam Manzanares <a.manzanares@...sung.com>
Subject: Re: [RFC PATCH v2 0/8] mm: Hot page tracking and promotion
infrastructure
On Thu, 16 Oct 2025 18:16:31 +0200
Yiannis Nikolakopoulos <yiannis.nikolakop@...il.com> wrote:
> On Thu, Sep 25, 2025 at 5:01 PM Jonathan Cameron
> <jonathan.cameron@...wei.com> wrote:
> >
> > On Thu, 25 Sep 2025 16:03:46 +0200
> > Yiannis Nikolakopoulos <yiannis.nikolakop@...il.com> wrote:
> >
> > Hi Yiannis,
> Hi Jonathan! Thanks for your response!
>
Hi Yiannis,
This is way more fun than doing real work ;)
> [snip]
> > > There are several things that may be done on the device side. For now, I
> > > think the kernel should be unaware of these. But with what I described
> > > above, the goal is to have the capacity thresholds configured in a way
> > > that we can absorb the occasional dirty cache lines that are written back.
> >
> > In the worst case they are far from occasional. It's not hard to imagine a malicious
> This is correct. Any simplification on my end is mainly based on the
> empirical evidence of the use cases we are testing for (tiering). But
> I fully respect that we need to be proactive and assume the worst case
> scenario.
> > program that ensures that all L3 in a system (say 256MiB+) is full of cache lines
> > from the far compressed memory all of which are changed in a fashion that makes
> > the allocation much less compressible. If you are doing compression at cache line
> > granularity that's not so bad because it would only be 256MiB margin needed.
> > If the system in question is doing large block-size compression, say 4KiB,
> > then we have a 64x write amplification multiplier. If the virus is streaming over
> This is insightful indeed :). However, even in the case of the 64x
> amplification, you implicitly assume that each of the cachelines in
> the L3 belongs to a different page. But then one cache-line would not
> deteriorate the compressed size of the entire page that much (the
> bandwidth amplification on the device is a different -performance-
> story).
This is putting limits on what compression algorithm is used. We could do
that, but then we'd never be able to support anything different. Maybe if the
device itself provided the worst-case amplification numbers that would do.
Any device that gets this wrong is buggy - but it might be hard to detect
that if people don't publish their compression algorithms and proofs of the
worst-case blow-up of compression blocks.
I guess we could do the maths on what the device manufacturer says and
if we don't believe them or they haven't provided enough info to check,
double it :)
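To make that arithmetic concrete, here's a quick back-of-the-envelope sketch. All numbers are purely illustrative (the 256MiB L3 and 4KiB block size from the example above, not any real device spec), and `distrust_factor` is just the "double it" heuristic:

```python
# Hedged sketch of the worst-case writeback margin discussed above.
# Assumption: every dirty L3 line lands in a distinct compression block
# and makes that whole block incompressible.

def worst_case_margin(l3_bytes, cache_line, comp_block, distrust_factor=2):
    """Upper bound on extra backing capacity the device must hold."""
    amplification = comp_block // cache_line   # e.g. 4096 / 64 = 64x
    # Each dirty line can force its whole block to its uncompressed size,
    # so the device may have to absorb l3_bytes * amplification.
    margin = l3_bytes * amplification
    # If we don't believe the manufacturer's numbers, double it.
    return margin * distrust_factor

MiB = 1 << 20
GiB = 1 << 30
print(worst_case_margin(256 * MiB, 64, 4096) / GiB)  # -> 32.0
```

So 256MiB of dirty lines at 64x amplification is already a 16GiB margin before the safety factor, which is why cache-line-granularity compression looks so much more forgiving.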
> So even in the 4K case the two ends of the spectrum are to
> either have big amplification with low compression ratio impact, or
> small amplification with higher compression ratio impact.
> Another practical assumption here is that the different HMU
> mechanisms would help promote the contended pages before this becomes
> a big issue. Which of course might still not be enough on the
> malicious streaming writes workload.
Using promotion to get you out of this is a non-starter unless you have
a backstop because we'll have annoying things like pinning going on or
bandwidth bottlenecks at the promotion target.
Promotion might massively reduce the performance impact of course under
normal conditions.
> Overall, I understand these are heuristics and I do see your point
> that this needs to be robust even for the maliciously behaving
> programs.
> > memory, the evictions we see as the result of new lines being fetched
> > will be made much less compressible.
> >
> > Add an accelerator (say DPDK or other zero-copy into userspace buffers) into the
> > mix and you have a mess. You'll need to be extremely careful with what goes
> Good point about the zero copy stuff.
> > in this compressed memory or hold enormous buffer capacity against fast
> > changes in compressibility.
> To my experience the factor of buffer capacity would be closer to the
> benefit that you get from the compression (e.g. 2x the cache size in
> your example).
> But I understand the burden of proof is on our end. As we move further
> with this I will try to provide data as well.
If we are aiming for generality, the nasty problem is that either we have to
write rules on what Linux will cope with, or design it to cope with the
worst possible implementation :(
I can think of lots of plausible sounding cases that have horrendous
multiplication factors if done in a naive fashion.
* De-duplication
* Metadata flag for all 0s
* Some general purpose compression algs are very vulnerable to the tails
of the probability distributions. Some will flip between multiple modes
with very different characteristics, perhaps to meet latency guarantees.
Would be fun to ask an information theorist / compression expert to lay
out an algorithm with the worst possible tail performance but with good
average.
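As a toy illustration of how sharply compressibility can collapse under that kind of tail behaviour (generic zlib here, standing in for whatever a real device might implement; page and line sizes are just the usual 4KiB/64B):

```python
# Toy demo: an all-zero page compresses to almost nothing, but touching
# every cache line with random data makes the same page incompressible.
import os
import zlib

PAGE = 4096
LINE = 64

zero_page = bytes(PAGE)             # the "metadata flag for all 0s" case
dirty = bytearray(zero_page)
for off in range(0, PAGE, LINE):    # overwrite each cache line
    dirty[off:off + LINE] = os.urandom(LINE)

print(len(zlib.compress(zero_page)))      # tiny: a few tens of bytes
print(len(zlib.compress(bytes(dirty))))   # roughly PAGE: no gain left
```

The jump from a few tens of bytes to a full page on a single pass of writes is exactly the kind of mode flip that makes average-case provisioning dangerous.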
> >
> > Key is that all software is potentially malicious (sometimes accidentally so ;)
> >
> > Now, if we can put this into a special pool where it is acceptable to drop the writes
> > and return poison (so the application crashes) then that may be fine.
> >
> > Or block writes. Running compressed memory as read only CoW is one way to
> > avoid this problem.
> These could be good starting points, as I see in the rest of the thread.
>
Fun problems. Maybe we start with very conservative handling and then
argue for relaxations later.
Jonathan
> Thanks,
> Yiannis