Message-ID: <Z5Mr8WQGEZZjp9Uu@casper.infradead.org>
Date: Fri, 24 Jan 2025 05:58:09 +0000
From: Matthew Wilcox <willy@...radead.org>
To: Joshua Hahn <joshua.hahnjy@...il.com>
Cc: gourry@...rry.net, hyeonggon.yoo@...com, ying.huang@...ux.alibaba.com,
	rafael@...nel.org, lenb@...nel.org, gregkh@...uxfoundation.org,
	akpm@...ux-foundation.org, honggyu.kim@...com, rakie.kim@...com,
	dan.j.williams@...el.com, Jonathan.Cameron@...wei.com,
	dave.jiang@...el.com, horen.chuang@...ux.dev, hannes@...xchg.org,
	linux-kernel@...r.kernel.org, linux-acpi@...r.kernel.org,
	linux-mm@...ck.org, kernel-team@...a.com
Subject: Re: [PATCH v3] Weighted interleave auto-tuning

On Wed, Jan 15, 2025 at 10:58:54AM -0800, Joshua Hahn wrote:
> On machines with multiple memory nodes, interleaving page allocations
> across nodes allows for better utilization of each node's bandwidth.
> Previous work by Gregory Price [1] introduced weighted interleave, which
> allowed for pages to be allocated across NUMA nodes according to
> user-set ratios.
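
For anyone following along, the user-space side of that looks roughly
like this; a minimal sketch, assuming a kernel with
MPOL_WEIGHTED_INTERLEAVE (v6.9+, from Gregory Price's series [1]) and
hypothetical node numbers:

	#include <numaif.h>	/* set_mempolicy(), MPOL_* */

	int use_weighted_interleave(void)
	{
		/* Interleave across node 0 (DRAM) and node 2 (CXL,
		 * hypothetical).  The per-node weights themselves are
		 * set separately via sysfs, e.g.
		 * /sys/kernel/mm/mempolicy/weighted_interleave/node0
		 */
		unsigned long nodemask = (1UL << 0) | (1UL << 2);

		return set_mempolicy(MPOL_WEIGHTED_INTERLEAVE,
				     &nodemask, 64);
	}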

I still don't get it.  You always want memory to be on the local node or
the fabric gets horribly congested and slows you right down.  But you're
not really talking about NUMA, are you?  You're talking about CXL.

And CXL is terrible for bandwidth.  I just ran the numbers.

On a current Intel top-end CPU, we're looking at 8x DDR5-4800 DIMMs,
each with a bandwidth of 38.4GB/s, for a total of 307.2GB/s; call
it 300GB/s.

For each CXL lane, you take a lane of PCIe gen5 away.  So that's
notionally 32Gbit/s, or 4GB/s per lane.  But CXL is crap, and you'll be
lucky to get 3 cachelines (192 bytes) of payload per 256-byte packet,
dropping you down to 3GB/s per lane.
You're not going to use all 80 lanes for CXL (presumably these CPUs are
going to want to do I/O somehow), so maybe allocate 20 of them to CXL.
That's 60GB/s, or a 20% improvement in bandwidth.  On top of that,
it's slow, with a minimum 10ns latency penalty just from CXL
encode/decode overhead.
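
To sanity-check that arithmetic (a back-of-envelope sketch; every
figure is an assumption from this mail, not a measurement):

	#include <stdio.h>

	int main(void)
	{
		double dimm = 38.4;		/* DDR5-4800: 4800 MT/s x 8 bytes */
		double dram = 8 * dimm;		/* 307.2 GB/s total */

		double lane = 4.0;		/* PCIe gen5: 32Gbit/s per lane */
		double eff = (3 * 64) / 256.0;	/* 3 cachelines per 256-byte packet */
		double cxl = 20 * lane * eff;	/* 20 lanes: 60 GB/s */

		printf("DRAM %.1fGB/s, CXL %.1fGB/s, gain %.0f%%\n",
		       dram, cxl, 100.0 * cxl / dram);
		return 0;
	}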

Putting page cache on CXL seems like nonsense to me.  I can see it
making sense to swap to CXL, or to allocate anonymous memory for
low-priority tasks on it.  But I just can't see the point of putting
page cache on CXL.
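
And for those cases you don't need interleaving at all, just explicit
placement, something like this sketch (mbind() is the real syscall; the
idea that node 2 is the CXL node is an assumption):

	#include <numaif.h>	/* mbind(), MPOL_BIND */
	#include <sys/mman.h>

	/* Pin a low-priority task's anonymous buffer to the CXL node. */
	void *alloc_on_cxl(size_t len)
	{
		unsigned long cxl_node = 1UL << 2;	/* node 2: hypothetical CXL */
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			return NULL;
		mbind(p, len, MPOL_BIND, &cxl_node, 64, 0);
		return p;
	}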
