Message-ID: <83bebb7f-f157-4179-b7ec-b25b2ee4270d@lucifer.local>
Date: Wed, 8 Jan 2025 21:46:31 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: lsf-pc@...ts.linux-foundation.org
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Matthew Wilcox <willy@...radead.org>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Guru Anbalagane <gurua@...gle.com>, Wei Xu <weixugc@...gle.com>,
        Yuanchu Xie <yuanchu@...gle.com>
Subject: [LSF/MM/BPF TOPIC] Physical LRU scanning feasibility

Hi all,

Not too long ago I took some time to investigate the possibility of
scanning physical memory directly by traversing the memory map rather
than the LRU linked list.

This was inspired by a post from Matthew [0] wherein he demonstrated just
how significant the difference is between traversing arrays of contiguous
data on a modern system vs. the almost worst-case scenario of traversing a
linked-list.

I tested how this might look by implementing code which simply traverses
and filters the memory map for LRU pages, simplifying as much as possible.
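
To make the comparison concrete, here is a minimal userspace sketch of the
two traversal styles. It is not the code used for the measurements: struct
fake_page, on_lru, scan_memmap(), scan_list() and build_list() are
hypothetical stand-ins, and the layout does not match the kernel's struct
page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; NOT the kernel layout. */
struct fake_page {
	bool on_lru;            /* stand-in for a PageLRU()-style flag */
	struct fake_page *next; /* stand-in for the LRU list linkage */
};

/* Physical-order scan: visit every descriptor and filter for LRU pages. */
size_t scan_memmap(const struct fake_page *map, size_t nr_pages)
{
	size_t found = 0;

	for (size_t i = 0; i < nr_pages; i++)
		if (map[i].on_lru)
			found++;
	return found;
}

/* List-order scan: chase pointers, visiting only pages on the list. */
size_t scan_list(const struct fake_page *head)
{
	size_t found = 0;

	for (const struct fake_page *p = head; p; p = p->next)
		found++;
	return found;
}

/* Link every on_lru page into a singly linked list, in index order. */
struct fake_page *build_list(struct fake_page *map, size_t nr_pages)
{
	struct fake_page *head = NULL, **tail = &head;

	assert(map != NULL || nr_pages == 0);
	for (size_t i = 0; i < nr_pages; i++) {
		map[i].next = NULL;
		if (map[i].on_lru) {
			*tail = &map[i];
			tail = &map[i].next;
		}
	}
	return head;
}
```

Note that scan_memmap() does work proportional to the total number of
pages, while scan_list() does work proportional to the number of LRU pages
only. The list built here is in index order, which is friendlier to the
cache than a real LRU ordering would be.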

However, no matter which machine (ranging from 16 GB to 192 GB) and whether
virtualised or real hardware, the results were unfortunately disappointing:
having to scan such a large range of memory resulted in performance
significantly worse than a typical LRU scan at low memory utilisation, and
at best matching LRU scanning at high memory utilisation (simulating higher
memory pressure).
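
For a rough sense of the scale involved, the size of the memory map can be
estimated as below. The helper memmap_bytes() is hypothetical, and the 4 KiB
page size and 64-byte descriptor used in the note that follows are
assumptions (typical x86-64 values), not figures from the measurements.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimate the bytes of page descriptors a full physical scan must touch:
 * one descriptor of desc_size bytes per page_size bytes of RAM.
 */
uint64_t memmap_bytes(uint64_t ram_bytes, uint64_t page_size,
		      uint64_t desc_size)
{
	assert(page_size != 0);
	return (ram_bytes / page_size) * desc_size;
}
```

Under those assumptions, 16 GiB of RAM gives 4,194,304 descriptors (256 MiB
of memory map) and 192 GiB gives 50,331,648 descriptors (3 GiB), all of
which a full scan has to read regardless of how many pages are on the LRU.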

There are a number of factors at play here, and perhaps the shrinkage of
struct page (allowing for denser placement in cache lines) or an improved
algorithm might lead to more promising results.

Having discussed this with Matthew, he suggested I put forward a proposal
to discuss this area, so that we can either learn from this should the
approach prove unworkable, or determine whether there is something here
that we might still salvage.

I intend to do some more research and generate more detailed numbers
(feel free to give feedback here) before LSF so we have something concrete
to talk about.

I always envisioned this approach being somehow integrated with MGLRU, and I
wonder whether some hybrid of the two might make sense, which could be
another area of discussion.

Thanks!

[0]: https://lore.kernel.org/linux-mm/ZTc7SHQ4RbPkD3eZ@casper.infradead.org/
