Message-ID: <20250313155705.1943522-1-joshua.hahnjy@gmail.com>
Date: Thu, 13 Mar 2025 08:57:04 -0700
From: Joshua Hahn <joshua.hahnjy@...il.com>
To: Joshua Hahn <joshua.hahnjy@...il.com>
Cc: lsf-pc@...ts.linux-foundation.org,
	linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	gourry@...rry.net,
	ying.huang@...ux.alibaba.com,
	hyeonggon.yoo@...com,
	honggyu.kim@...com,
	kernel-team@...a.com
Subject: Re: [LSF/MM/BPF TOPIC] Weighted interleave auto-tuning

On Thu,  9 Jan 2025 13:50:48 -0500 Joshua Hahn <joshua.hahnjy@...il.com> wrote:

> Hello everyone, I hope everyone has had a great start to 2025!
> 
> Recently, I have been working on a patch series [1] with
> Gregory Price <gourry@...rry.net> that provides new default interleave
> weights, along with dynamic re-weighting on hotplug events and a series
> of UAPIs that allow users to configure how they want the defaults to behave.
> 
> In introducing these new defaults, discussions have opened up in the
> community regarding how best to create a UAPI that can provide
> coherent and transparent interactions for the user. In particular, consider
> this scenario: when a hotplug event happens and a node comes online
> with new bandwidth information (and therefore changing the bandwidth
> distributions across the system), should user-set weights be overwritten
> to reflect the new distributions? If so, how can we justify overwriting
> user-set values in a sysfs interface? If not, how will users manually
> adjust the node weights to the optimal weight?
> 
> I would like to revisit some of the design choices made for this patch,
> including how the defaults were derived, and open the conversation to
> hear what the community believes is a reasonable way to allow users to
> tune weighted interleave weights. More broadly, I hope to gather
> community insight into how people use weighted interleave, and do my
> best to reflect those workflows in the patch.

Weighted interleave has since moved on to v7 [1], and a v8 is currently being
drafted. Through reviewer feedback, we have landed on a coherent UAPI that
gives users two options: auto mode, which leaves all weight-calculation
decisions to the system, and manual mode, which preserves weighted
interleave's existing (pre-patch) behavior.

Given that the patch's functionality is now largely settled and the questions
I hoped to raise during this slot were answered through patch feedback, I would
like to ask another question during the talk:

Should the system dynamically change what metrics it uses to weight the nodes,
based on what bottlenecks the system is currently facing?

In the patch, min(read_bandwidth, write_bandwidth) is used as the heuristic
to determine what a node's weight should be. However, what if the system is
not bottlenecked by bandwidth, but by latency? A system could also be
bottlenecked by read bandwidth, but not by write bandwidth.

Consider a scenario where a system has many memory nodes with varying
latencies and bandwidths. When the system is not bottlenecked by bandwidth,
it might prefer to allocate memory from nodes with lower latency. Once the
system starts feeling pressured by bandwidth, the weights for high bandwidth
(but also high latency) nodes would slowly increase to alleviate pressure
from the system. Once the system is back in a manageable state, weights for
low latency nodes would start increasing again. Users would not have to be
aware of any of this -- they would just see the system take control of the
weight changes as the system's needs continue to change.
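One way to picture the policy described above is a simple linear blend between
a latency-optimized and a bandwidth-optimized weight set. This is purely
illustrative; the actual control loop, thresholds, and choice of pressure
signal are exactly the open questions:

```python
def blended_weights(latency_w, bandwidth_w, pressure):
    """Interpolate between latency-optimized and bandwidth-optimized
    per-node weights. pressure is a bandwidth-pressure signal in
    [0, 1]: 0 = unpressured (favor low-latency nodes), 1 = saturated
    (favor high-bandwidth nodes). Illustrative only, not from the patch."""
    return {n: max(1, round((1 - pressure) * latency_w[n]
                            + pressure * bandwidth_w[n]))
            for n in latency_w}

# Unpressured: the low-latency node dominates; under saturation the
# high-bandwidth node takes over.
# blended_weights({0: 4, 1: 1}, {0: 1, 1: 4}, 0.0) -> {0: 4, 1: 1}
# blended_weights({0: 4, 1: 1}, {0: 1, 1: 4}, 1.0) -> {0: 1, 1: 4}
```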

This proposal also has some concerns that need to be addressed:
- How reactive should the system be, and how aggressively should it tune the
  weights? We don't want the system to overreact to short spikes in pressure.
- Does dynamic weight adjusting lead to pages being "misplaced"? Should those
  "misplaced" pages be migrated? (probably not)
- Does this need to be in the kernel? A userspace daemon that monitors kernel
  metrics has the ability to make the changes (via the nodeN interfaces).
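On the last point: a userspace daemon can already drive the per-node knobs
through sysfs. A minimal sketch follows; the directory below is the
weighted-interleave sysfs path from the patch series, but verify it on your
kernel, and note that writes require root:

```python
import os

# sysfs directory for per-node weighted interleave weights (from the
# patch series; confirm the exact path on your kernel).
WI_DIR = "/sys/kernel/mm/mempolicy/weighted_interleave"

def node_weight_path(node):
    """Path to the nodeN knob for a given NUMA node id."""
    return os.path.join(WI_DIR, f"node{node}")

def set_node_weight(node, weight):
    """Write a new weight (1-255) for a node; requires root and a
    kernel exposing the weighted interleave sysfs interface."""
    with open(node_weight_path(node), "w") as f:
        f.write(str(weight))
```

Such a daemon would poll whatever pressure metric it trusts and call
set_node_weight() as conditions change, which sidesteps the in-kernel
policy question entirely.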

Thoughts & comments are appreciated! Thank you, and have a great day!
Joshua

[1] https://lore.kernel.org/all/20250305200506.2529583-1-joshua.hahnjy@gmail.com/

Sent using hkml (https://github.com/sjp38/hackermail)

