Date:   Tue, 31 Oct 2023 10:53:41 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Gregory Price <gourry.memverge@...il.com>
Cc:     linux-kernel@...r.kernel.org, linux-cxl@...r.kernel.org,
        linux-mm@...ck.org, ying.huang@...el.com,
        akpm@...ux-foundation.org, aneesh.kumar@...ux.ibm.com,
        weixugc@...gle.com, apopple@...dia.com, hannes@...xchg.org,
        tim.c.chen@...el.com, dave.hansen@...el.com, shy828301@...il.com,
        gregkh@...uxfoundation.org, rafael@...nel.org,
        Gregory Price <gregory.price@...verge.com>
Subject: Re: [RFC PATCH v3 0/4] Node Weights and Weighted Interleave

On Mon 30-10-23 20:38:06, Gregory Price wrote:
> This patchset implements weighted interleave and adds a new sysfs
> entry: /sys/devices/system/node/nodeN/accessM/il_weight.
> 
> The il_weight of a node is used by mempolicy to implement weighted
> interleave when `numactl --interleave=...` is invoked.  By default
> il_weight for a node is always 1, which preserves the default round
> robin interleave behavior.
> 
> Interleave weights may be set from 0-100, and denote the number of
> pages that should be allocated from the node when interleaving
> occurs.
> 
> For example, if a node's interleave weight is set to 5, 5 pages
> will be allocated from that node before the next node is scheduled
> for allocations.
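
To make the quoted semantics concrete, here is a minimal userspace sketch of the allocation order they describe. It is an illustration only, not code from the patchset: the function name `weighted_interleave` and the choice to skip nodes with weight 0 are assumptions, since the cover letter does not say what a weight of 0 means.

```python
from itertools import islice


def weighted_interleave(weights):
    """Yield node ids forever: weights[n] consecutive pages from node n,
    then move on to the next node (hypothetical model of il_weight)."""
    if not any(w > 0 for w in weights.values()):
        raise ValueError("at least one node needs a non-zero weight")
    while True:
        for node, weight in weights.items():
            # Assumption: a node with weight 0 is simply skipped.
            for _ in range(weight):
                yield node


# Hypothetical setup: node 0 has il_weight 5, node 1 keeps the default of 1.
order = list(islice(weighted_interleave({0: 5, 1: 1}), 12))
print(order)  # [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
```

With every weight left at 1 the same generator degenerates to plain round robin, which matches the default behavior described above.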

I find this semantic rather weird TBH. First of all, why do you think it
makes sense to have those weights global for all users? What if
different applications have different views on how to spread their
interleaved memory?

I do get that you might have different tiers with largely different
runtime characteristics, but why would you want to interleave them into a
single mapping and have hard-to-predict runtime behavior?

[...]
> In this way it becomes possible to set an interleaving strategy
> that fits the available bandwidth for the devices available on
> the system. An example system:
> 
> Node 0 - CPU+DRAM, 400GB/s BW (200 cross socket)
> Node 1 - CPU+DRAM, 400GB/s BW (200 cross socket)
> Node 2 - CXL Memory. 64GB/s BW, on Node 0 root complex
> Node 3 - CXL Memory. 64GB/s BW, on Node 1 root complex
> 
> In this setup, the effective weights for nodes 0-3 for a task
> running on Node 0 may be [60, 20, 10, 10].
> 
> This spreads memory out across devices which all have different
> latency and bandwidth attributes in a way that can maximize the
> available resources.
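
For reference, the arithmetic behind the quoted example can be sketched as below. This is only an illustration under stated assumptions: the helper `page_share` is hypothetical, and reading the bandwidth seen from Node 0 as [400, 200, 64, 64] GB/s is an interpretation of the example, not something the cover letter spells out.

```python
def page_share(values):
    """Return each node's fraction of the total (hypothetical helper)."""
    total = sum(values)
    return [v / total for v in values]


weights = [60, 20, 10, 10]      # example weights for nodes 0-3 from the cover letter
bandwidth = [400, 200, 64, 64]  # GB/s as seen from Node 0 (assumed reading)

weight_share = [round(s, 2) for s in page_share(weights)]
bw_share = [round(s, 2) for s in page_share(bandwidth)]

print(weight_share)  # [0.6, 0.2, 0.1, 0.1]
print(bw_share)      # [0.55, 0.27, 0.09, 0.09]
```

Under that reading, the example weights place 60% of interleaved pages on local DRAM and roughly track, but do not exactly equal, each node's share of the bandwidth visible from Node 0.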

OK, so why is this any better than not using any memory policy and relying
on demotion to push cold memory down the tier hierarchy?

What is the actual real-life use case, and what kind of benefits can you
present?
-- 
Michal Hocko
SUSE Labs
