Message-ID: <20250627161318.1898633-1-joshua.hahnjy@gmail.com>
Date: Fri, 27 Jun 2025 09:13:18 -0700
From: Joshua Hahn <joshua.hahnjy@...il.com>
To: Gregory Price <gourry@...rry.net>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Alistair Popple <apopple@...dia.com>,
	Byungchul Park <byungchul@...com>,
	David Hildenbrand <david@...hat.com>,
	Matthew Brost <matthew.brost@...el.com>,
	Rakie Kim <rakie.kim@...com>,
	Ying Huang <ying.huang@...ux.alibaba.com>,
	Zi Yan <ziy@...dia.com>,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	kernel-team@...a.com
Subject: Re: [PATCH 2/2] mm/mempolicy: Skip extra call to __alloc_pages_bulk in weighted interleave

On Fri, 27 Jun 2025 00:28:33 -0400 Gregory Price <gourry@...rry.net> wrote:

Hi Gregory,

Hope you are doing well :-) Thanks for the review!

> On Thu, Jun 26, 2025 at 01:09:34PM -0700, Joshua Hahn wrote:
> > Currently, alloc_pages_bulk_weighted_interleave can make up to nr_node_ids+1
> > calls to __alloc_pages_bulk. The additional allocation can happen if the
> > previous call to this function finished the weighted round robin allocation
> > partially on a node. To make up for this, the next time this function is
> > called, an extra allocation is made to finish cleanly on the node boundaries
> > before performing the weighted round-robin cycles again.
> > 
> 
> task->il_weight can be operated on by other weighted_interleave
> functions in mempolicy, so it's not just
> alloc_pages_bulk_weighted_interleave that affects this.

This is true, and I think that's what makes testing this part of the code
tricky. I'll elaborate more below.

> Observations here are still true, just a correction for clarity.
> 
> i.e.:
> 
> The additional allocation happens for the current il_node/il_weight.
> 
> I *think* I did it this way just because it was easier to reason about
> the two chunks separately.  So I don't see a reason not to improve this.
> I will say that, at least at the time, I took the core math and
> validated the edge conditions in a separate program.
> 
> If you get it wrong in the kernel, you'd either fail to allocate - or more
> likely just get the wrong distribution of pages.  The latter is
> non-obvious unless you go looking for it, so it might be good to at
> least add this test result in the change log.  It's hard to write this
> in LTP or kernel selftest unfortunately.

I think so too :-(

One test that I wanted to do while developing this feature was to see whether
I could figure out how many pages are really allocated from each node. The
difficulty in doing this (as you pointed out above) is that there are other
ways to move the round-robin state forward without necessarily calling the
bulk alloc function, so it's hard to directly attribute the page allocations.

If this were the only place where we modified these values, then a
correctness check would be as simple as adding a printk for each node showing
how many pages were allocated on it, then adding the numbers up to see
whether they match the weight ratios in sysfs.
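
To make that concrete, a minimal userspace sketch of the comparison could
look like the following. The per-node counts are just placeholders for
whatever a printk would report, and I am assuming the per-node weights live
under /sys/kernel/mm/mempolicy/weighted_interleave/nodeN -- please correct me
if I have that path wrong:

#include <stdio.h>

#define NR_NODES 3

static unsigned int read_weight(int node)
{
	char path[128];
	unsigned int w = 1;
	FILE *f;

	/* Weighted interleave weight as exposed in sysfs (path assumed). */
	snprintf(path, sizeof(path),
		 "/sys/kernel/mm/mempolicy/weighted_interleave/node%d", node);
	f = fopen(path, "r");
	if (f) {
		if (fscanf(f, "%u", &w) != 1)
			w = 1;
		fclose(f);
	}
	return w;
}

int main(void)
{
	/* Placeholder counts, e.g. summed from printk output per node. */
	unsigned long pages[NR_NODES] = { 14000, 7000, 4000 };
	unsigned int weights[NR_NODES];
	unsigned long total = 0, weight_total = 0;
	int i;

	for (i = 0; i < NR_NODES; i++) {
		weights[i] = read_weight(i);
		total += pages[i];
		weight_total += weights[i];
	}

	for (i = 0; i < NR_NODES; i++)
		printf("node %d: observed %5.2f%%, expected %5.2f%% (weight %u)\n",
		       i, 100.0 * pages[i] / total,
		       100.0 * weights[i] / weight_total, weights[i]);
	return 0;
}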

So I think I will do what you did as well -- performing some tests, at least
on the edge cases, in a separate program should give some confidence that the
code works as intended. I'll also continue to think about whether there are
better ways to test this.
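
For reference, the kind of standalone simulation I have in mind is a
simplified model of the distribution math (finish the carried-over il_weight,
hand out whole rounds, then spread the remainder) with made-up weights and
starting state -- it is not the actual kernel logic, just a way to poke at
the edge cases:

#include <stdio.h>

#define NR_NODES 3

int main(void)
{
	unsigned int weights[NR_NODES] = { 4, 2, 1 };	/* made-up node weights */
	unsigned int il_node = 1;	/* node the last cycle stopped on */
	unsigned int il_weight = 1;	/* pages still owed to il_node */
	unsigned long requested = 25;	/* size of the bulk request */
	unsigned long nr_pages = requested;
	unsigned long pages[NR_NODES] = { 0 };
	unsigned long weight_total = 0, total = 0;
	unsigned long resume, rounds, delta, take;
	unsigned int i, node;

	for (i = 0; i < NR_NODES; i++)
		weight_total += weights[i];

	/* First finish the partially consumed weight on il_node. */
	resume = il_weight < nr_pages ? il_weight : nr_pages;
	pages[il_node] += resume;
	nr_pages -= resume;
	node = (il_node + 1) % NR_NODES;

	/* Whole rounds: every node gets rounds * weight pages. */
	rounds = nr_pages / weight_total;
	delta = nr_pages % weight_total;
	for (i = 0; i < NR_NODES; i++)
		pages[i] += rounds * weights[i];

	/* Hand out the remainder in interleave order, starting after il_node. */
	while (delta) {
		take = delta < weights[node] ? delta : weights[node];
		pages[node] += take;
		delta -= take;
		node = (node + 1) % NR_NODES;
	}

	for (i = 0; i < NR_NODES; i++) {
		printf("node %u: %lu pages (weight %u)\n",
		       i, pages[i], weights[i]);
		total += pages[i];
	}
	printf("total: %lu (requested %lu)\n", total, requested);
	return 0;
}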

Thanks again for reviewing this, have a great day!
Joshua

Sent using hkml (https://github.com/sjp38/hackermail)
