Message-ID: <20250627161318.1898633-1-joshua.hahnjy@gmail.com>
Date: Fri, 27 Jun 2025 09:13:18 -0700
From: Joshua Hahn <joshua.hahnjy@...il.com>
To: Gregory Price <gourry@...rry.net>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Alistair Popple <apopple@...dia.com>,
	Byungchul Park <byungchul@...com>,
	David Hildenbrand <david@...hat.com>,
	Matthew Brost <matthew.brost@...el.com>,
	Rakie Kim <rakie.kim@...com>,
	Ying Huang <ying.huang@...ux.alibaba.com>,
	Zi Yan <ziy@...dia.com>,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	kernel-team@...a.com
Subject: Re: [PATCH 2/2] mm/mempolicy: Skip extra call to __alloc_pages_bulk in weighted interleave

On Fri, 27 Jun 2025 00:28:33 -0400 Gregory Price <gourry@...rry.net> wrote:

Hi Gregory,

Hope you are doing well :-) Thanks for the review!

> On Thu, Jun 26, 2025 at 01:09:34PM -0700, Joshua Hahn wrote:
> > Currently, alloc_pages_bulk_weighted_interleave can make up to nr_node_ids+1
> > calls to __alloc_pages_bulk. The additional allocation can happen if the
> > previous call to this function finished the weighted round robin allocation
> > partially on a node. To make up for this, the next time this function is
> > called, an extra allocation is made to finish cleanly on the node boundaries
> > before performing the weighted round-robin cycles again.
> > 
> 
> task->il_weight can be operated on by other weighted_interleave
> functions in mempolicy, so it's not just
> alloc_pages_bulk_weighted_interleave that affects this.

This is true, and I think that's what makes testing this part of the code
tricky. I'll elaborate more below.

> Observations here are still true, just a correction for clarity.
> 
> i.e.:
> 
> The additional allocation happens for the current il_node/il_weight.
> 
> I *think* I did it this way just because it was easier to reason about
> the two chunks separately.  So I don't see a reason not to improve this.
> I will say that, at least at the time, I took the core math and
> validated the edge conditions in a separate program.
> 
> If you get it wrong in the kernel, you'd either fail to allocate - or more
> likely just get the wrong distribution of pages.  The latter is
> non-obvious unless you go looking for it, so it might be good to at
> least add this test result in the change log.  It's hard to write this
> in LTP or kernel selftest unfortunately.

I think so too :-(

One test that I wanted to do while developing this feature was to see whether
I could figure out how many pages are really allocated from each node. The
difficulty in doing this (as you pointed out above) is that there are other
ways to move the round-robin state forward without necessarily calling the
bulk alloc function, so it's hard to directly attribute the page allocations.

If this were the only place where we modified these values, then a
correctness check would be as simple as adding a printk for each node showing
how many pages were allocated on it, then adding the numbers up to see
whether they match the weight ratios in sysfs.
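
To make that concrete, a minimal userspace sketch of the comparison could
look like the following. The per-node counts are just placeholders for
whatever a printk would report, and I am assuming the per-node weights live
under /sys/kernel/mm/mempolicy/weighted_interleave/nodeN -- please correct me
if I have that path wrong:

#include <stdio.h>

#define NR_NODES 3

static unsigned int read_weight(int node)
{
	char path[128];
	unsigned int w = 1;
	FILE *f;

	/* Weighted interleave weight as exposed in sysfs (path assumed). */
	snprintf(path, sizeof(path),
		 "/sys/kernel/mm/mempolicy/weighted_interleave/node%d", node);
	f = fopen(path, "r");
	if (f) {
		if (fscanf(f, "%u", &w) != 1)
			w = 1;
		fclose(f);
	}
	return w;
}

int main(void)
{
	/* Placeholder counts, e.g. summed from printk output per node. */
	unsigned long pages[NR_NODES] = { 14000, 7000, 4000 };
	unsigned int weights[NR_NODES];
	unsigned long total = 0, weight_total = 0;
	int i;

	for (i = 0; i < NR_NODES; i++) {
		weights[i] = read_weight(i);
		total += pages[i];
		weight_total += weights[i];
	}

	for (i = 0; i < NR_NODES; i++)
		printf("node %d: observed %5.2f%%, expected %5.2f%% (weight %u)\n",
		       i, 100.0 * pages[i] / total,
		       100.0 * weights[i] / weight_total, weights[i]);
	return 0;
}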

So I think I will do what you did as well -- performing some tests, at least
on the edge cases, in a separate program should give some confidence that the
code works as intended. I'll also continue to think about whether there are
better ways to test this.
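
For reference, the kind of standalone simulation I have in mind is a
simplified model of the distribution math (finish the carried-over il_weight,
hand out whole rounds, then spread the remainder) with made-up weights and
starting state -- it is not the actual kernel logic, just a way to poke at
the edge cases:

#include <stdio.h>

#define NR_NODES 3

int main(void)
{
	unsigned int weights[NR_NODES] = { 4, 2, 1 };	/* made-up node weights */
	unsigned int il_node = 1;	/* node the last cycle stopped on */
	unsigned int il_weight = 1;	/* pages still owed to il_node */
	unsigned long requested = 25;	/* size of the bulk request */
	unsigned long nr_pages = requested;
	unsigned long pages[NR_NODES] = { 0 };
	unsigned long weight_total = 0, total = 0;
	unsigned long resume, rounds, delta, take;
	unsigned int i, node;

	for (i = 0; i < NR_NODES; i++)
		weight_total += weights[i];

	/* First finish the partially consumed weight on il_node. */
	resume = il_weight < nr_pages ? il_weight : nr_pages;
	pages[il_node] += resume;
	nr_pages -= resume;
	node = (il_node + 1) % NR_NODES;

	/* Whole rounds: every node gets rounds * weight pages. */
	rounds = nr_pages / weight_total;
	delta = nr_pages % weight_total;
	for (i = 0; i < NR_NODES; i++)
		pages[i] += rounds * weights[i];

	/* Hand out the remainder in interleave order, starting after il_node. */
	while (delta) {
		take = delta < weights[node] ? delta : weights[node];
		pages[node] += take;
		delta -= take;
		node = (node + 1) % NR_NODES;
	}

	for (i = 0; i < NR_NODES; i++) {
		printf("node %u: %lu pages (weight %u)\n",
		       i, pages[i], weights[i]);
		total += pages[i];
	}
	printf("total: %lu (requested %lu)\n", total, requested);
	return 0;
}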

Thanks again for reviewing this, have a great day!
Joshua

Sent using hkml (https://github.com/sjp38/hackermail)
