[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <62990a3f-524f-4362-8f64-2fc582986eba@redhat.com>
Date: Sun, 4 May 2025 10:58:44 +0200
From: David Hildenbrand <david@...hat.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Petr Vaněk <arkamar@...as.cz>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Ryan Roberts <ryan.roberts@....com>,
xen-devel@...ts.xenproject.org, x86@...nel.org, stable@...r.kernel.org
Subject: Re: [PATCH v2 1/1] mm: fix folio_pte_batch() on XEN PV
On 04.05.25 09:15, Andrew Morton wrote:
> On Sun, 4 May 2025 08:47:45 +0200 David Hildenbrand <david@...hat.com> wrote:
>
>>>
>>> Methinks max_nr really wants to be unsigned long.
>>
>> We only batch within a single PTE table, so an integer was sufficient.
>>
>> The unsigned value is the result of a discussion with Ryan regarding similar/related
>> (rmap) functions:
>>
>> "
>> Personally I'd go with signed int (since
>> that's what all the counters in struct folio that we are manipulating are,
>> underneath the atomic_t) then check that nr_pages > 0 in
>> __folio_rmap_sanity_checks().
>> "
>>
>> https://lore.kernel.org/linux-mm/20231204142146.91437-14-david@redhat.com/T/#ma0bfff0102f0f2391dfa94aa22a8b7219b92c957
>>
>> As soon as we let "max_nr" be an "unsigned long", also the return value
>> should be an "unsigned long", and everybody calling that function.
>>
>> In this case here, we should likely just use whatever type "max_nr" is.
>>
>> Not sure myself if we should change that here to unsigned long or long. Some
>> callers also operate with the negative values IIRC (e.g., adjust the RSS by doing -= nr).
>
> "rss -= nr" doesn't require, expect or anticipate that `nr' can be negative!
The one thing I ran into with "unsigned int" around folio_nr_pages()
was that if you pass
-folio-nr_pages()
into a function that expects an "long" (add vs. remove a value to a counter), then
the result might not be what one would expect when briefly glimpsing at the code:
#include <stdio.h>
static __attribute__((noinline)) void print(long diff)
{
printf("%ld\n", diff);
}
static int value_int()
{
return 12345;
}
static unsigned int value_unsigned_int()
{
return 12345;
}
static int value_long()
{
return 12345;
}
static unsigned long value_unsigned_long()
{
return 12345;
}
int main(void)
{
print(-value_int());
print(-value_unsigned_int());
print(-value_long());
print(-value_unsigned_long());
return 0;
}
$ ./tmp
-12345
4294954951
-12345
-12345
So, I am fine with using "unsigned long" (as stated in that commit description below).
>
>>
>>> That will permit the
>>> cleanup of quite a bit of truncation, extension, signedness conversion
>>> and general type chaos in folio_pte_batch()'s various callers.
>>>> And...
>>>
>>> Why does folio_nr_pages() return a signed quantity? It's a count.
>>
>> A partial answer is in 1ea5212aed068 ("mm: factor out large folio handling
>> from folio_nr_pages() into folio_large_nr_pages()"), where I stumbled over the
>> reason for a signed value myself and at least made the other
>> functions be consistent with folio_nr_pages():
>>
>> "
>> While at it, let's consistently return a "long" value from all these
>> similar functions. Note that we cannot use "unsigned int" (even though
>> _folio_nr_pages is of that type), because it would break some callers that
>> do stuff like "-folio_nr_pages()". Both "int" or "unsigned long" would
>> work as well.
>>
>> "
>>
>> Note that folio_nr_pages() returned a "long" since the very beginning. Probably using
>> a signed value for consistency because also mapcounts / refcounts are all signed.
>
> Geeze.
>
> Can we step back and look at what we're doing? Anything which counts
> something (eg, has "nr" in the identifier) cannot be negative.
Yes. Unless we want to catch underflows (e.g., mapcount / refcount). For "nr_pages" I agree.
>
> It's that damn "int" thing. I think it was always a mistake that the C
> language's go-to type is a signed one.
Yeah. But see above that "unsigned int" in combination with long can also cause pain.
> It's a system programming
> language and system software rarely deals with negative scalars.
> Signed scalars are the rare case.
>
> I do expect that the code in and around here would be cleaner and more
> reliable if we were to do a careful expunging of inappropriately signed
> variables.
Maybe, but it would mostly be a "int -> unsigned long" conversion, probably not
much more. I'm not against cleaning that up at all.
>
>>
>>>
>>> And why the heck is folio_pte_batch() inlined? It's larger then my
>>> first hard disk and it has five callsites!
>>
>> :)
>>
>> In case of fork/zap we really want it inlined because
>>
>> (1) We want to optimize out all of the unnecessary checks we added for other users
>>
>> (2) Zap/fork code is very sensitive to function call overhead
>>
>> Probably, as that function sees more widespread use, we might want a
>> non-inlined variant that can be used in places where performance doesn't
>> matter all that much (although I am not sure there will be that many).
>
> a quick test.
>
> before:
> text data bss dec hex filename
> 12380 470 0 12850 3232 mm/madvise.o
> 52975 2689 24 55688 d988 mm/memory.o
> 25305 1448 2096 28849 70b1 mm/mempolicy.o
> 8573 924 4 9501 251d mm/mlock.o
> 20950 5864 16 26830 68ce mm/rmap.o
>
> (120183)
>
> after:
>
> text data bss dec hex filename
> 11916 470 0 12386 3062 mm/madvise.o
> 52990 2697 24 55711 d99f mm/memory.o
> 25161 1448 2096 28705 7021 mm/mempolicy.o
> 8381 924 4 9309 245d mm/mlock.o
> 20806 5864 16 26686 683e mm/rmap.o
>
> (119254)
>
> so uninlining saves a kilobyte of text - less than I expected but
> almost 1%.
As I said, for fork+zap/unmap we really want to inline -- the first two users
of that function when that function was still simpler and resided in mm/memory.o. For
the other users, probably okay to have a non-inlined one in mm/util.c .
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists