lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <62990a3f-524f-4362-8f64-2fc582986eba@redhat.com>
Date: Sun, 4 May 2025 10:58:44 +0200
From: David Hildenbrand <david@...hat.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Petr Vaněk <arkamar@...as.cz>, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, Ryan Roberts <ryan.roberts@....com>,
 xen-devel@...ts.xenproject.org, x86@...nel.org, stable@...r.kernel.org
Subject: Re: [PATCH v2 1/1] mm: fix folio_pte_batch() on XEN PV

On 04.05.25 09:15, Andrew Morton wrote:
> On Sun, 4 May 2025 08:47:45 +0200 David Hildenbrand <david@...hat.com> wrote:
> 
>>>
>>> Methinks max_nr really wants to be unsigned long.
>>
>> We only batch within a single PTE table, so an integer was sufficient.
>>
>> The unsigned value is the result of a discussion with Ryan regarding similar/related
>> (rmap) functions:
>>
>> "
>> Personally I'd go with signed int (since
>> that's what all the counters in struct folio that we are manipulating are,
>> underneath the atomic_t) then check that nr_pages > 0 in
>> __folio_rmap_sanity_checks().
>> "
>>
>> https://lore.kernel.org/linux-mm/20231204142146.91437-14-david@redhat.com/T/#ma0bfff0102f0f2391dfa94aa22a8b7219b92c957
>>
>> As soon as we let "max_nr" be an "unsigned long", also the return value
>> should be an "unsigned long", and everybody calling that function.
>>
>> In this case here, we should likely just use whatever type "max_nr" is.
>>
>> Not sure myself if we should change that here to unsigned long or long. Some
>> callers also operate with the negative values IIRC (e.g., adjust the RSS by doing -= nr).
> 
> "rss -= nr" doesn't require, expect or anticipate that `nr' can be negative!

The one thing I ran into with "unsigned int" around folio_nr_pages()
was that if you pass

-folio-nr_pages()

into a function that expects an "long" (add vs. remove a value to a counter), then
the result might not be what one would expect when briefly glimpsing at the code:

#include <stdio.h>

static __attribute__((noinline)) void print(long diff)
{
         printf("%ld\n", diff);
}

static int value_int()
{
         return 12345;
}

static unsigned int value_unsigned_int()
{
         return 12345;
}

static int value_long()
{
         return 12345;
}

static unsigned long value_unsigned_long()
{
         return 12345;
}

int main(void)
{
         print(-value_int());
         print(-value_unsigned_int());
         print(-value_long());
         print(-value_unsigned_long());
         return 0;
}


$ ./tmp
-12345
4294954951
-12345
-12345

So, I am fine with using "unsigned long" (as stated in that commit description below).

> 
>>
>>> That will permit the
>>> cleanup of quite a bit of truncation, extension, signedness conversion
>>> and general type chaos in folio_pte_batch()'s various callers.
>>>> And...
>>>
>>> Why does folio_nr_pages() return a signed quantity?  It's a count.
>>
>> A partial answer is in 1ea5212aed068 ("mm: factor out large folio handling
>> from folio_nr_pages() into folio_large_nr_pages()"), where I stumbled over the
>> reason for a signed value myself and at least made the other
>> functions be consistent with folio_nr_pages():
>>
>> "
>>       While at it, let's consistently return a "long" value from all these
>>       similar functions.  Note that we cannot use "unsigned int" (even though
>>       _folio_nr_pages is of that type), because it would break some callers that
>>       do stuff like "-folio_nr_pages()".  Both "int" or "unsigned long" would
>>       work as well.
>>
>> "
>>
>> Note that folio_nr_pages() returned a "long" since the very beginning. Probably using
>> a signed value for consistency because also mapcounts / refcounts are all signed.
> 
> Geeze.
> 
> Can we step back and look at what we're doing?  Anything which counts
> something (eg, has "nr" in the identifier) cannot be negative.

Yes. Unless we want to catch underflows (e.g., mapcount / refcount). For "nr_pages" I agree.

> 
> It's that damn "int" thing.  I think it was always a mistake that the C
> language's go-to type is a signed one. 

Yeah. But see above that "unsigned int" in combination with long can also cause pain.

> It's a system programming
> language and system software rarely deals with negative scalars.
> Signed scalars are the rare case.
> 
> I do expect that the code in and around here would be cleaner and more
> reliable if we were to do a careful expunging of inappropriately signed
> variables.

Maybe, but it would mostly be a "int -> unsigned long" conversion, probably not
much more. I'm not against cleaning that up at all.

> 
>>
>>>
>>> And why the heck is folio_pte_batch() inlined?  It's larger then my
>>> first hard disk and it has five callsites!
>>
>> :)
>>
>> In case of fork/zap we really want it inlined because
>>
>> (1) We want to optimize out all of the unnecessary checks we added for other users
>>
>> (2) Zap/fork code is very sensitive to function call overhead
>>
>> Probably, as that function sees more widespread use, we might want a
>> non-inlined variant that can be used in places where performance doesn't
>> matter all that much (although I am not sure there will be that many).
> 
> a quick test.
> 
> before:
>     text	   data	    bss	    dec	    hex	filename
>    12380	    470	      0	  12850	   3232	mm/madvise.o
>    52975	   2689	     24	  55688	   d988	mm/memory.o
>    25305	   1448	   2096	  28849	   70b1	mm/mempolicy.o
>     8573	    924	      4	   9501	   251d	mm/mlock.o
>    20950	   5864	     16	  26830	   68ce	mm/rmap.o
> 
>   (120183)
> 
> after:
> 
>     text	   data	    bss	    dec	    hex	filename
>    11916	    470	      0	  12386	   3062	mm/madvise.o
>    52990	   2697	     24	  55711	   d99f	mm/memory.o
>    25161	   1448	   2096	  28705	   7021	mm/mempolicy.o
>     8381	    924	      4	   9309	   245d	mm/mlock.o
>    20806	   5864	     16	  26686	   683e	mm/rmap.o
> 
>   (119254)
> 
> so uninlining saves a kilobyte of text - less than I expected but
> almost 1%.

As I said, for fork+zap/unmap we really want to inline -- the first two users
of that function when that function was still simpler and resided in mm/memory.o. For
the other users, probably okay to have a non-inlined one in mm/util.c .

-- 
Cheers,

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ