lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <97c19219-6055-46ae-865a-2833d8367db0@gmail.com>
Date: Mon, 1 Sep 2025 17:58:39 +0100
From: Usama Arif <usamaarif642@...il.com>
To: Alejandro Colomar <alx@...nel.org>
Cc: linux-man@...r.kernel.org, david@...hat.com, lorenzo.stoakes@...cle.com,
 hannes@...xchg.org, baohua@...nel.org, shakeel.butt@...ux.dev,
 ziy@...dia.com, laoar.shao@...il.com, baolin.wang@...ux.alibaba.com,
 Liam.Howlett@...cle.com, linux-kernel@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCH] PR_*ET_THP_DISABLE.2const: document addition of
 PR_THP_DISABLE_EXCEPT_ADVISED



On 01/09/2025 17:36, Alejandro Colomar wrote:
> Hi Usama,
> 
> On Mon, Sep 01, 2025 at 05:09:03PM +0100, Usama Arif wrote:
>> PR_THP_DISABLE_EXCEPT_ADVISED extended PR_SET_THP_DISABLE to only provide
>> THPs when advised. IOW, it allows individual processes to opt-out of THP =
>> "always" into THP = "madvise", without affecting other workloads on the
>> system. The series has been merged in [1].
>>
>> This patch documents the changes introduced due to the addition of
>> PR_THP_DISABLE_EXCEPT_ADVISED flag:
>> - PR_GET_THP_DISABLE returns a value whose bits indicate how THP-disable
>>   is configured for the calling thread (with or without
>>   PR_THP_DISABLE_EXCEPT_ADVISED).
>> - PR_SET_THP_DISABLE now uses arg3 to specify whether to disable THP
>>   completely for the process, or disable except madvise
>>   (PR_THP_DISABLE_EXCEPT_ADVISED).
>>
>> [1] https://lore.kernel.org/all/20250815135549.130506-1-usamaarif642@gmail.com/
>>
>> Signed-off-by: Usama Arif <usamaarif642@...il.com>
> 
> Thanks for writing the patch!  Please see some comments below.


Thanks for the quick review! Its my first time writing a man page so was apologies
if there were some basic mistakes in formatting.

> 
>> ---
>>  man/man2/madvise.2                      |  4 +-
>>  man/man2const/PR_GET_THP_DISABLE.2const | 18 ++++++---
>>  man/man2const/PR_SET_THP_DISABLE.2const | 52 +++++++++++++++++++++----
>>  3 files changed, 61 insertions(+), 13 deletions(-)
>>
>> diff --git a/man/man2/madvise.2 b/man/man2/madvise.2
>> index 10cc21fa4..6a5290f67 100644
>> --- a/man/man2/madvise.2
>> +++ b/man/man2/madvise.2
>> @@ -373,7 +373,9 @@ nor can it be stack memory or backed by a DAX-enabled device
>>  (unless the DAX device is hot-plugged as System RAM).
>>  The process must also not have
>>  .B PR_SET_THP_DISABLE
>> -set (see
>> +set without the
>> +.B PR_THP_DISABLE_EXCEPT_ADVISED
>> +flag (see
>>  .BR prctl (2)).
> 
> Double negation is confusing.  Please rephrase to something like
> 
> 	The process can have X set
> 	only if Y is also set.
> 

Yes, makes sense, will change to belwow in the next revision:

The process can have
.B PR_SET_THP_DISABLE
set only if
.B PR_THP_DISABLE_EXCEPT_ADVISED
flag is set (see
.BR prctl (2)).

>>  .IP
>>  The
>> diff --git a/man/man2const/PR_GET_THP_DISABLE.2const b/man/man2const/PR_GET_THP_DISABLE.2const
>> index 38ff3b370..df239700f 100644
>> --- a/man/man2const/PR_GET_THP_DISABLE.2const
>> +++ b/man/man2const/PR_GET_THP_DISABLE.2const
>> @@ -6,7 +6,7 @@
>>  .SH NAME
>>  PR_GET_THP_DISABLE
>>  \-
>> -get the state of the "THP disable" flag for the calling thread
>> +get the state of the "THP disable" flags for the calling thread
>>  .SH LIBRARY
>>  Standard C library
>>  .RI ( libc ,\~ \-lc )
>> @@ -18,13 +18,21 @@ Standard C library
>>  .B int prctl(PR_GET_THP_DISABLE, 0L, 0L, 0L, 0L);
>>  .fi
>>  .SH DESCRIPTION
>> -Return the current setting of
>> -the "THP disable" flag for the calling thread:
>> -either 1, if the flag is set, or 0, if it is not.
>> +Returns a value whose bits indicate how THP-disable is configured
> 
> s/Returns/Return/
> 

ack

>> +for the calling thread.
>> +The returned value is interpreted as follows:
>> +.P
>> +.nf
>> +.B "Bits"
>> +.B " 1 0  Value  Description"
>> + 0 0    0    No THP-disable behaviour specified.
>> + 0 1    1    THP is entirely disabled for this process.
>> + 1 1    3    THP-except-advised mode is set for this process.
> 
> We should probably use a table with .TS/.TE.  See examples of this in
> other manual pages for how to use that (or read tbl(1) if you want).
> 
> If you don't know how to use that, I can do it myself.  tbl(1) is a bit
> weird.


I tried below, and it seemed to look ok in the output, but please let me know if
its ok:

.TS
allbox;
cb cb cb l
c c c l.
Bit 1	Bit 0	Value	Description
0	0	0	No THP-disable behaviour specified.
0	1	1	THP is entirely disabled for this process.
1	1	3	THP-except-advised mode is set for this process.
.TE


> 
>> +.fi
>>  .SH RETURN VALUE
>>  On success,
>>  .BR PR_GET_THP_DISABLE ,
>> -returns the boolean value described above.
>> +returns the value described above.
>>  On error, \-1 is returned, and
>>  .I errno
>>  is set to indicate the error.
>> diff --git a/man/man2const/PR_SET_THP_DISABLE.2const b/man/man2const/PR_SET_THP_DISABLE.2const
>> index 564e005d4..9f0f17702 100644
>> --- a/man/man2const/PR_SET_THP_DISABLE.2const
>> +++ b/man/man2const/PR_SET_THP_DISABLE.2const
>> @@ -6,7 +6,7 @@
>>  .SH NAME
>>  PR_SET_THP_DISABLE
>>  \-
>> -set the state of the "THP disable" flag for the calling thread
>> +set the state of the "THP disable" flags for the calling thread
>>  .SH LIBRARY
>>  Standard C library
>>  .RI ( libc ,\~ \-lc )
>> @@ -15,24 +15,62 @@ Standard C library
>>  .BR "#include <linux/prctl.h>" "  /* Definition of " PR_* " constants */"
>>  .B #include <sys/prctl.h>
>>  .P
>> -.BI "int prctl(PR_SET_THP_DISABLE, long " flag ", 0L, 0L, 0L);"
>> +.BI "int prctl(PR_SET_THP_DISABLE, long " thp_disable ", unsigned long " flags ", 0L, 0L);"
> 
> Hmmm, I'm reading this weirdly.
> 
> Old code doing prctl(PR_SET_THP_DIABLE, 1, 0L, 0L, 0L); would be
> transformed from setting the flag before, to now using 0L as flags?
> 
> Or how is backwards compatibility handled?
> 


Its still backwards compatible. The name of the arguments is changed, but the arg values have not.
Before you could do 2 things:

prctl(PR_SET_THP_DISABLE, 0, 0, 0, 0); // to reset THP setting.
prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0); // to disable THPs completely.

Now in addition to the 2 calls above, you can do:
prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED, 0, 0); // to disable THPs except madvise.


Before arg2 was called flags and arg3 was always 0.
Now arg2 is called thp_disable, and arg3 is called flags.


>>  .fi
>>  .SH DESCRIPTION
>> -Set the state of the "THP disable" flag for the calling thread.
>> +Set the state of the "THP disable" flags for the calling thread.
>>  If
>> -.I flag
>> -has a nonzero value, the flag is set, otherwise it is cleared.
>> +.I thp_disable
>> +has a nonzero value, the THP disable flag is set according to the value of
> 
> Please break the line after the comma.
> 

ack

>> +.I flags,
>> +otherwise it is cleared.
>>  .P
>> -Setting this flag provides a method
>> +This
>> +.BR prctl (2)
>> +provides a method
>>  for disabling transparent huge pages
>>  for jobs where the code cannot be modified,
>>  and using a malloc hook with
>>  .BR madvise (2)
>>  is not an option (i.e., statically allocated data).
>> -The setting of the "THP disable" flag is inherited by a child created via
>> +The setting of the "THP disable" flags is inherited by a child created via
>>  .BR fork (2)
>>  and is preserved across
>>  .BR execve (2).
>> +.P
>> +The behavior depends on the value of
>> +.IR flags:
>> +.TP
>> +.B 0
>> +The
>> +.BR prctl (2)
>> +call will disable THPs completely for the process,
>> +irrespective of global THP controls or
>> +.BR MADV_COLLAPSE .
>> +.TP
>> +.B PR_THP_DISABLE_EXCEPT_ADVISED
>> +The
>> +.BR prctl (2)
>> +call will disable THPs for the process except when the usage of THPs is
>> +advised.
> 
> Please break the line before 'except'.  See 'Use semantic newlines'
> in man-pages(7).

ack> 
>> +Consequently, THPs will only be used when:
>> +.RS
>> +.IP \[bu] 2
> 
> s/2/3/

ack

> 
> See man-pages(7) ("Lists"):
> 
>        There should always be exactly 2 spaces between the list  symbol
>        and  the  elements.   This doesn't apply to "tagged paragraphs",
>        which use the default indentation rules.
> 
> (If you grep(1) around, you'll see that number everywhere.)
> 
>> +Global THP controls are set to "always" or "madvise" and
>> +.BR madvise (...,
>> +.BR MADV_HUGEPAGE )
> 
> I'd say
> 
> 	.I madvise(..., MADV_HUGEPAGE)
> 
> as an inlined expression, which goes in full italics; that's simpler.

This results in the entire line being underlined, which is probably not what
not what we want?

> 
>> +or
>> +.BR madvise (...,
>> +.BR MADV_COLLAPSE )
>> +is used.
>> +.IP \[bu]
>> +Global THP controls are set to "never" and
>> +.BR madvise (...,
>> +.BR MADV_COLLAPSE )
>> +is used.
>> +This is the same behavior as if THPs would not be disabled on
>> +a process level.
> 
> Please break the line before "as if".

ack

> 
>> +.RE
>>  .SH RETURN VALUE
>>  On success,
>>  0 is returned.
> 
> Have a lovely day!
> Alex
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ