lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ejz6kpdn6kxuspktab3m7sjwg3l7eevacoabgroxgsltognb7y@3edyqhpae4vn>
Date: Mon, 1 Sep 2025 18:36:47 +0200
From: Alejandro Colomar <alx@...nel.org>
To: Usama Arif <usamaarif642@...il.com>
Cc: linux-man@...r.kernel.org, david@...hat.com, 
	lorenzo.stoakes@...cle.com, hannes@...xchg.org, baohua@...nel.org, shakeel.butt@...ux.dev, 
	ziy@...dia.com, laoar.shao@...il.com, baolin.wang@...ux.alibaba.com, 
	Liam.Howlett@...cle.com, linux-kernel@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCH] PR_*ET_THP_DISABLE.2const: document addition of
 PR_THP_DISABLE_EXCEPT_ADVISED

Hi Usama,

On Mon, Sep 01, 2025 at 05:09:03PM +0100, Usama Arif wrote:
> PR_THP_DISABLE_EXCEPT_ADVISED extended PR_SET_THP_DISABLE to only provide
> THPs when advised. IOW, it allows individual processes to opt-out of THP =
> "always" into THP = "madvise", without affecting other workloads on the
> system. The series has been merged in [1].
> 
> This patch documents the changes introduced due to the addition of
> PR_THP_DISABLE_EXCEPT_ADVISED flag:
> - PR_GET_THP_DISABLE returns a value whose bits indicate how THP-disable
>   is configured for the calling thread (with or without
>   PR_THP_DISABLE_EXCEPT_ADVISED).
> - PR_SET_THP_DISABLE now uses arg3 to specify whether to disable THP
>   completely for the process, or disable except madvise
>   (PR_THP_DISABLE_EXCEPT_ADVISED).
> 
> [1] https://lore.kernel.org/all/20250815135549.130506-1-usamaarif642@gmail.com/
> 
> Signed-off-by: Usama Arif <usamaarif642@...il.com>

Thanks for writing the patch!  Please see some comments below.

> ---
>  man/man2/madvise.2                      |  4 +-
>  man/man2const/PR_GET_THP_DISABLE.2const | 18 ++++++---
>  man/man2const/PR_SET_THP_DISABLE.2const | 52 +++++++++++++++++++++----
>  3 files changed, 61 insertions(+), 13 deletions(-)
> 
> diff --git a/man/man2/madvise.2 b/man/man2/madvise.2
> index 10cc21fa4..6a5290f67 100644
> --- a/man/man2/madvise.2
> +++ b/man/man2/madvise.2
> @@ -373,7 +373,9 @@ nor can it be stack memory or backed by a DAX-enabled device
>  (unless the DAX device is hot-plugged as System RAM).
>  The process must also not have
>  .B PR_SET_THP_DISABLE
> -set (see
> +set without the
> +.B PR_THP_DISABLE_EXCEPT_ADVISED
> +flag (see
>  .BR prctl (2)).

Double negation is confusing.  Please rephrase to something like

	The process can have X set
	only if Y is also set.

>  .IP
>  The
> diff --git a/man/man2const/PR_GET_THP_DISABLE.2const b/man/man2const/PR_GET_THP_DISABLE.2const
> index 38ff3b370..df239700f 100644
> --- a/man/man2const/PR_GET_THP_DISABLE.2const
> +++ b/man/man2const/PR_GET_THP_DISABLE.2const
> @@ -6,7 +6,7 @@
>  .SH NAME
>  PR_GET_THP_DISABLE
>  \-
> -get the state of the "THP disable" flag for the calling thread
> +get the state of the "THP disable" flags for the calling thread
>  .SH LIBRARY
>  Standard C library
>  .RI ( libc ,\~ \-lc )
> @@ -18,13 +18,21 @@ Standard C library
>  .B int prctl(PR_GET_THP_DISABLE, 0L, 0L, 0L, 0L);
>  .fi
>  .SH DESCRIPTION
> -Return the current setting of
> -the "THP disable" flag for the calling thread:
> -either 1, if the flag is set, or 0, if it is not.
> +Returns a value whose bits indicate how THP-disable is configured

s/Returns/Return/

> +for the calling thread.
> +The returned value is interpreted as follows:
> +.P
> +.nf
> +.B "Bits"
> +.B " 1 0  Value  Description"
> + 0 0    0    No THP-disable behaviour specified.
> + 0 1    1    THP is entirely disabled for this process.
> + 1 1    3    THP-except-advised mode is set for this process.

We should probably use a table with .TS/.TE.  See examples of this in
other manual pages for how to use that (or read tbl(1) if you want).

If you don't know how to use that, I can do it myself.  tbl(1) is a bit
weird.

> +.fi
>  .SH RETURN VALUE
>  On success,
>  .BR PR_GET_THP_DISABLE ,
> -returns the boolean value described above.
> +returns the value described above.
>  On error, \-1 is returned, and
>  .I errno
>  is set to indicate the error.
> diff --git a/man/man2const/PR_SET_THP_DISABLE.2const b/man/man2const/PR_SET_THP_DISABLE.2const
> index 564e005d4..9f0f17702 100644
> --- a/man/man2const/PR_SET_THP_DISABLE.2const
> +++ b/man/man2const/PR_SET_THP_DISABLE.2const
> @@ -6,7 +6,7 @@
>  .SH NAME
>  PR_SET_THP_DISABLE
>  \-
> -set the state of the "THP disable" flag for the calling thread
> +set the state of the "THP disable" flags for the calling thread
>  .SH LIBRARY
>  Standard C library
>  .RI ( libc ,\~ \-lc )
> @@ -15,24 +15,62 @@ Standard C library
>  .BR "#include <linux/prctl.h>" "  /* Definition of " PR_* " constants */"
>  .B #include <sys/prctl.h>
>  .P
> -.BI "int prctl(PR_SET_THP_DISABLE, long " flag ", 0L, 0L, 0L);"
> +.BI "int prctl(PR_SET_THP_DISABLE, long " thp_disable ", unsigned long " flags ", 0L, 0L);"

Hmmm, I'm reading this weirdly.

Old code doing prctl(PR_SET_THP_DIABLE, 1, 0L, 0L, 0L); would be
transformed from setting the flag before, to now using 0L as flags?

Or how is backwards compatibility handled?

>  .fi
>  .SH DESCRIPTION
> -Set the state of the "THP disable" flag for the calling thread.
> +Set the state of the "THP disable" flags for the calling thread.
>  If
> -.I flag
> -has a nonzero value, the flag is set, otherwise it is cleared.
> +.I thp_disable
> +has a nonzero value, the THP disable flag is set according to the value of

Please break the line after the comma.

> +.I flags,
> +otherwise it is cleared.
>  .P
> -Setting this flag provides a method
> +This
> +.BR prctl (2)
> +provides a method
>  for disabling transparent huge pages
>  for jobs where the code cannot be modified,
>  and using a malloc hook with
>  .BR madvise (2)
>  is not an option (i.e., statically allocated data).
> -The setting of the "THP disable" flag is inherited by a child created via
> +The setting of the "THP disable" flags is inherited by a child created via
>  .BR fork (2)
>  and is preserved across
>  .BR execve (2).
> +.P
> +The behavior depends on the value of
> +.IR flags:
> +.TP
> +.B 0
> +The
> +.BR prctl (2)
> +call will disable THPs completely for the process,
> +irrespective of global THP controls or
> +.BR MADV_COLLAPSE .
> +.TP
> +.B PR_THP_DISABLE_EXCEPT_ADVISED
> +The
> +.BR prctl (2)
> +call will disable THPs for the process except when the usage of THPs is
> +advised.

Please break the line before 'except'.  See 'Use semantic newlines'
in man-pages(7).

> +Consequently, THPs will only be used when:
> +.RS
> +.IP \[bu] 2

s/2/3/

See man-pages(7) ("Lists"):

       There should always be exactly 2 spaces between the list  symbol
       and  the  elements.   This doesn't apply to "tagged paragraphs",
       which use the default indentation rules.

(If you grep(1) around, you'll see that number everywhere.)

> +Global THP controls are set to "always" or "madvise" and
> +.BR madvise (...,
> +.BR MADV_HUGEPAGE )

I'd say

	.I madvise(..., MADV_HUGEPAGE)

as an inlined expression, which goes in full italics; that's simpler.

> +or
> +.BR madvise (...,
> +.BR MADV_COLLAPSE )
> +is used.
> +.IP \[bu]
> +Global THP controls are set to "never" and
> +.BR madvise (...,
> +.BR MADV_COLLAPSE )
> +is used.
> +This is the same behavior as if THPs would not be disabled on
> +a process level.

Please break the line before "as if".

> +.RE
>  .SH RETURN VALUE
>  On success,
>  0 is returned.

Have a lovely day!
Alex

-- 
<https://www.alejandro-colomar.es>
Use port 80 (that is, <...:80/>).

Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ