lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0002cde37fcead62813006ab9516c5b2fdbf113a.camel@gmail.com>
Date: Wed, 14 Aug 2024 09:13:47 -0700
From: Alexander H Duyck <alexander.duyck@...il.com>
To: Yunsheng Lin <linyunsheng@...wei.com>, davem@...emloft.net,
 kuba@...nel.org,  pabeni@...hat.com
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org, Andrew Morton
	 <akpm@...ux-foundation.org>, linux-mm@...ck.org
Subject: Re: [PATCH net-next v13 07/14] mm: page_frag: reuse existing space
 for 'size' and 'pfmemalloc'

On Thu, 2024-08-08 at 20:37 +0800, Yunsheng Lin wrote:
> Currently there is one 'struct page_frag' for every 'struct
> sock' and 'struct task_struct', we are about to replace the
> 'struct page_frag' with 'struct page_frag_cache' for them.
> Before begin the replacing, we need to ensure the size of
> 'struct page_frag_cache' is not bigger than the size of
> 'struct page_frag', as there may be tens of thousands of
> 'struct sock' and 'struct task_struct' instances in the
> system.
> 
> By or'ing the page order & pfmemalloc with lower bits of
> 'va' instead of using 'u16' or 'u32' for page size and 'u8'
> for pfmemalloc, we are able to avoid 3 or 5 bytes space waste.
> And page address & pfmemalloc & order is unchanged for the
> same page in the same 'page_frag_cache' instance, it makes
> sense to fit them together.
> 
> After this patch, the size of 'struct page_frag_cache' should be
> the same as the size of 'struct page_frag'.
> 
> CC: Alexander Duyck <alexander.duyck@...il.com>
> Signed-off-by: Yunsheng Lin <linyunsheng@...wei.com>
> ---
>  include/linux/mm_types_task.h   | 16 +++++-----
>  include/linux/page_frag_cache.h | 52 +++++++++++++++++++++++++++++++--
>  mm/page_frag_cache.c            | 49 +++++++++++++++++--------------
>  3 files changed, 85 insertions(+), 32 deletions(-)
> 
> diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h
> index b1c54b2b9308..f2610112a642 100644
> --- a/include/linux/mm_types_task.h
> +++ b/include/linux/mm_types_task.h
> @@ -50,18 +50,18 @@ struct page_frag {
>  #define PAGE_FRAG_CACHE_MAX_SIZE	__ALIGN_MASK(32768, ~PAGE_MASK)
>  #define PAGE_FRAG_CACHE_MAX_ORDER	get_order(PAGE_FRAG_CACHE_MAX_SIZE)
>  struct page_frag_cache {
> -	void *va;
> -#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> +	/* encoded_va consists of the virtual address, pfmemalloc bit and order
> +	 * of a page.
> +	 */
> +	unsigned long encoded_va;
> +

Rather than calling this an "encoded_va" we might want to call this an
"encoded_page" as that would be closer to what we are actually working
with. We are just using the virtual address as the page pointer instead
of the page struct itself since we need quicker access to the virtual
address than we do the page struct.

> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE) && (BITS_PER_LONG <= 32)
>  	__u16 remaining;
> -	__u16 size;
> +	__u16 pagecnt_bias;
>  #else
>  	__u32 remaining;
> +	__u32 pagecnt_bias;
>  #endif
> -	/* we maintain a pagecount bias, so that we dont dirty cache line
> -	 * containing page->_refcount every time we allocate a fragment.
> -	 */
> -	unsigned int		pagecnt_bias;
> -	bool pfmemalloc;
>  };
>  
>  /* Track pages that require TLB flushes */
> diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h
> index 7c9125a9aed3..4ce924eaf1b1 100644
> --- a/include/linux/page_frag_cache.h
> +++ b/include/linux/page_frag_cache.h
> @@ -3,18 +3,66 @@
>  #ifndef _LINUX_PAGE_FRAG_CACHE_H
>  #define _LINUX_PAGE_FRAG_CACHE_H
>  
> +#include <linux/bits.h>
> +#include <linux/build_bug.h>
>  #include <linux/log2.h>
>  #include <linux/types.h>
>  #include <linux/mm_types_task.h>
>  
> +#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
> +/* Use a full byte here to enable assembler optimization as the shift
> + * operation is usually expecting a byte.
> + */
> +#define PAGE_FRAG_CACHE_ORDER_MASK		GENMASK(7, 0)
> +#define PAGE_FRAG_CACHE_PFMEMALLOC_BIT		BIT(8)
> +#define PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT	8
> +#else
> +/* Compiler should be able to figure out we don't read things as any value
> + * ANDed with 0 is 0.
> + */
> +#define PAGE_FRAG_CACHE_ORDER_MASK		0
> +#define PAGE_FRAG_CACHE_PFMEMALLOC_BIT		BIT(0)
> +#define PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT	0
> +#endif
> +

You should probably pull out PAGE_FRAG_CACHE_PFMEMALLOC_BIT and just
define it as:
#define PAGE_FRAG_CACHE_PFMEMALLOC_BIT \
	BIT(PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT)

That way there is no risk of the bit and the shift somehow getting
split up and being different values.

> +static inline unsigned long encode_aligned_va(void *va, unsigned int order,
> +					      bool pfmemalloc)

Rather than passing the virtual address it might make more sense to
pass the page. With that you know it should be PAGE_SIZE aligned versus
just being passed some random virtual address.

> +{
> +	BUILD_BUG_ON(PAGE_FRAG_CACHE_MAX_ORDER > PAGE_FRAG_CACHE_ORDER_MASK);
> +	BUILD_BUG_ON(PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT >= PAGE_SHIFT);

Rather than test the shift I would test the bit versus PAGE_SIZE.

> +
> +	return (unsigned long)va | (order & PAGE_FRAG_CACHE_ORDER_MASK) |
> +		((unsigned long)pfmemalloc << PAGE_FRAG_CACHE_PFMEMALLOC_SHIFT);
> +}
> +
> +static inline unsigned long encoded_page_order(unsigned long encoded_va)
> +{
> +	return encoded_va & PAGE_FRAG_CACHE_ORDER_MASK;
> +}
> +
> +static inline bool encoded_page_pfmemalloc(unsigned long encoded_va)
> +{
> +	return !!(encoded_va & PAGE_FRAG_CACHE_PFMEMALLOC_BIT);
> +}
> +
> +static inline void *encoded_page_address(unsigned long encoded_va)
> +{
> +	return (void *)(encoded_va & PAGE_MASK);
> +}
> +

This is one of the reasons why I am thinking "encoded_page" might be a
better name for it. The 3 functions above all have their equivilent for
a page struct but we pulled that data out and packed it all into the
encoded_page.



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ