linux-kernel - Re: [PATCH net-next 3/6] mm/page_alloc: use initial zero offset for page_frag_alloc


Date: Fri, 05 Jan 2024 07:42:08 -0800
From: Alexander H Duyck <alexander.duyck@...il.com>
To: Yunsheng Lin <linyunsheng@...wei.com>, davem@...emloft.net,
 kuba@...nel.org,  pabeni@...hat.com
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org, Andrew Morton
	 <akpm@...ux-foundation.org>, linux-mm@...ck.org
Subject: Re: [PATCH net-next 3/6] mm/page_alloc: use initial zero offset for
 page_frag_alloc_align()

On Wed, 2024-01-03 at 17:56 +0800, Yunsheng Lin wrote:
> The next patch is about to use page_frag_alloc_align() to
> replace vhost_net_page_frag_refill(). The main difference
> between those two page frag implementations is whether we
> use an initial zero offset or not.
> 
> It seems more natural to use an initial zero offset, as it
> may enable more correct cache prefetching and skb frag
> coalescing in the networking, so change it to use an initial
> zero offset.
> 
> Signed-off-by: Yunsheng Lin <linyunsheng@...wei.com>
> CC: Alexander Duyck <alexander.duyck@...il.com>

There are several advantages to running the offset as a countdown
rather than a count-up value.

1. For the size of the chunks we are allocating, working from the
bottom up adds no value. We are jumping in large enough steps, and the
buffers are being used for DMA, so sequential addresses gain us nothing.

2. By starting at the end and working toward zero we can rely on
built-in CPU functionality: after the subtraction we only have to check
whether the result is negative, rather than loading two registers with
the values and comparing them. That saves us a few cycles. It also
saves us from having to read both the size and the offset for every
page.
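
To make point 2 concrete, here is a rough userspace sketch of the two
schemes. It is not the kernel code: struct frag_cache and both function
names are made up for illustration, and refilling the page is left to
the caller (signalled here by returning NULL).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified cache; NOT the kernel's struct page_frag_cache. */
struct frag_cache {
	char *base;	/* start of the backing buffer */
	int size;	/* total bytes in the buffer */
	int offset;	/* countdown: starts at size; count-up: starts at 0 */
};

/* Countdown: subtract, then a single sign check. size is never read. */
static void *frag_alloc_countdown(struct frag_cache *c, int fragsz)
{
	assert(fragsz > 0);

	int offset = c->offset - fragsz;

	if (offset < 0)
		return NULL;	/* exhausted; caller would refill */

	c->offset = offset;
	return c->base + offset;
}

/* Count-up: both offset and size must be loaded and compared. */
static void *frag_alloc_countup(struct frag_cache *c, int fragsz)
{
	assert(fragsz > 0);

	int offset = c->offset;

	if (offset + fragsz > c->size)
		return NULL;	/* exhausted; caller would refill */

	c->offset = offset + fragsz;
	return c->base + offset;
}
```

In the countdown variant the exhaustion test can typically reuse the
flags set by the subtraction itself, while the count-up variant needs
the extra load of size plus an explicit compare. What the compiler
actually emits depends on the architecture and optimization level.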

Again, this is another code cleanup at the cost of performance. I
realize many of the items you are removing would be considered
micro-optimizations, but when we are dealing with millions of packets
per second those optimizations add up.