linux-kernel - Re: [PATCH v2] net: skbuff: set FLAG_SKB_NO_MERGE for skbuff_fclone

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANn89iL2a2534d8QU9Xt6Gjm8M1+wWH03+YPdjSPQCq_Q4ZGxw@mail.gmail.com>
Date: Tue, 27 Feb 2024 13:55:07 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Huang Shijie <shijie@...amperecomputing.com>
Cc: kuba@...nel.org, patches@...erecomputing.com, davem@...emloft.net, 
	horms@...nel.org, ast@...nel.org, dhowells@...hat.com, linyunsheng@...wei.com, 
	aleksander.lobakin@...el.com, linux-kernel@...r.kernel.org, 
	netdev@...r.kernel.org, cl@...amperecomputing.com
Subject: Re: [PATCH v2] net: skbuff: set FLAG_SKB_NO_MERGE for skbuff_fclone_cache

On Tue, Feb 27, 2024 at 7:29 AM Huang Shijie
<shijie@...amperecomputing.com> wrote:
>
> Since we do not set FLAG_SKB_NO_MERGE for skbuff_fclone_cache,
> the current skbuff_fclone_cache maybe not really allocated, it maybe
> used an exist old kmem_cache. In NUMA, the fclone allocated by
> alloc_skb_fclone() maybe in remote node.

Why is this happening in the first place ? Whab about skb->head ?

Jesper patch [1] motivation was not about NUMA., but about
fragmentation and bulk allocations/freeing.

TCP fclones are not bulk allocated/freed, so I do not understand what
your patch is doing.
You need to give more details, and experimental results.

Using SLAB_NO_MERGE does not help, I am still seeing wrong allocations
on a dual socket
host with plenty of available memory.
(either sk_buff or skb->head being allocated on the other node).

fclones might be allocated from a cpu running on node A, and freed
from a cpu running on node B.
Maybe SLUB is not properly handling this case ?

SLAB_NO_MERGE will avoid merging fclone with kmalloc-512, it does not
really help.

I think we need help from mm/slub experts, instead of trying to 'fix'
networking stacks.

Perhaps we could augment trace_kmem_cache_alloc() to record/print the nodes
of the allocated chunk (we already have the cpu number giving us the
local node).
That would give us more confidence on any fixes.

BTW SLUB is gone, time to remove FLAG_SKB_NO_MERGE and simply use SLAB_NO_MERGE

[1]
commit 0a0643164da4a1976455aa12f0a96d08ee290752
Author: Jesper Dangaard Brouer <hawk@...nel.org>
Date:   Tue Aug 15 17:17:36 2023 +0200

    net: use SLAB_NO_MERGE for kmem_cache skbuff_head_cache



>
> So set FLAG_SKB_NO_MERGE for skbuff_fclone_cache to fix it.
>
> Signed-off-by: Huang Shijie <shijie@...amperecomputing.com>
> ---
> v1 --> v2:
>        set FLAG_SKB_NO_MERGE for skbuff_fclone_cache in initialization.
>
> v1: https://lkml.org/lkml/2024/2/20/121
> ---
>  net/core/skbuff.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 1f918e602bc4..5e3e130fb57a 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5013,7 +5013,8 @@ void __init skb_init(void)
>         skbuff_fclone_cache = kmem_cache_create("skbuff_fclone_cache",
>                                                 sizeof(struct sk_buff_fclones),
>                                                 0,
> -                                               SLAB_HWCACHE_ALIGN|SLAB_PANIC,
> +                                               SLAB_HWCACHE_ALIGN|SLAB_PANIC|
> +                                               FLAG_SKB_NO_MERGE,
>                                                 NULL);
>         /* usercopy should only access first SKB_SMALL_HEAD_HEADROOM bytes.
>          * struct skb_shared_info is located at the end of skb->head,
> --
> 2.40.1
>