linux-kernel - Re: [PATCH v2] net: skbuff: set FLAG_SKB_NO_MERGE for skbuff_fclone

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANn89iJpZ6udACMC9EF=zgYJU5rqLFiTuYJRf1UNA3UKu7CxJg@mail.gmail.com>
Date: Thu, 29 Feb 2024 18:07:36 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: "Christoph Lameter (Ampere)" <cl@...ux.com>
Cc: Shijie Huang <shijie@...eremail.onmicrosoft.com>, 
	Huang Shijie <shijie@...amperecomputing.com>, kuba@...nel.org, 
	patches@...erecomputing.com, davem@...emloft.net, horms@...nel.org, 
	ast@...nel.org, dhowells@...hat.com, linyunsheng@...wei.com, 
	aleksander.lobakin@...el.com, linux-kernel@...r.kernel.org, 
	netdev@...r.kernel.org, cl@...amperecomputing.com
Subject: Re: [PATCH v2] net: skbuff: set FLAG_SKB_NO_MERGE for skbuff_fclone_cache

On Thu, Feb 29, 2024 at 6:01 PM Christoph Lameter (Ampere) <cl@...ux.com> wrote:
>
> On Wed, 28 Feb 2024, Shijie Huang wrote:
>
> >>
> >> Using SLAB_NO_MERGE does not help, I am still seeing wrong allocations
> >> on a dual socket
> >> host with plenty of available memory.
> >> (either sk_buff or skb->head being allocated on the other node).
> >
> > Do you mean you still can see the wrong fclone after using SLAB_NO_MERGE?
> >
> > If so, I guess there is bug in the slub.
>
> Mergin has nothing to do with memory locality.
>
> >> fclones might be allocated from a cpu running on node A, and freed
> >> from a cpu running on node B.
> >> Maybe SLUB is not properly handling this case ?
> >
> > Maybe.
>
> Basic functionality is broken??? Really?

It seems so.

>
> >> I think we need help from mm/slub experts, instead of trying to 'fix'
> >> networking stacks.
> >
> > @Christopher
> >
> > Any idea about this?
>
>
> If you want to force a local allocation then use GFP_THISNODE as a flag.
>
> If you do not specify a node or GFP_THISNODE then the slub allocator will
> opportunistically allocate sporadically from other nodes to avoid
> fragmentation of slabs. The page allocator also will sporadically go off
> node in order to avoid reclaim. The page allocator may go off node
> extensively if there is a imbalance of allocation between node. The page
> allocator has knobs to tune off node vs reclaim options. Doing more
> reclaim will slow things down but give you local data.

Maybe, maybe not.

Going back to CONFIG_SLAB=y removes all mismatches, without having to
use GFP_THISNODE at all,
on hosts with plenty of available memory on all nodes.

I think that is some kind of evidence that something is broken in SLUB land.