Date: Thu, 29 Feb 2024 09:00:58 -0800 (PST)
From: "Christoph Lameter (Ampere)" <cl@...ux.com>
To: Shijie Huang <shijie@...eremail.onmicrosoft.com>
cc: Eric Dumazet <edumazet@...gle.com>, 
    Huang Shijie <shijie@...amperecomputing.com>, kuba@...nel.org, 
    patches@...erecomputing.com, davem@...emloft.net, horms@...nel.org, 
    ast@...nel.org, dhowells@...hat.com, linyunsheng@...wei.com, 
    aleksander.lobakin@...el.com, linux-kernel@...r.kernel.org, 
    netdev@...r.kernel.org, cl@...amperecomputing.com
Subject: Re: [PATCH v2] net: skbuff: set FLAG_SKB_NO_MERGE for
 skbuff_fclone_cache

On Wed, 28 Feb 2024, Shijie Huang wrote:

>> 
>> Using SLAB_NO_MERGE does not help, I am still seeing wrong allocations
>> on a dual-socket host with plenty of available memory
>> (either the sk_buff or skb->head being allocated on the other node).
>
> Do you mean you can still see the wrong fclone placement after using SLAB_NO_MERGE?
>
> If so, I guess there is a bug in SLUB.

Merging has nothing to do with memory locality.

>> fclones might be allocated from a cpu running on node A, and freed
>> from a cpu running on node B.
>> Maybe SLUB is not properly handling this case?
>
> Maybe.

Basic functionality is broken??? Really?

>> I think we need help from mm/slub experts, instead of trying to 'fix'
>> networking stacks.
>
> @Christopher
>
> Any idea about this?


If you want to force a local allocation then use __GFP_THISNODE as a flag.
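
Roughly something like this (a minimal sketch only; "gfp_mask" and the
error handling stand in for whatever the skb allocation path already
does, the point is just __GFP_THISNODE plus an explicit node):

	/* Ask SLUB for an fclone from the local node only; with
	 * __GFP_THISNODE the allocation fails instead of silently
	 * falling back to a remote node. */
	skb = kmem_cache_alloc_node(skbuff_fclone_cache,
				    gfp_mask | __GFP_THISNODE,
				    numa_node_id());
	if (!skb)
		return NULL;	/* local node out of memory, caller decides */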

If you do not specify a node or pass __GFP_THISNODE then the SLUB
allocator will opportunistically allocate from other nodes now and then
to avoid fragmentation of slabs. The page allocator will also
sporadically go off node in order to avoid reclaim, and it may go off
node extensively if there is an imbalance of allocations between nodes.
The page allocator has knobs to tune off-node allocation vs. reclaim.
Doing more reclaim will slow things down but give you node-local data.
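
To check whether an object really went off node, something along these
lines works for slab-backed objects such as the fclones (a debugging
sketch, not part of any proposed fix):

	/* Compare the node backing the object with the node of the
	 * allocating CPU to spot off-node slab allocations. */
	int obj_node = page_to_nid(virt_to_page(skb));

	if (obj_node != numa_node_id())
		pr_info("skb %p: memory on node %d, cpu on node %d\n",
			skb, obj_node, numa_node_id());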

