Date: Tue, 17 Oct 2023 19:07:04 +0200
From: Daniel Borkmann <daniel@...earbox.net>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Florian Fainelli <f.fainelli@...il.com>, Coco Li <lixiaoyan@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Neal Cardwell <ncardwell@...gle.com>,
 Mubashir Adnan Qureshi <mubashirq@...gle.com>,
 Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org,
 Chao Wu <wwchao@...gle.com>, Wei Wang <weiwan@...gle.com>
Subject: Re: [PATCH v2 net-next 0/5] Analyze and Reorganize core Networking
 Structs to optimize cacheline consumption

On 10/17/23 6:50 PM, Eric Dumazet wrote:
> On Tue, Oct 17, 2023 at 11:06 AM Daniel Borkmann <daniel@...earbox.net> wrote:
>> On 10/17/23 5:46 AM, Florian Fainelli wrote:
>>> On 10/16/2023 6:47 PM, Coco Li wrote:
>>>> Currently, variable-heavy structs in the networking stack are organized
>>>> chronologically, logically and sometimes by cache line access.
>>>>
>>>> This patch series attempts to reorganize the core networking stack
>>>> variables to minimize cacheline consumption during the phase of data
>>>> transfer. Specifically, we looked at the TCP/IP stack and the fast
>>>> path definition in TCP.
>>>>
>>>> For documentation purposes, we also added new files for each core data
>>>> structure we considered, although not all ended up being modified due
>>>> to the number of cache lines they already span in the fast path. In
>>>> the documentation, we recorded all variables we identified on the
>>>> fast path and the reasons. We also hope that in the future when
>>>> variables are added/modified, the document can be referred to and
>>>> updated accordingly to reflect the latest variable organization.
>>>
>>> This is great stuff. When Eric mentioned this work during Netconf'23, one concern that came up was: how can we make sure that a future change which adds/removes/shuffles members in those structures does not undo the work you just did? Is there a way to "lock" the structure layout to avoid performance drops?
>>>
>>> I suppose we could run pahole on these structures before and after a change and check that the per-cacheline layout is preserved, but that means adding custom scripts to CI.
>>
>> It should be possible without extra CI. We could probably have zero-sized markers,
>> as we have in sk_buff (e.g. __cloned_offset[0]), and use some macros to force
>> grouping.
>>
>> ASSERT_CACHELINE_GROUP() could then throw a build error, for example, if the member
>> is not between __begin_cacheline_group and __end_cacheline_group:
>>
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 9ea3ec906b57..c664e0594da4 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -2059,6 +2059,7 @@ struct net_device {
>>            */
>>
>>           /* TX read-mostly hotpath */
>> +       __begin_cacheline_group(tx_read_mostly);
>>           unsigned long long      priv_flags;
>>           const struct net_device_ops *netdev_ops;
>>           const struct header_ops *header_ops;
>> @@ -2085,6 +2086,7 @@ struct net_device {
>>    #ifdef CONFIG_NET_XGRESS
>>           struct bpf_mprog_entry __rcu *tcx_egress;
>>    #endif
>> +       __end_cacheline_group(tx_read_mostly);
>>
>>           /* TXRX read-mostly hotpath */
>>           unsigned int            flags;
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 97e7b9833db9..2a91bd4077ad 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -11523,6 +11523,9 @@ static int __init net_dev_init(void)
>>
>>           BUG_ON(!dev_boot_phase);
>>
>> +       ASSERT_CACHELINE_GROUP(tx_read_mostly, priv_flags);
>> +       ASSERT_CACHELINE_GROUP(tx_read_mostly, netdev_ops);

Nit: that should have been something like:

   ASSERT_CACHELINE_GROUP(struct net_device, netdev_ops, tx_read_mostly)
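
A minimal userspace sketch of the marker idea, using the corrected (TYPE, MEMBER, GROUP) argument order. The marker macros, the member_end() helper, CACHELINE_GROUP_CONTAINS() and struct demo_dev are hypothetical names for illustration only. It assumes GCC or Clang (zero-length arrays are a GNU C extension) and uses C11 static_assert in place of the kernel's BUILD_BUG_ON().

```c
#include <assert.h> /* static_assert (C11) */
#include <stddef.h> /* offsetof */

/* Hypothetical markers: zero-length arrays (a GNU C extension, like
 * __cloned_offset[0] in sk_buff) occupy no space but have an offset. */
#define __begin_cacheline_group(GROUP) \
	unsigned char __cacheline_group_begin__##GROUP[0]
#define __end_cacheline_group(GROUP) \
	unsigned char __cacheline_group_end__##GROUP[0]

/* Offset of the first byte past MEMBER. */
#define member_end(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Constant expression: nonzero iff MEMBER lies entirely between the markers. */
#define CACHELINE_GROUP_CONTAINS(TYPE, MEMBER, GROUP)                \
	(offsetof(TYPE, MEMBER) >=                                   \
		 offsetof(TYPE, __cacheline_group_begin__##GROUP) && \
	 member_end(TYPE, MEMBER) <=                                 \
		 offsetof(TYPE, __cacheline_group_end__##GROUP))

/* Build error if MEMBER drifts out of GROUP. */
#define ASSERT_CACHELINE_GROUP(TYPE, MEMBER, GROUP)                  \
	static_assert(CACHELINE_GROUP_CONTAINS(TYPE, MEMBER, GROUP), \
		      #MEMBER " must stay in cacheline group " #GROUP)

/* Toy stand-in for struct net_device; member types are simplified. */
struct demo_dev {
	int slow_path_counter;
	__begin_cacheline_group(tx_read_mostly);
	unsigned long long priv_flags;
	const void *netdev_ops;
	__end_cacheline_group(tx_read_mostly);
	unsigned int flags; /* deliberately outside the group */
};

ASSERT_CACHELINE_GROUP(struct demo_dev, priv_flags, tx_read_mostly);
ASSERT_CACHELINE_GROUP(struct demo_dev, netdev_ops, tx_read_mostly);
/* ASSERT_CACHELINE_GROUP(struct demo_dev, flags, tx_read_mostly);
 * would stop the build, since flags sits after the end marker. */
```

The check only compares offsets, so it is independent of the architecture's pointer size: moving a member outside its marker pair, or moving a marker past a member, is what trips it.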

> Great idea, we only need to generate these automatically from the file
> describing the fields (currently in Documentation/).
> 
> I think the initial intent was to find a way to generate the layout of
> the structure itself, but this looked a bit tricky.

Agreed. Ideally this could be scripted from the Documentation/ file of this
series, and perhaps that file would not even be needed if the layout were
self-documented in code behind some macro magic, with a BUILD_BUG_ON()
assertion that uses offsetof() to check that each field lies between the
markers.
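
One way the self-documenting variant could look, sketched under assumptions: an X-macro table of (member, group) pairs serves as the in-code copy of the Documentation/ field list, and each entry expands into an offsetof() range check inside an init function, analogous to placing the assertions in net_dev_init(). DEMO_BUILD_BUG_ON() is a hypothetical userspace stand-in for BUILD_BUG_ON() built on C11 static_assert; the marker macros, the table and struct demo_dev are illustrative names, not existing kernel code. GCC or Clang is assumed for the zero-length arrays.

```c
#include <assert.h> /* static_assert (C11) */
#include <stddef.h> /* offsetof */

/* Hypothetical group markers (GNU C zero-length arrays). */
#define __begin_cacheline_group(GROUP) \
	unsigned char __cacheline_group_begin__##GROUP[0]
#define __end_cacheline_group(GROUP) \
	unsigned char __cacheline_group_end__##GROUP[0]

/* Userspace stand-in for BUILD_BUG_ON(): fails the build when COND is true. */
#define DEMO_BUILD_BUG_ON(COND) static_assert(!(COND), #COND)

/* Single source of truth for the fast-path layout: one X(member, group)
 * entry per field. Entries are illustrative only. */
#define DEMO_DEV_FAST_PATH_FIELDS(X)  \
	X(priv_flags, tx_read_mostly) \
	X(netdev_ops, tx_read_mostly) \
	X(flags, txrx_read_mostly)

struct demo_dev {
	int slow_path_counter;
	__begin_cacheline_group(tx_read_mostly);
	unsigned long long priv_flags;
	const void *netdev_ops;
	__end_cacheline_group(tx_read_mostly);
	__begin_cacheline_group(txrx_read_mostly);
	unsigned int flags;
	__end_cacheline_group(txrx_read_mostly);
};

/* Expand one table entry into an offsetof()-based range check. */
#define DEMO_CHECK_FIELD(MEMBER, GROUP)                               \
	DEMO_BUILD_BUG_ON(                                            \
		offsetof(struct demo_dev, MEMBER) <                   \
			offsetof(struct demo_dev,                     \
				 __cacheline_group_begin__##GROUP) || \
		offsetof(struct demo_dev, MEMBER) +                   \
				sizeof(((struct demo_dev *)0)->MEMBER) > \
			offsetof(struct demo_dev,                     \
				 __cacheline_group_end__##GROUP));

/* Every table entry becomes one compile-time assertion. */
int demo_dev_init(void)
{
	DEMO_DEV_FAST_PATH_FIELDS(DEMO_CHECK_FIELD)
	return 0;
}
```

With this shape, adding a hot field means adding one table line next to the struct, and moving flags above __end_cacheline_group(tx_read_mostly) would make demo_dev_init() fail to compile. Generating the layout itself from the table, as Eric notes, is the harder part and is not attempted here.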
