Date: Mon, 15 May 2023 18:36:36 +0200
From: Alexander Lobakin <aleksander.lobakin@...el.com>
To: Lorenzo Bianconi <lorenzo@...nel.org>
CC: <netdev@...r.kernel.org>, <lorenzo.bianconi@...hat.com>,
	<bpf@...r.kernel.org>, <davem@...emloft.net>, <edumazet@...gle.com>,
	<kuba@...nel.org>, <pabeni@...hat.com>, <ast@...nel.org>,
	<daniel@...earbox.net>, <hawk@...nel.org>, <john.fastabend@...il.com>,
	<linyunsheng@...wei.com>
Subject: Re: [RFC net-next] net: veth: reduce page_pool memory footprint using
 half page per-buffer

From: Lorenzo Bianconi <lorenzo@...nel.org>
Date: Fri, 12 May 2023 16:14:48 +0200

>> From: Lorenzo Bianconi <lorenzo@...nel.org>
>> Date: Fri, 12 May 2023 15:08:13 +0200
>>
>>> In order to reduce the page_pool memory footprint, rely on the
>>> page_pool_dev_alloc_frag routine and reduce the buffer size
>>> (VETH_PAGE_POOL_FRAG_SIZE) to PAGE_SIZE / 2, so that one page serves
>>> two 1500B frames. Reduce VETH_XDP_PACKET_HEADROOM from 256
>>> (XDP_PACKET_HEADROOM) to 192 in order to fit max_head_size within
>>> VETH_PAGE_POOL_FRAG_SIZE. Please note that with the default values
>>> (CONFIG_MAX_SKB_FRAGS=17), the maximum supported MTU is now reduced
>>> to 36350B.
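
A quick back-of-the-envelope check of that figure (a standalone
userspace sketch with x86_64 values assumed; not code from the patch):

#include <stdio.h>

#define FRAG_SIZE 2048 /* VETH_PAGE_POOL_FRAG_SIZE = PAGE_SIZE / 2 */
#define HEADROOM   192 /* VETH_XDP_PACKET_HEADROOM */
#define SHINFO     320 /* skb_shared_info with CONFIG_MAX_SKB_FRAGS=17 */
#define MAX_FRAGS   17

int main(void)
{
	int linear = FRAG_SIZE - HEADROOM - SHINFO;  /* 1536 */
	int frame  = linear + MAX_FRAGS * FRAG_SIZE; /* 36352 */

	/* ~36350B as stated above, modulo a few alignment bytes */
	printf("max supported frame: %dB\n", frame);
	return 0;
}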
>>
>> I thought we're stepping away from page splitting bit by bit O_o
> 
> do you mean the driver-private page_split implementations? AFAIK we are
> not stepping away from the page_pool page split implementation (or maybe
> I missed it :))

Page split in general. Since early-mid 2021, Page Pool with 1 page per
frame has shown results comparable to drivers doing page split, but it
doesn't have such MTU / headroom / ... limitations.
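
Roughly, the two allocation models being compared (an illustration
using the page_pool API as of this cycle, not code from any driver):

#include <net/page_pool.h>

/* 1 page per frame: no layout constraints beyond PAGE_SIZE */
static struct page *alloc_full(struct page_pool *pool)
{
	return page_pool_dev_alloc_pages(pool);
}

/* page split: two half-page buffers per page, so headroom + data +
 * shinfo must all squeeze into PAGE_SIZE / 2 */
static struct page *alloc_half(struct page_pool *pool,
			       unsigned int *offset)
{
	return page_pool_dev_alloc_frag(pool, offset, PAGE_SIZE / 2);
}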

> 
>> Primarily for the reasons you mentioned / worked around here: it
>> creates several significant limitations, and at least on 64-bit
>> systems it doesn't scale anymore. 192 bytes of headroom is less than
>> what XDP expects (isn't it? Isn't 256 the de facto standard, so that
>> the skb XDP path reallocates heads only to have 256+ there?), the 320
>> bytes of shinfo can change anytime, and even now page split simply
>> blocks you from increasing MAX_SKB_FRAGS even by one. Not to mention
>> the MTU limitations etc.
>> BTW, Intel drivers suffer from the very same things, due solely to
>> page split (and I'm almost done converting at least some of them to
>> Page Pool and the 1-page-per-buffer model), so I don't recommend
>> deliberately falling into that pit =\ :D
> 
> I am not sure about the 192 vs 256 bytes of headroom (this is why I sent
> this patch as an RFC; my main goal is to discuss this requirement). In
> the previous discussion [0] we deferred this implementation, since if we
> do not reduce the requested xdp headroom, we will not be able to fit two
> 1500B frames into a single page (because of the skb_shared_info size
> [1]), and we would introduce a performance penalty.

Yes, that's what I'm talking about. In order to fit two 1500-byte frames
onto one page on x86_64, you need to have at most 192 bytes of headroom
(192 of headroom + 320 of shinfo = 512 bytes of overhead per frame, or
1024 per page), but XDP requires 256. And then you have one more problem
from the other side, I mean the shinfo size: it can change anytime,
because it's not UAPI or ABI or whatever, and nobody can say "hey, don't
touch it, you'll break my page split"; at the same time, with page
splitting, you're not able to increase MAX_SKB_FRAGS.
And for MTU > 1536 this is all worthless, just a waste of cycles. With 1
page per frame you can have up to 3.5k per fragment.
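
A quick fit check with these numbers (standalone sketch; x86_64 values,
shinfo size from the pahole output below):

#include <assert.h>

#define HALF_PAGE 2048 /* PAGE_SIZE / 2 on x86_64 */
#define SHINFO     320 /* skb_shared_info with MAX_SKB_FRAGS=17 */
#define FRAME     1514 /* 1500B MTU + 14B ethernet header */

int main(void)
{
	/* 192B headroom: 2048 - 192 - 320 = 1536B left, 1514B fits */
	assert(HALF_PAGE - 192 - SHINFO >= FRAME);
	/* XDP's 256B: 2048 - 256 - 320 = 1472B left, doesn't fit */
	assert(HALF_PAGE - 256 - SHINFO < FRAME);
	return 0;
}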

You mentioned memory footprint. Do you have any exact numbers showing
that this can help significantly?
Because I have PP on my home router with 16k-sized pages and 128 MB of
RAM, and there's no memory shortage there :D I realize that doesn't mean
anything and I'm mostly joking by mentioning it, but still.

> 
> Regards,
> Lorenzo
> 
> [0] https://lore.kernel.org/netdev/6298f73f7cc7391c7c4a52a6a89b1ae21488bda1.1682188837.git.lorenzo@kernel.org/
> [1] $ pahole -C skb_shared_info vmlinux.o 
> struct skb_shared_info {
>         __u8                       flags;                /*     0     1 */
>         __u8                       meta_len;             /*     1     1 */
>         __u8                       nr_frags;             /*     2     1 */
>         __u8                       tx_flags;             /*     3     1 */
>         unsigned short             gso_size;             /*     4     2 */
>         unsigned short             gso_segs;             /*     6     2 */
>         struct sk_buff *           frag_list;            /*     8     8 */
>         struct skb_shared_hwtstamps hwtstamps;           /*    16     8 */
>         unsigned int               gso_type;             /*    24     4 */
>         u32                        tskey;                /*    28     4 */
>         atomic_t                   dataref;              /*    32     4 */
>         unsigned int               xdp_frags_size;       /*    36     4 */
>         void *                     destructor_arg;       /*    40     8 */
>         skb_frag_t                 frags[17];            /*    48   272 */
> 
>         /* size: 320, cachelines: 5, members: 14 */
> };
> 
>>
>>>
>>> Signed-off-by: Lorenzo Bianconi <lorenzo@...nel.org>
>>> ---
>>>  drivers/net/veth.c | 39 +++++++++++++++++++++++++--------------
>>>  1 file changed, 25 insertions(+), 14 deletions(-)
>> [...]
>>
>> Thanks,
>> Olek

Thanks,
Olek
