netdev - Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling

Message-ID: <CAKgT0Uf6MdYX_1OuAFAXadh86zDX_w1a_cwpoPGMxpmC4hGyEA@mail.gmail.com>
Date: Tue, 9 Apr 2024 08:08:02 -0700
From: Alexander Duyck <alexander.duyck@...il.com>
To: Yunsheng Lin <linyunsheng@...wei.com>
Cc: netdev@...r.kernel.org, Alexander Duyck <alexanderduyck@...com>, kuba@...nel.org, 
	davem@...emloft.net, pabeni@...hat.com
Subject: Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling

On Tue, Apr 9, 2024 at 4:47 AM Yunsheng Lin <linyunsheng@...wei.com> wrote:
>
> On 2024/4/4 4:09, Alexander Duyck wrote:
> > From: Alexander Duyck <alexanderduyck@...com>

[...]

> > +     /* Unmap and free processed buffers */
> > +     if (head0 >= 0)
> > +             fbnic_clean_bdq(nv, budget, &qt->sub0, head0);
> > +     fbnic_fill_bdq(nv, &qt->sub0);
> > +
> > +     if (head1 >= 0)
> > +             fbnic_clean_bdq(nv, budget, &qt->sub1, head1);
> > +     fbnic_fill_bdq(nv, &qt->sub1);
>
> I am not sure how complicated the rx handling will be for the advanced
> features. In the current code, each entry/desc in both qt->sub0 and
> qt->sub1 needs at least one page, and the page seems to be used only once,
> no matter how little of it is actually used?
>
> I am assuming you want a 'tightly optimized' operation here by
> calling page_pool_fragment_page(), but manipulating page->pp_ref_count
> directly does not seem to add any value for the current code, and it seems
> to waste a lot of memory by not using the frag API, especially when
> PAGE_SIZE > 4K?

On this hardware both the header and payload buffers are fragmentable.
The hardware decides the partitioning and we just follow it. For
example, it wouldn't be uncommon to have a jumbo frame split up such
that the header is less than 128B plus SKB overhead while the actual
data in the payload is just over 1400B. So for us, fragmenting the pages
is a very likely case, especially with smaller packets.
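
A minimal userspace sketch of the reference-bias idea discussed above, loosely modelled on the page_pool fragment API named in the review (page_pool_fragment_page() and page->pp_ref_count). The struct, the MODEL_* constants and every function here are hypothetical stand-ins, not kernel API, and the 64-byte minimum fragment size is an assumption. The driver claims a large bias once, hands out hardware-sized fragments by decrementing only a local counter, and returns the unused bias when it is done with the page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; all sizes are in bytes. */
#define MODEL_PAGE_SIZE 4096u
/* Assumed upper bound on fragments per page (one per 64 bytes). */
#define MODEL_BIAS_MAX  (MODEL_PAGE_SIZE / 64u)

struct model_rx_page {
	unsigned int ref_count; /* stands in for page->pp_ref_count */
	unsigned int bias;      /* references the driver has not handed out */
	unsigned int next_off;  /* first unused byte in the page */
};

/* Claim every reference the page could need in one counter update. */
static void model_page_init(struct model_rx_page *p)
{
	p->ref_count = MODEL_BIAS_MAX;
	p->bias = MODEL_BIAS_MAX;
	p->next_off = 0;
}

/* Hand out the next len bytes (len as chosen by the hardware). Returns the
 * fragment offset, or -1 if the page cannot hold it. Only the driver-local
 * bias changes here; the shared counter is untouched. */
static long model_take_frag(struct model_rx_page *p, unsigned int len)
{
	long off;

	if (!p->bias || p->next_off + len > MODEL_PAGE_SIZE)
		return -1;
	off = p->next_off;
	p->next_off += len;
	p->bias--; /* this reference now belongs to an skb */
	return off;
}

/* Driver is done with the page: drop the references it never handed out.
 * Returns how many references skbs still hold. */
static unsigned int model_page_release(struct model_rx_page *p)
{
	p->ref_count -= p->bias;
	p->bias = 0;
	return p->ref_count;
}

/* An skb frees its fragment; returns true once the page can be recycled. */
static bool model_frag_free(struct model_rx_page *p)
{
	assert(p->ref_count > 0);
	return --p->ref_count == 0;
}
```

With 1408-byte payload fragments two fit in a 4K page; with 512-byte header buffers (an assumed size covering a sub-128B header plus SKB overhead) eight fit. In both cases the shared counter is written once when the page is seeded and once when the leftover bias is drained, regardless of how many fragments were carved.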

It is better for us to optimize for the small-packet scenario than
for the case where full 4K slices are being taken. That way, when
we are CPU-constrained handling small packets, we are at our most
optimized, whereas for larger frames we can spare a few cycles to
account for the extra overhead. The result should be higher overall
packets per second.
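
One way to read this trade-off as arithmetic, under the assumption that the bias scheme has a fixed per-page bookkeeping cost (seeding the bias, later draining the unused part) that is shared by every fragment carved from that page. The cost value and the function names below are hypothetical and in arbitrary units; they are not measurements.

```c
#include <assert.h>

/* Hypothetical fixed bookkeeping cost per page, in arbitrary units:
 * seeding the reference bias plus draining whatever bias is left over. */
#define MODEL_PAGE_FIXED_COST 2.0

/* Number of frag_len-byte buffers that fit in one page_len-byte page. */
static unsigned int model_frags_per_page(unsigned int page_len,
					 unsigned int frag_len)
{
	assert(frag_len > 0 && frag_len <= page_len);
	return page_len / frag_len;
}

/* Share of the fixed per-page cost that each packet ends up paying. */
static double model_fixed_cost_per_packet(unsigned int page_len,
					  unsigned int frag_len)
{
	return MODEL_PAGE_FIXED_COST /
	       model_frags_per_page(page_len, frag_len);
}
```

In this model a full 4K slice pays the whole fixed cost on its own, while a page split eight ways pays a quarter of one unit per packet, so the overhead concentrates on the large frames where per-packet CPU time is least critical.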