Message-ID: <CAKgT0UeQS5q=Y2j3mmu9AhWyUMbey-iFL+sKES1UrBtoAXMdzw@mail.gmail.com>
Date: Wed, 10 Apr 2024 08:03:01 -0700
From: Alexander Duyck <alexander.duyck@...il.com>
To: Yunsheng Lin <linyunsheng@...wei.com>
Cc: netdev@...r.kernel.org, Alexander Duyck <alexanderduyck@...com>, kuba@...nel.org, 
	davem@...emloft.net, pabeni@...hat.com
Subject: Re: [net-next PATCH 13/15] eth: fbnic: add basic Rx handling

On Wed, Apr 10, 2024 at 4:54 AM Yunsheng Lin <linyunsheng@...wei.com> wrote:
>
> On 2024/4/9 23:08, Alexander Duyck wrote:
> > On Tue, Apr 9, 2024 at 4:47 AM Yunsheng Lin <linyunsheng@...wei.com> wrote:
> >>
> >> On 2024/4/4 4:09, Alexander Duyck wrote:
> >>> From: Alexander Duyck <alexanderduyck@...com>
> >
> > [...]
> >
> >>> +     /* Unmap and free processed buffers */
> >>> +     if (head0 >= 0)
> >>> +             fbnic_clean_bdq(nv, budget, &qt->sub0, head0);
> >>> +     fbnic_fill_bdq(nv, &qt->sub0);
> >>> +
> >>> +     if (head1 >= 0)
> >>> +             fbnic_clean_bdq(nv, budget, &qt->sub1, head1);
> >>> +     fbnic_fill_bdq(nv, &qt->sub1);
> >>
> >> I am not sure how complicated the rx handling will be for the advanced
> >> feature. With the current code, each entry/desc in both qt->sub0 and
> >> qt->sub1 needs at least one page, and the page seems to be used only
> >> once, no matter how little of it is actually used?
> >>
> >> I am assuming you want a 'tightly optimized' operation here by
> >> calling page_pool_fragment_page(), but manipulating page->pp_ref_count
> >> directly does not seem to add any value for the current code, and it
> >> seems to waste a lot of memory by not using the frag API, especially
> >> when PAGE_SIZE > 4K?
> >
> > On this hardware both the header and payload buffers are fragmentable.
> > The hardware decides the partitioning and we just follow it. For
> > example, it wouldn't be uncommon to have a jumbo frame split up such
> > that the header is less than 128B plus SKB overhead while the actual
> > data in the payload is just over 1400B. So for us, fragmenting the
> > pages is a very likely case, especially with smaller packets.
>
> I understand that is what you are trying to do, but the code above does
> not seem to match that description: fbnic_clean_bdq() and
> fbnic_fill_bdq() are called for both qt->sub0 and qt->sub1, so the old
> pages that were just cleaned are drained and each sub-ring is refilled
> with new pages, which does not seem to involve any fragmenting?

That is because it is all taken care of by the completion queue. Take
a look at fbnic_pkt_prepare: we take the buffer from the header
descriptor and carve a slice out of it there via fbnic_page_pool_get.
Basically we store the fragment count locally in the rx_buf and then
subtract whatever is left over when the device is done with it.
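
Roughly, the accounting works like the sketch below. This is only an
illustration of the idea, not the fbnic code; the struct layout, the
helper names, and the bias value are all made up:

#include <linux/limits.h>
#include <net/page_pool/helpers.h>

/* Illustrative sketch only -- not fbnic code. */
struct rx_buf_sketch {
	struct page	*page;
	long		pagecnt_bias;	/* local mirror of the slice count */
};

static void rx_buf_fill(struct page_pool *pp, struct rx_buf_sketch *buf)
{
	struct page *page = page_pool_dev_alloc_pages(pp);

	if (!page)
		return;	/* error handling omitted in this sketch */

	/* Pre-charge the page with a large fragment count and mirror it
	 * locally so the hot path never touches page->pp_ref_count.
	 */
	page_pool_fragment_page(page, USHRT_MAX);
	buf->page = page;
	buf->pagecnt_bias = USHRT_MAX;
}

static struct page *rx_buf_get_slice(struct rx_buf_sketch *buf)
{
	/* Hand one slice to the stack; only the local count is touched. */
	buf->pagecnt_bias--;
	return buf->page;
}

static void rx_buf_drain(struct page_pool *pp, struct rx_buf_sketch *buf)
{
	/* The device is done with the buffer: subtract the slices that
	 * were never handed out. If that clears the last reference we
	 * own the page and return it to the pool; otherwise the
	 * in-flight slices release it when their skbs are freed.
	 */
	if (!page_pool_unref_page(buf->page, buf->pagecnt_bias))
		page_pool_put_unrefed_page(pp, buf->page, -1, true);
	buf->page = NULL;
}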

> So the fragmenting can only happen when a continuous stream of small
> packets is coming off the wire, so that the hw can report the same pg_id
> (with a different pg_offset) for different packets before fbnic_clean_bdq()
> and fbnic_fill_bdq() are called? I am not sure how to ensure that,
> considering that we might break out of the while loop in fbnic_clean_rcq()
> because of the 'packets < budget' check.

We don't free the page until we have moved one past it, or the
hardware has indicated it will take no more slices via a PAGE_FIN bit
in the descriptor.
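
The per-buffer decision then boils down to something like this (again
just a sketch with made-up names; RCD_PAGE_FIN stands in for whatever
the completion-descriptor bit is actually called, and ring wrap-around
is ignored):

#include <linux/bits.h>
#include <linux/types.h>

#define RCD_PAGE_FIN	BIT(0)	/* hypothetical flag name */

static bool rx_buf_done(unsigned int idx, unsigned int head, u64 rcd)
{
	/* A buffer can be drained once the head has moved past it, or as
	 * soon as the completion descriptor carries PAGE_FIN, i.e. the HW
	 * has promised to take no more slices from it.
	 */
	return idx < head || (rcd & RCD_PAGE_FIN);
}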

> > It is better for us to optimize for the small packet scenario than to
> > optimize for the case where 4K slices are getting taken. That way, when
> > we are CPU constrained handling small packets we are at our most
> > optimized, whereas for the larger frames we can spare a few cycles to
> > account for the extra overhead. The result should be higher overall
> > packets per second.
>
> The problem is that small packets mean low utilization of the bandwidth,
> as more of it is spent on headers instead of payload that is useful to
> the user, so the question seems to be: how often are small packets seen
> on the wire?

Very often, especially when you are running something like servers,
where a flow usually consists of an incoming request that is often
only a few hundred bytes, followed by us sending a response, which
then leads to a stream of control frames for it.
