Message-ID: <aQR2Z51Q45Zl99m_@lore-desk>
Date: Fri, 31 Oct 2025 09:42:15 +0100
From: Lorenzo Bianconi <lorenzo@...nel.org>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Eric Dumazet <edumazet@...gle.com>, Andrew Lunn <andrew+netdev@...n.ch>,
	"David S. Miller" <davem@...emloft.net>,
	Paolo Abeni <pabeni@...hat.com>,
	linux-arm-kernel@...ts.infradead.org,
	linux-mediatek@...ts.infradead.org, netdev@...r.kernel.org
Subject: Re: [PATCH net-next] net: airoha: Add TCP LRO support
> > On Thu, 12 Jun 2025 23:02:30 +0200 Lorenzo Bianconi wrote:
> > > > I'm not Eric but FWIW 256B is not going to help much. It's best to keep
> > > > the len / truesize ratio above 50%, so with 32k buffers we're talking
> > > > about copying multiple frames.  
> > > 
> > > what I mean here is reallocate the skb if the true size is small (e.g. below
> > > 256B) in order to avoid consuming the high order page from the page_pool. Maybe
> > > we can avoid it if reducing the page order to 2 for LRO queues provide
> > > comparable results.
> > 
> > Hm, truesize is the buffer size, right? If the driver allocated n bytes
> > of memory for packets it sent up the stack, the truesizes of the skbs
> > it generated must add up to approximately n bytes.
> 
> With 'truesize' I am referring to the real data size contained in the x-order
> page returned by the hw. If this size is small, I was thinking to just allocate
> a skb for it, copy the data from the x-order page into it and re-insert the
> x-order page into the page_pool running page_pool_put_full_page().
> Let me do some tests with order-2 page to see if the GRO can compensate the
> reduced page size.

Sorry for the late reply on this item.

I ran some comparison tests between GRO-only and GRO+LRO with order-2
pages [0]. The system uses a 2.5Gbps link, the device receives a single TCP
stream, and the MTU is set to 1500B.

- GRO only:			~1.6Gbps
- GRO+LRO (order-2 pages):	~2.1Gbps

Neither configuration reaches line rate. Do you think the difference
justifies adding hw LRO support? Thanks in advance.

Regards,
Lorenzo

[0] the hw LRO requires physically contiguous memory pages to work. I reduced
the buffer size from order-5 (original implementation) to order-2.

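
For reference, the buffer sizes implied by the two page orders in [0] can be worked out with a small sketch. It assumes 4 KiB base pages, which the thread does not state; `SKETCH_PAGE_SIZE` and `order_to_bytes()` are hypothetical names used only for illustration and are not part of the driver.

```c
#include <stddef.h>

/* Assumption: 4 KiB base pages. The thread does not state the page size. */
#define SKETCH_PAGE_SIZE 4096UL

/* Size in bytes of a physically contiguous order-n allocation. */
unsigned long order_to_bytes(unsigned int order)
{
	return SKETCH_PAGE_SIZE << order;
}
```

Under that assumption an order-2 buffer is 16 KiB and an order-5 buffer is 128 KiB, so the test above uses buffers one eighth the size of the original implementation.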
> 
> Regards,
> Lorenzo
> 
> > 
> > So if the HW places one aggregation session per buffer, and the buffer
> > is 32kB -- to avoid mem use ratio < 25% you'd need to copy all sessions
> > smaller than 8kB?
> > 
> > If I'm not making sense - just ignore, I haven't looked at the rest of
> > the driver :)
> > 
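
The len / truesize rule discussed in the quoted text can be sketched as a simple copy decision. This is only an illustration under stated assumptions: `lro_should_copy()` and `lro_copy_threshold()` are hypothetical helpers, not functions from the airoha driver or the kernel, and "32kB" is taken to mean 32768 bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the airoha driver.
 * Returns true when the data length is below min_pct percent of the
 * buffer size, i.e. when copying the data into a smaller skb and
 * recycling the large buffer would keep memory use reasonable.
 */
bool lro_should_copy(size_t len, size_t truesize, unsigned int min_pct)
{
	/* Integer form of: len / truesize < min_pct / 100 */
	return len * 100 < truesize * (size_t)min_pct;
}

/* Smallest length (in bytes) that does not need to be copied. */
size_t lro_copy_threshold(size_t truesize, unsigned int min_pct)
{
	return truesize * min_pct / 100;
}
```

With a 32768-byte buffer, `lro_copy_threshold(32768, 25)` gives 8192, matching the "copy all sessions smaller than 8kB" estimate, while a 50% target raises the cutoff to 16384 bytes. A fixed 256B cutoff would therefore leave most small aggregation sessions sitting in mostly empty high-order buffers.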