Message-ID: <Zp6kXQkiOOI2IPT2@LQ3V64L9R2>
Date: Mon, 22 Jul 2024 11:26:37 -0700
From: Joe Damato <jdamato@...tly.com>
To: Elad Yifee <eladwf@...il.com>
Cc: daniel@...rotopia.org, Felix Fietkau <nbd@....name>,
Sean Wang <sean.wang@...iatek.com>,
Mark Lee <Mark-MC.Lee@...iatek.com>,
Lorenzo Bianconi <lorenzo@...nel.org>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Matthias Brugger <matthias.bgg@...il.com>,
AngeloGioacchino Del Regno <angelogioacchino.delregno@...labora.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Jesper Dangaard Brouer <hawk@...nel.org>,
John Fastabend <john.fastabend@...il.com>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
linux-mediatek@...ts.infradead.org, bpf@...r.kernel.org
Subject: Re: [PATCH net-next RFC] net: ethernet: mtk_eth_soc: use prefetch methods
On Mon, Jul 22, 2024 at 09:04:06PM +0300, Elad Yifee wrote:
> On Mon, Jul 22, 2024 at 7:17 PM Joe Damato <jdamato@...tly.com> wrote:
> >
> > On Sat, Jul 20, 2024 at 07:46:18PM +0300, Elad Yifee wrote:
> > > Utilize kernel prefetch methods for faster cache line access.
> > > This change boosts driver performance,
> > > allowing the CPU to handle about 5% more packets/sec.
> >
> > Nit: It'd be great to see before/after numbers and/or an explanation of
> > how you measured this in the commit message.
> Sure, I'll add iperf3 results in the next version.
Thanks, that'd be helpful!
[...]
> > > @@ -2039,7 +2040,7 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
> > > idx = NEXT_DESP_IDX(ring->calc_idx, ring->dma_size);
> > > rxd = ring->dma + idx * eth->soc->rx.desc_size;
> > > data = ring->data[idx];
> > > -
> > > + prefetch(rxd);
> >
> > Maybe net_prefetch instead, as mentioned above?
> This is the only case where I think prefetch should be used since it's
> only the descriptor.
I think you are implying that the optimization in the case of
L1_CACHE_BYTES < 128 is unnecessary because the
mtk_rx_dma_v2 descriptors will be too far (i *
eth->soc->rx.desc_size) apart to get any benefit from prefetching
more data?
If my understanding is correct, then yes: I agree.
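
To make the distinction concrete, here is a minimal userspace sketch of
the two helpers being compared. It is not the driver code: the sketch_*
names are hypothetical, __builtin_prefetch stands in for the kernel's
prefetch(), and the 64-byte line size is an assumption (the kernel uses
the per-architecture L1_CACHE_BYTES). The second function mirrors the
shape of net_prefetch(), which issues one extra prefetch for the
following cache line when L1_CACHE_BYTES < 128.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define SKETCH_L1_CACHE_BYTES 64

/* Stand-in for prefetch(): hint the single cache line holding p. */
static inline void sketch_prefetch(const void *p)
{
	__builtin_prefetch(p);
}

/* Shaped like net_prefetch(): also hint the next line if lines are small. */
static inline void sketch_net_prefetch(const void *p)
{
	sketch_prefetch(p);
#if SKETCH_L1_CACHE_BYTES < 128
	sketch_prefetch((const uint8_t *)p + SKETCH_L1_CACHE_BYTES);
#endif
}

/* Count the cache lines covered by `size` bytes starting at `addr`. */
static size_t sketch_lines_spanned(uintptr_t addr, size_t size)
{
	assert(size > 0);
	uintptr_t first = addr / SKETCH_L1_CACHE_BYTES;
	uintptr_t last = (addr + size - 1) / SKETCH_L1_CACHE_BYTES;
	return (size_t)(last - first + 1);
}
```

With these assumptions, a descriptor that fits inside one line (for
example 32 bytes at a 32-byte-aligned offset) gives
sketch_lines_spanned() == 1, so the single-line sketch_prefetch() already
covers it and the second hint in sketch_net_prefetch() fetches data the
descriptor read does not need.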
> Thank you for your suggestions
No problem!