Message-ID: <CADjXwjjP_8mM8y6KRF6_VQDpM7-UAXKDW02gRKf3FeJijnjSPA@mail.gmail.com>
Date: Wed, 20 Sep 2023 23:47:34 -0700
From: Coco Li <lixiaoyan@...gle.com>
To: Andrew Lunn <andrew@...n.ch>
Cc: Jakub Kicinski <kuba@...nel.org>, Eric Dumazet <edumazet@...gle.com>,
Neal Cardwell <ncardwell@...gle.com>, Mubashir Adnan Qureshi <mubashirq@...gle.com>,
Paolo Abeni <pabeni@...hat.com>, netdev@...r.kernel.org, Chao Wu <wwchao@...gle.com>,
Wei Wang <weiwan@...gle.com>
Subject: Re: [PATCH v1 net-next 0/5] Analyze and Reorganize core Networking
Structs to optimize cacheline consumption

As I replied on the other patch, we have an arm64 platform in our testbeds
with a smaller L3 cache (1.375MB vs 256MB on AMD), but its L1/L2 caches
are similar to, or even bigger than, those on our AMD platform. We will
send results for this platform in the next update.

Thank you for your suggestions.

On Sat, Sep 16, 2023 at 7:23 AM Andrew Lunn <andrew@...n.ch> wrote:
>
> On Sat, Sep 16, 2023 at 01:06:20AM +0000, Coco Li wrote:
> > Currently, variable-heavy structs in the networking stack are organized
> > chronologically, logically and sometimes by cache line access.
> >
> > This patch series attempts to reorganize the core networking stack
> > variables to minimize cacheline consumption during the phase of data
> > transfer. Specifically, we looked at the TCP/IP stack and the fast
> > path definition in TCP.
> >
> > For documentation purposes, we also added new files for each core data
> > structure we considered, although not all ended up being modified due
> > to the number of cache lines they already span in the fast path. In
> > the documentation, we recorded all variables we identified on the
> > fast path and the reasons. We also hope that in the future when
> > variables are added/modified, the document can be referred to and
> > updated accordingly to reflect the latest variable organization.
> >
> > Tested:
> > Our tests were run with neper tcp_rr using tcp traffic. The tests have $cpu
> > number of threads and variable number of flows (see below).
> >
> > Tests were run on 6.5-rc1
> >
> > Efficiency is computed as cpu seconds / throughput (one tcp_rr round trip).
> > The following result shows Efficiency delta before and after the patch
> > series is applied.
> >
> > On AMD platforms with 100Gb/s NIC and 256MB L3 cache:
>
> Would it be possible to run the same tests on a small ARM, MIPS or
> RISC-V machine? Something with small L1 and L2 cache, and no L3
> cache. You sometimes hear that the Linux network stack has become too
> big for small embedded systems, it is thrashing the caches. I suspect
> this change will help such machines. But I suppose it could also be bad
> for them. We won't know until it is tested.
>
> Andrew
>
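
For anyone repeating the comparison on another platform, here is a rough
sketch of the efficiency metric quoted above (cpu seconds per tcp_rr round
trip) and the before/after delta. The function names, the sign convention
(negative means less CPU per round trip) and the sample numbers are
assumptions for illustration only; they are not taken from the patch series
or from neper.

```python
def efficiency(cpu_seconds: float, round_trips: float) -> float:
    """CPU seconds consumed per completed tcp_rr round trip (lower is better)."""
    if round_trips <= 0:
        raise ValueError("round_trips must be positive")
    return cpu_seconds / round_trips


def efficiency_delta_pct(before: float, after: float) -> float:
    """Relative change in efficiency, in percent.

    Assumed convention: a negative value means the patched kernel spends
    less CPU time per round trip than the baseline.
    """
    if before <= 0:
        raise ValueError("baseline efficiency must be positive")
    return (after - before) / before * 100.0


# Hypothetical sample numbers, NOT measurements from this thread.
baseline = efficiency(cpu_seconds=120.0, round_trips=4_000_000)
patched = efficiency(cpu_seconds=114.0, round_trips=4_000_000)
print(f"efficiency delta: {efficiency_delta_pct(baseline, patched):+.2f}%")
```

Expressing the delta as a percentage of the baseline keeps results comparable
across machines whose absolute throughput differs widely, which matters when
setting a small arm64 box next to the AMD platform.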