Message-ID: <20220903004420.91740-1-kuniyu@amazon.com>
Date: Fri, 2 Sep 2022 17:44:20 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <kuniyu@...zon.com>
CC: <davem@...emloft.net>, <edumazet@...gle.com>, <kuba@...nel.org>,
<kuni1840@...il.com>, <netdev@...r.kernel.org>, <pabeni@...hat.com>
Subject: Re: [PATCH v3 net-next 3/5] tcp: Access &tcp_hashinfo via net.
From: Kuniyuki Iwashima <kuniyu@...zon.com>
Date: Thu, 1 Sep 2022 15:12:16 -0700
> From: Eric Dumazet <edumazet@...gle.com>
> Date: Thu, 1 Sep 2022 14:30:43 -0700
> > On Thu, Sep 1, 2022 at 2:25 PM Kuniyuki Iwashima <kuniyu@...zon.com> wrote:
> > >
> > > From: Paolo Abeni <pabeni@...hat.com>
> >
> > > > /Me is thinking aloud...
> > > >
> > > > I'm wondering if the above has some measurable negative effect for
> > > > large deployments using only the main netns?
> > > >
> > > > Specifically, are net->ipv4.tcp_death_row and
> > > > net->ipv4.tcp_death_row->hashinfo already in the working set for
> > > > established sockets?
> > > > Would the above increase the WSS by 2 cache-lines?
> > >
> > > Currently, the death_row and hashinfo are touched around tw sockets or
> > > connect(). If connections on the deployment are short-lived or frequently
> > > initiated by itself, that would be hot and included in WSS.
> > >
> > > If the workload is server and there's no active-close() socket or
> > > connections are long-lived, then it might not be included in WSS.
> > > But I think that is less likely than the former if the deployment is
> > > large enough.
> > >
> > > If this change had large impact, then we could revert fbb8295248e1
> > > which converted net->ipv4.tcp_death_row into pointer for 0dad4087a86a
> > > that tried to fire a TW timer after netns is freed, but 0dad4087a86a
> > > has already been reverted.
> >
> >
> > Concern was fast path.
> >
> > Each incoming packet does a socket lookup.
> >
> > Fetching hashinfo (instead of &tcp_hashinfo) with a dereference of a
> > field in 'struct net' might incur a new cache line miss.
> >
> > Previously, the first cache line of tcp_hashinfo was enough to bring a
> > lot of fields into the CPU cache.
>
> OK, let me test whether there could be regressions.

I tested tcp_hashinfo vs tcp_death_row->hashinfo with super_netperf
and collected HW cache-related metrics with perf.

After the patch, the number of L1 misses seems to increase slightly, but
the instructions per cycle also increase, and the cache miss rate did
not change. Also, there was no performance regression in netperf.

Tested:

# cat perf_super_netperf
echo 0 > /proc/sys/kernel/nmi_watchdog
echo 3 > /proc/sys/vm/drop_caches

perf stat -a \
     -e cycles,instructions,cache-references,cache-misses,bus-cycles \
     -e L1-dcache-loads,L1-dcache-load-misses,L1-dcache-stores \
     -e dTLB-loads,dTLB-load-misses \
     -e LLC-loads,LLC-load-misses,LLC-stores \
     ./super_netperf $(($(nproc) * 2)) -H 10.0.0.142 -l 60 -fM

echo 1 > /proc/sys/kernel/nmi_watchdog

Before:

# ./perf_super_netperf
2929.81

 Performance counter stats for 'system wide':

  494,002,600,338  cycles                                                     (23.07%)
  241,230,662,890  instructions           # 0.49 insn per cycle               (30.76%)
    6,303,603,008  cache-references                                           (38.45%)
    1,421,440,332  cache-misses           # 22.550 % of all cache refs        (46.15%)
    4,861,179,308  bus-cycles                                                 (46.15%)
   65,410,735,599  L1-dcache-loads                                            (46.15%)
   12,647,247,339  L1-dcache-load-misses  # 19.34% of all L1-dcache accesses  (30.77%)
   32,912,656,369  L1-dcache-stores                                           (30.77%)
   66,015,779,361  dTLB-loads                                                 (30.77%)
       81,293,994  dTLB-load-misses       # 0.12% of all dTLB cache accesses  (30.77%)
    2,946,386,949  LLC-loads                                                  (30.77%)
      257,223,942  LLC-load-misses        # 8.73% of all LL-cache accesses    (30.77%)
    1,183,820,461  LLC-stores                                                 (15.38%)

     62.132250590 seconds time elapsed

After:

# ./perf_super_netperf
2930.17

 Performance counter stats for 'system wide':

  479,595,776,631  cycles                                                     (23.07%)
  243,318,957,230  instructions           # 0.51 insn per cycle               (30.76%)
    6,169,892,840  cache-references                                           (38.46%)
    1,381,992,694  cache-misses           # 22.399 % of all cache refs        (46.15%)
    4,534,304,190  bus-cycles                                                 (46.16%)
   66,059,178,377  L1-dcache-loads                                            (46.17%)
   12,759,529,139  L1-dcache-load-misses  # 19.32% of all L1-dcache accesses  (30.78%)
   33,292,513,002  L1-dcache-stores                                           (30.78%)
   66,482,176,008  dTLB-loads                                                 (30.77%)
       72,877,970  dTLB-load-misses       # 0.11% of all dTLB cache accesses  (30.76%)
    2,984,881,101  LLC-loads                                                  (30.76%)
      234,747,930  LLC-load-misses        # 7.86% of all LL-cache accesses    (30.76%)
    1,165,606,022  LLC-stores                                                 (15.38%)

     62.110708964 seconds time elapsed
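
As a sanity check on the two runs above, the derived ratios can be recomputed from the raw counters. The following is only a minimal Python sketch: the helper names (ipc, miss_rate, rel_change) are made up for illustration and are not part of perf or netperf, and only the counters needed for the ratios are copied in. Note that perf multiplexed these events (see the percentages in parentheses), so the counts are scaled estimates and small differences may be noise.

```python
# Counter values copied verbatim from the two perf stat runs above.
BEFORE = {
    "throughput": 2929.81,  # super_netperf result
    "cycles": 494_002_600_338,
    "instructions": 241_230_662_890,
    "L1-dcache-loads": 65_410_735_599,
    "L1-dcache-load-misses": 12_647_247_339,
    "LLC-loads": 2_946_386_949,
    "LLC-load-misses": 257_223_942,
}
AFTER = {
    "throughput": 2930.17,
    "cycles": 479_595_776_631,
    "instructions": 243_318_957_230,
    "L1-dcache-loads": 66_059_178_377,
    "L1-dcache-load-misses": 12_759_529_139,
    "LLC-loads": 2_984_881_101,
    "LLC-load-misses": 234_747_930,
}


def ipc(run):
    """Instructions per cycle (hypothetical helper)."""
    return run["instructions"] / run["cycles"]


def miss_rate(run, level):
    """Load miss ratio for a cache level such as 'L1-dcache' or 'LLC'."""
    return run[f"{level}-load-misses"] / run[f"{level}-loads"]


def rel_change(before, after):
    """Relative change from before to after, as a fraction."""
    return (after - before) / before


for name, run in (("before", BEFORE), ("after", AFTER)):
    print(
        f"{name:>6}: IPC={ipc(run):.2f}  "
        f"L1 miss={miss_rate(run, 'L1-dcache'):.2%}  "
        f"LLC miss={miss_rate(run, 'LLC'):.2%}"
    )

for key in ("throughput", "L1-dcache-load-misses", "LLC-load-misses"):
    print(f"{key}: {rel_change(BEFORE[key], AFTER[key]):+.2%}")
```

The absolute L1 miss count goes up by less than 1%, while the L1 miss ratio and the netperf throughput stay essentially flat, which matches the summary given before the test output.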