Message-ID: <1342601753.2626.2040.camel@edumazet-glaptop>
Date: Wed, 18 Jul 2012 10:55:53 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Francois Romieu <romieu@...zoreil.com>
Cc: netdev@...r.kernel.org, Hayes Wang <hayeswang@...ltek.com>
Subject: Re: [RFC] r8169 : why SG / TX checksum are default disabled
On Wed, 2012-07-18 at 01:40 +0200, Francois Romieu wrote:
> > (I found that activating them with ethtool automatically enables GSO,
> > and performance with GSO is not good)
>
> It's still an improvement though, isn't it ?
>
On an old AMD machine, I can get line rate with the default configuration,
but it uses nearly all CPU cycles.

The following test is only partial; a real one should use forwarding, for
example...

# perf stat netperf -H eric -C -c -t OMNI
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to eric () port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1000 tcpi_rttvar 750 tcpi_snd_ssthresh 16 tpci_snd_cwnd 62
tcpi_reordering 3 tcpi_total_retrans 0
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method

290160      549032      16384  10.00   915.44     10^6bits/s  44.93 S      3.61   S      8.042   7.755   usec/KB

Performance counter stats for 'netperf -H eric -C -c -t OMNI':
5206,301186 task-clock # 0,520 CPUs utilized
16 568 context-switches # 0,003 M/sec
2 CPU-migrations # 0,000 K/sec
366 page-faults # 0,070 K/sec
12 362 775 266 cycles # 2,375 GHz [66,99%]
2 529 275 760 stalled-cycles-frontend # 20,46% frontend cycles idle [67,00%]
6 878 915 080 stalled-cycles-backend # 55,64% backend cycles idle [66,24%]
5 272 222 150 instructions # 0,43 insns per cycle
# 1,30 stalled cycles per insn [66,85%]
819 922 185 branches # 157,487 M/sec [66,79%]
50 135 423 branch-misses # 6,11% of all branches [66,15%]
10,019141027 seconds time elapsed

If I switch to SG + TX checksumming (GSO is automatically enabled), bandwidth
is lower:

# ethtool -K eth1 tx on sg on
Actual changes:
tx-checksumming: on
tx-checksum-ipv4: on
scatter-gather: on
tx-scatter-gather: on
generic-segmentation-offload: on
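
(The auto-enable side effect is not r8169 specific: software GSO is part of
the default wanted features and is only masked out while its dependencies
are missing, so turning SG back on brings GSO along. A minimal sketch of
that dependency logic, written from memory and not the exact net/core/dev.c
code:)

#include <linux/netdevice.h>

/* Illustrative sketch only, not the actual netdev_fix_features() code:
 * software GSO stays in the wanted feature set, but it is stripped
 * whenever its dependencies are missing, so re-enabling SG makes GSO
 * reappear automatically.
 */
static netdev_features_t sketch_fix_features(netdev_features_t features)
{
	/* SG is only legal together with some TX checksum offload. */
	if ((features & NETIF_F_SG) && !(features & NETIF_F_ALL_CSUM))
		features &= ~NETIF_F_SG;

	/* Software GSO depends on SG. */
	if ((features & NETIF_F_GSO) && !(features & NETIF_F_SG))
		features &= ~NETIF_F_GSO;

	return features;
}
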
# perf stat netperf -H eric -C -c -t OMNI
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to eric () port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1875 tcpi_rttvar 750 tcpi_snd_ssthresh 21 tpci_snd_cwnd 169
tcpi_reordering 3 tcpi_total_retrans 0
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method

790920      704640      16384  10.01   762.29     10^6bits/s  38.00 S      3.38   S      8.167   8.720   usec/KB

Performance counter stats for 'netperf -H eric -C -c -t OMNI':
4526,838736 task-clock # 0,452 CPUs utilized
2 031 context-switches # 0,449 K/sec
3 CPU-migrations # 0,001 K/sec
366 page-faults # 0,081 K/sec
4 476 876 825 cycles # 0,989 GHz [66,41%]
899 080 378 stalled-cycles-frontend # 20,08% frontend cycles idle [66,56%]
2 430 763 937 stalled-cycles-backend # 54,30% backend cycles idle [66,87%]
1 685 481 163 instructions # 0,38 insns per cycle
# 1,44 stalled cycles per insn [66,93%]
280 404 977 branches # 61,943 M/sec [66,73%]
15 608 497 branch-misses # 5,57% of all branches [66,54%]
10,025486268 seconds time elapsed

Since most frames need between 2 and 3 segments (one for the IP/TCP
headers, and one or two frags for the payload), this might be an MMIO
issue that Alexander tried to solve recently...
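
(Rough picture of where the per-packet cost sits, as an illustration only
with made-up names, not the actual r8169 xmit path: with SG, each frame
fills one descriptor for the linear part plus one per page frag, and every
frame still ends with an uncached MMIO doorbell write to the chip.)

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/io.h>

#define SKETCH_TX_POLL	0x38	/* hypothetical doorbell register offset */
#define SKETCH_NPQ	0x40	/* hypothetical "kick normal queue" bit   */

struct sketch_priv {
	void __iomem *mmio;
	/* TX ring bookkeeping omitted */
};

/* Fill the next free TX descriptor with this buffer (mapping omitted). */
static void sketch_queue_buf(struct sketch_priv *tp, void *buf,
			     unsigned int len)
{
}

static netdev_tx_t sketch_start_xmit(struct sk_buff *skb,
				     struct net_device *dev)
{
	struct sketch_priv *tp = netdev_priv(dev);
	unsigned int i;

	/* one descriptor for the linear (header) part ... */
	sketch_queue_buf(tp, skb->data, skb_headlen(skb));

	/* ... plus one per payload frag: 1-2 per MTU-sized frame here */
	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];

		sketch_queue_buf(tp, skb_frag_address(frag),
				 skb_frag_size(frag));
	}

	/* per-frame doorbell: an uncached write across the PCIe bus */
	writeb(SKETCH_NPQ, tp->mmio + SKETCH_TX_POLL);

	return NETDEV_TX_OK;
}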

If I keep SG + TX checksumming but switch GSO back off, it's OK:

# ethtool -K eth1 gso off
# perf stat netperf -H eric -C -c -t OMNI
OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to eric () port 0 AF_INET
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 14600
tcpi_rtt 1000 tcpi_rttvar 750 tcpi_snd_ssthresh 18 tpci_snd_cwnd 60
tcpi_reordering 3 tcpi_total_retrans 0
Local       Remote      Local  Elapsed Throughput Throughput  Local Local  Remote Remote Local   Remote  Service
Send Socket Recv Socket Send   Time               Units       CPU   CPU    CPU    CPU    Service Service Demand
Size        Size        Size   (sec)                          Util  Util   Util   Util   Demand  Demand  Units
Final       Final                                             %     Method %      Method

280800      549032      16384  10.00   916.61     10^6bits/s  40.05 S      3.62   S      7.159   7.774   usec/KB

Performance counter stats for 'netperf -H eric -C -c -t OMNI':
4827,259625 task-clock # 0,482 CPUs utilized
17 988 context-switches # 0,004 M/sec
3 CPU-migrations # 0,001 K/sec
366 page-faults # 0,076 K/sec
11 448 148 411 cycles # 2,372 GHz [66,57%]
2 278 563 777 stalled-cycles-frontend # 19,90% frontend cycles idle [66,38%]
6 420 123 655 stalled-cycles-backend # 56,08% backend cycles idle [66,38%]
4 471 468 064 instructions # 0,39 insns per cycle
# 1,44 stalled cycles per insn [67,48%]
757 302 269 branches # 156,880 M/sec [67,08%]
44 320 435 branch-misses # 5,85% of all branches [66,16%]
10,020331031 seconds time elapsed
--