Message-ID: <20120814011526.GB29337@windriver.com>
Date: Mon, 13 Aug 2012 21:15:26 -0400
From: Paul Gortmaker <paul.gortmaker@...driver.com>
To: Claudiu Manoil <claudiu.manoil@...escale.com>
CC: Tomas Hruby <thruby@...il.com>,
    Eric Dumazet <eric.dumazet@...il.com>,
    <netdev@...r.kernel.org>, "David S. Miller" <davem@...emloft.net>
Subject: Re: [RFC net-next 0/4] gianfar: Use separate NAPI for Tx
    confirmation processing

[Re: [RFC net-next 0/4] gianfar: Use separate NAPI for Tx confirmation processing] On 13/08/2012 (Mon 19:23) Claudiu Manoil wrote:

> On 08/09/2012 06:07 PM, Claudiu Manoil wrote:
> >On 8/9/2012 2:06 AM, Tomas Hruby wrote:
> >>On Wed, Aug 8, 2012 at 9:44 AM, Eric Dumazet
> >><eric.dumazet@...il.com> wrote:
> >>>On Wed, 2012-08-08 at 12:24 -0400, Paul Gortmaker wrote:
> >>>>[[RFC net-next 0/4] gianfar: Use separate NAPI for Tx
> >>>>confirmation processing] On 08/08/2012 (Wed 15:26) Claudiu
> >>>>Manoil wrote:
> >>>>
> >>>>>Hi all,
> >>>>>This set of patches basically splits the existing napi
> >>>>>poll routine into
> >>>>>two separate napi functions, one for Rx processing
> >>>>>(triggered by frame
> >>>>>receive interrupts only) and one for the Tx confirmation
> >>>>>path processing
> >>>>>(triggered by Tx confirmation interrupts only). The
> >>>>>polling algorithm
> >>>>>behind remains much the same.
> >>>>>
> >>>>>Important throughput improvements have been noted on low
> >>>>>power boards with
> >>>>>this set of changes.
> >>>>>For instance, for the following netperf test:
> >>>>>netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> >>>>>yields a throughput gain from oscillating ~500-~700 Mbps to
> >>>>>steady ~940 Mbps,
> >>>>>(if the Rx/Tx paths are processed on different cores), w/
> >>>>>no increase in CPU%,
> >>>>>on a p1020rdb - 2 core machine featuring etsec2.0
> >>>>>(Multi-Queue Multi-Group
> >>>>>driver mode).
> >>>>
> >>>>It would be interesting to know more about what was causing that large
> >>>>an oscillation -- presumably you will have it reappear once one core
> >>>>becomes 100% utilized. Also, any thoughts on how the change
> >>>>will change
> >>>>performance on an older low power single core gianfar system
> >>>>(e.g. 83xx)?
> >>>
> >>>I also was wondering if this low performance could be caused by BQL
> >>>
> >>>Since TCP stack is driven by incoming ACKS, a NAPI run could have to
> >>>handle 10 TCP acks in a row, and resulting xmits could hit BQL and
> >>>transit on qdisc (Because NAPI handler won't handle TX
> >>>completions in the
> >>>middle of RX handler)
> >>
> >>Does disabling BQL help? Is the BQL limit stable? To what value is it
> >>set? I would be very much interested in more data if the issue is BQL
> >>related.
> >>
> >>.
> >>
> >
> >I agree that more tests should be run to investigate why gianfar under-
> >performs on the low power p1020rdb platform, and BQL seems to be
> >a good starting point (thanks for the hint). What I can say now is that
> >the issue is not apparent on p2020rdb, for instance, which is a more
> >powerful platform: the CPUs - 1200 MHz instead of 800 MHz; twice the
> >size of L2 cache (512 KB), greater bus (CCB) frequency ... On this
> >board (p2020rdb) the netperf test reaches 940Mbps both w/ and w/o these
> >patches.
> >
> >For a single core system I'm not expecting any performance degradation,
> >simply because I don't see why the proposed napi poll implementation
> >would be slower than the existing one. I'll do some measurements on a
> >p1010rdb too (single core, CPU:800 MHz) and get back to you with the
> >results.
> >
>
> Hi all,
>
> Please find below the netperf measurements performed on a p1010rdb machine
> (single core, low power). Three kernel images were used:
> 1) Linux version 3.5.0-20970-gaae06bf -- net-next commit aae06bf
> 2) Linux version 3.5.0-20974-g2920464 -- commit aae06bf + Tx NAPI patches
> 3) Linux version 3.5.0-20970-gaae06bf-dirty -- commit aae06bf +
> CONFIG_BQL set to 'n'

For future reference, you don't need to dirty the tree to disable
CONFIG_BQL at compile time; there is a runtime disable:

  http://permalink.gmane.org/gmane.linux.network/223107

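A minimal sketch of what such a runtime disable could look like, assuming the per-queue BQL knobs live under /sys/class/net/<dev>/queues/tx-<n>/byte_queue_limits/ and that writing "max" to limit_min effectively takes BQL out of play. The helper name and its sysfs_root parameter are hypothetical; the linked message remains the authority on the actual procedure.

```python
from pathlib import Path


def disable_bql(dev, sysfs_root="/sys/class/net"):
    """Write "max" to limit_min of every tx queue of `dev`.

    Assumption: raising limit_min to its maximum keeps the BQL limit so
    high that it never throttles the queue. Returns the files written.
    """
    queues = Path(sysfs_root) / dev / "queues"
    written = []
    # Only tx queues carry byte_queue_limits; rx-* directories are skipped
    # by the glob pattern.
    for knob in sorted(queues.glob("tx-*/byte_queue_limits/limit_min")):
        knob.write_text("max\n")
        written.append(knob)
    return written
```

On a real board this would be called as disable_bql("eth1") with root privileges, eth1 being the interface used in the measurements quoted here. The sysfs_root parameter exists only so the helper can be tried against a scratch directory first.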
>
> The results show that, on *Image 1)*, by adjusting
> tcp_limit_output_bytes no substantial
> improvements are seen, as the throughput stays in the 580-60x Mbps range.

This is a lot lower variation than what you reported earlier (20 versus
200, I think). It was the variation that raised a red flag for me...

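To put rough numbers on that comparison, here is a small sketch that computes the max-minus-min spread. The sample values are copied from the Image 1 netperf runs quoted in this message (default coalescing, all three tcp_limit_output_bytes settings); the earlier figure is only the approximate "~500-~700 Mbps" range from the original p1020rdb report, so the comparison is indicative at best.

```python
# Image 1 throughput samples in Mbps, default coalescing, nine runs across
# tcp_limit_output_bytes = 131072, 65536 and 32768 (from the quoted logs).
IMAGE1_DEFAULT_COALESCING = [
    580.76, 598.21, 583.04,
    604.29, 603.52, 596.18,
    582.32, 583.79, 584.16,
]

# The earlier report gave only a rough range, not individual samples.
EARLIER_RANGE = (500.0, 700.0)


def spread(samples):
    """Return the difference between the largest and smallest sample."""
    return max(samples) - min(samples)


if __name__ == "__main__":
    print("p1010rdb, image 1: %.2f Mbps" % spread(IMAGE1_DEFAULT_COALESCING))
    print("earlier report:    %.2f Mbps" % spread(EARLIER_RANGE))
```

Note that the two ranges come from different boards (single-core p1010rdb here, dual-core p1020rdb earlier), so the gap in variation may be a property of the platform as much as of the driver.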
> By changing the coalescing settings from default* (rx coalescing off,
> tx-usecs: 10, tx-frames: 16) to:
> "ethtool -C eth1 rx-frames 22 tx-frames 22 rx-usecs 32 tx-usecs 32"
> we get a throughput of ~710 Mbps.
>
> For *Image 2)*, using the default tcp_limit_output_bytes value
> (131072) - I've noticed
> that "tweaking" tcp_limit_output_bytes does not improve the
> throughput -, we get the
> following performance numbers:
> * default coalescing settings: ~650 Mbps
> * rx-frames 22 tx-frames 22 rx-usecs 32 tx-usecs 32: ~860-880 Mbps
>
> For *Image 3)*, by disabling BQL (CONFIG_BQL = n), there's *no*
> relevant performance
> improvement compared to Image 1).
> (note:
> For all the measurements, rx and tx BD ring sizes have been set to
> 64, for best performance.)
>
> So, I really tend to believe that the performance degradation comes
> primarily from the driver,
> and the napi poll processing turns out to be an important source for
> that. The proposed patches

This would make sense, if the CPU was slammed at 100% load in dealing
with the tx processing, and the change made the driver considerably more
efficient. But is that really the case? Is the p1010 really going flat
out just to handle the Tx processing? Have you done any sort of
profiling to confirm/deny where the CPU is spending its time?

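One data point on the load question, as a sketch: the local CPU column in the quoted netperf output sits at about 99.95%, and the reported local service demand can be reproduced from it. The formula below is an assumption inferred from the quoted numbers (single core, KB taken as 1024 bytes), not taken from the netperf sources, and it says nothing about where the cycles go.

```python
def service_demand_us_per_kb(throughput_mbps, cpu_percent, num_cpus=1):
    """CPU microseconds spent per KB transferred.

    Assumptions: throughput is in 10^6 bits/s as printed by netperf,
    one KB is 1024 bytes, and cpu_percent is averaged over num_cpus cores.
    """
    kb_per_sec = throughput_mbps * 1e6 / 8 / 1024
    cpu_us_per_sec = cpu_percent / 100 * num_cpus * 1e6
    return cpu_us_per_sec / kb_per_sec


if __name__ == "__main__":
    # (throughput in Mbps, local CPU %) pairs copied from the quoted runs.
    for mbps, cpu in [(580.76, 99.95), (710.50, 99.95), (882.42, 99.20)]:
        print("%.2f Mbps at %.2f%% CPU -> %.3f us/KB"
              % (mbps, cpu, service_demand_us_per_kb(mbps, cpu)))
```

Read this way, the drop in local service demand between the images is simply the same saturated core moving more bytes per second, which is consistent with the driver getting cheaper per packet but does not confirm it; profiling would still be needed for that.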
> show substantial improvement, especially for SMP systems where Tx
> and Rx processing may be
> done in parallel.
> What do you think?
> Is it ok to proceed by re-spinning the patches? Do you recommend
> additional measurements?

Unfortunately Eric is out this week, so we will be without his input for
a while. However, we are only at 3.6-rc1 -- meaning net-next will be
open for quite some time, hence no need to rush to try and jam stuff in.

Also, I have two targets I'm interested in testing your patches on. The
1st is a 500MHz mpc8349 board -- which should replicate what you see on
your p1010 (slow, single core). The other is an 8641D, which is
interesting since it will give us the SMP tx/rx as separate threads, but
without the MQ_MG_MODE support (is that a correct assumption?)

I don't have any fundamental problem with your patches (although 4/4
might be better as two patches) -- the above targets/tests are only
of interest, since I'm not convinced we yet understand _why_ your
changes give a performance boost, and there might be something
interesting hiding in there.

So, while Eric is out, let me see if I can collect some more data on
those two targets sometime this week.

Thanks,
Paul.
--
>
> Regards,
> Claudiu
>
> //=Image 1)================
> root@...10rdb:~# cat /proc/version
> Linux version 3.5.0-20970-gaae06bf [...]
>
> root@...10rdb:~# zcat /proc/config.gz | grep BQL
> CONFIG_BQL=y
> root@...10rdb:~# cat /proc/sys/net/ipv4/tcp_limit_output_bytes
> 131072
>
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 580.76 99.95 11.76 14.099 1.659
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 598.21 99.95 10.91 13.687 1.493
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 583.04 99.95 11.25 14.043 1.581
>
>
> root@...10rdb:~# cat /proc/sys/net/ipv4/tcp_limit_output_bytes
> 65536
>
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 604.29 99.95 11.15 13.550 1.512
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 603.52 99.50 12.57 13.506 1.706
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 596.18 99.95 12.81 13.734 1.760
>
>
>
> root@...10rdb:~# cat /proc/sys/net/ipv4/tcp_limit_output_bytes
> 32768
>
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 582.32 99.95 12.96 14.061 1.824
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 583.79 99.95 11.19 14.026 1.571
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 584.16 99.95 11.36 14.016 1.592
>
>
>
> root@...10rdb:~# ethtool -C eth1 rx-frames 22 tx-frames 22 rx-usecs
> 32 tx-usecs 32
>
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 708.77 99.85 13.32 11.541 1.540
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 710.50 99.95 12.46 11.524 1.437
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 709.95 99.95 14.15 11.533 1.633
>
>
> //=Image 2)================
>
> root@...10rdb:~# cat /proc/version
> Linux version 3.5.0-20974-g2920464 [...]
>
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 652.60 99.95 13.05 12.547 1.638
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 657.47 99.95 11.81 12.454 1.471
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 655.77 99.95 11.80 12.486 1.474
>
>
> root@...10rdb:~# ethtool -C eth1 rx-frames 22 rx-usecs 32 tx-frames
> 22 tx-usecs 32
>
> root@...10rdb:~# cat /proc/sys/net/ipv4/tcp_limit_output_bytes
> 131072
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.01 882.42 99.20 18.06 9.209 1.676
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 867.02 99.75 16.21 9.425 1.531
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.01 874.29 99.85 15.25 9.356 1.429
>
>
> //=Image 3)================
>
> Linux version 3.5.0-20970-gaae06bf-dirty [...] //CONFIG_BQL = n
>
> root@...10rdb:~# cat /proc/version
> Linux version 3.5.0-20970-gaae06bf-dirty
> (b08782@...04-ws574.ea.freescale.net) (gcc version 4.6.2 (GCC) ) #3
> Mon Aug 13 13:58:25 EEST 2012
> root@...10rdb:~# zcat /proc/config.gz | grep BQL
> # CONFIG_BQL is not set
>
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 595.08 99.95 12.51 13.759 1.722
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 593.95 99.95 10.96 13.785 1.511
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 595.30 99.90 11.11 13.747 1.528
>
> root@...10rdb:~# ethtool -C eth1 rx-frames 22 rx-usecs 32 tx-frames
> 22 tx-usecs 32
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 710.46 99.95 12.46 11.525 1.437
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 714.27 99.95 14.05 11.463 1.611
> root@...10rdb:~# netperf -l 20 -cC -H 192.168.10.1 -t TCP_STREAM -- -m 1500
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 192.168.10.1 (192.168.10.1) port 0 AF_INET
> Recv Send Send Utilization Service Demand
> Socket Socket Message Elapsed Send Recv Send Recv
> Size Size Size Time Throughput local remote local remote
> bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB
>
> 87380 16384 1500 20.00 717.69 99.95 12.56 11.409 1.433
> .
>
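For reference, a short sketch that condenses the quoted netperf logs into per-image averages. The throughput values are copied from the runs above (three runs per configuration, tcp_limit_output_bytes at 131072 for Images 1 and 2); the dictionary layout and the helper name are illustrative only.

```python
from statistics import mean

# Throughput in Mbps per (image, coalescing setting), from the quoted logs.
# "coalesced" means rx-frames 22 tx-frames 22 rx-usecs 32 tx-usecs 32.
RUNS = {
    ("image1", "default"): [580.76, 598.21, 583.04],
    ("image1", "coalesced"): [708.77, 710.50, 709.95],
    ("image2", "default"): [652.60, 657.47, 655.77],
    ("image2", "coalesced"): [882.42, 867.02, 874.29],
    ("image3", "default"): [595.08, 593.95, 595.30],
    ("image3", "coalesced"): [710.46, 714.27, 717.69],
}


def gain_percent(baseline, candidate, setting):
    """Relative change in mean throughput of candidate over baseline."""
    base = mean(RUNS[(baseline, setting)])
    cand = mean(RUNS[(candidate, setting)])
    return (cand / base - 1.0) * 100.0


if __name__ == "__main__":
    for key in sorted(RUNS):
        print("%s/%s: %.2f Mbps" % (key[0], key[1], mean(RUNS[key])))
    for setting in ("default", "coalesced"):
        print("%s: Tx NAPI patches %+.1f%%, BQL off %+.1f%%" % (
            setting,
            gain_percent("image1", "image2", setting),
            gain_percent("image1", "image3", setting),
        ))
```

With only three runs per configuration these averages are a summary of the logs, not a statistical result.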