Message-ID: <4FFF1EFE.7070002@intel.com>
Date: Thu, 12 Jul 2012 12:01:18 -0700
From: Alexander Duyck <alexander.h.duyck@...el.com>
To: Stephen Hemminger <shemminger@...tta.com>
CC: netdev@...r.kernel.org, davem@...emloft.net,
jeffrey.t.kirsher@...el.com, edumazet@...gle.com,
bhutchings@...arflare.com, therbert@...gle.com,
alexander.duyck@...il.com
Subject: Re: [RFC PATCH 0/2] Coalesce MMIO writes for transmits
On 07/12/2012 10:23 AM, Stephen Hemminger wrote:
> On Wed, 11 Jul 2012 17:25:58 -0700
> Alexander Duyck <alexander.h.duyck@...el.com> wrote:
>
>> This patch set is meant to address recent issues I found with ixgbe
>> performance being bound by Tx tail writes. With these changes in place
>> and the dispatch_limit set to 1 or more I see a significant increase in
>> performance.
>>
>> In the case of one of my systems I saw the routing rate for 7 queues jump
>> from 10.5 to 11.7Mpps. The overall increase I have seen on most systems is
>> something on the order of about 15%. In the case of pktgen I have also
>> seen a noticeable increase as the previous limit for transmits was
>> ~12.5Mpps, but with this patch set in place and the dispatch_limit enabled
>> the value increases to ~14.2Mpps.
>>
>> I expected there to be an increase in latency; however, so far I have not
>> run into that. I have tried running NPtcp tests for latency and seen no
>> difference between the coalesced and non-coalesced transaction times. I welcome
>> any suggestions for tests I might run that might expose any latency issues
>> as a result of this patch.
>>
>> ---
>>
>> Alexander Duyck (2):
>> ixgbe: Add functionality for delaying the MMIO write for Tx
>> net: Add new network device function to allow for MMIO batching
>>
>>
>> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 22 +++++++-
>> include/linux/netdevice.h | 57 +++++++++++++++++++++
>> net/core/dev.c | 67 +++++++++++++++++++++++++
>> net/core/net-sysfs.c | 36 +++++++++++++
>> 4 files changed, 180 insertions(+), 2 deletions(-)
>>
> This is a good idea. I was thinking of adding a multi-skb operation
> to net_device_ops to allow this. Something like ndo_start_xmit_pkts, but
> the problem is how to deal with the boundary case where there is only
> a limited number of slots in the ring. Using a "that's all folks"
> operation seems better.
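Just to make sure we are comparing the same two shapes, this is roughly
how I read the options (hypothetical sketch only; neither the hook names
nor the signatures below are taken from the actual patches):

#include <linux/netdevice.h>

/* Hypothetical sketch of the two alternatives, not code from the series. */
struct sketch_xmit_ops {
	/* multi-skb variant: hand the driver an entire batch at once */
	netdev_tx_t (*ndo_start_xmit_pkts)(struct sk_buff **skbs,
					   unsigned int count,
					   struct net_device *dev);

	/* "that's all folks" variant: keep the per-skb ndo_start_xmit and
	 * add a hook telling the driver the burst is complete, so it can
	 * do a single MMIO tail write for the whole batch
	 */
	void (*ndo_complete_xmit)(struct net_device *dev,
				  unsigned int queue_index);
};
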
I had considered a multi-skb operation originally, but the problem was
that in my case I would have had to come up with a more complex buffering
mechanism to accumulate a stream of skbs before handing them off to the
device. By letting the transmit path proceed normally I shouldn't have
any effect on things like the byte queue limits for the transmit queues.
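For reference, the driver side of what I'm doing boils down to something
like this (a simplified sketch, not the actual ixgbe code; the ring
fields and function names here are illustrative):

#include <linux/io.h>
#include <linux/netdevice.h>

/* Illustrative per-ring state only, not the real ixgbe_ring layout. */
struct sketch_tx_ring {
	u16 next_to_use;		/* next free descriptor index */
	bool tail_write_pending;	/* doorbell deferred for this burst */
	void __iomem *tail;		/* MMIO tail register for the ring */
};

/* Per-packet path: descriptors go into coherent memory as usual, but
 * the MMIO tail write is skipped while batching is enabled.
 */
static netdev_tx_t sketch_xmit_frame(struct sketch_tx_ring *ring, bool batch)
{
	/* ... set up Tx descriptors in coherent memory here ... */
	ring->next_to_use++;

	if (batch)
		ring->tail_write_pending = true;	/* defer doorbell */
	else
		writel(ring->next_to_use, ring->tail);	/* old behaviour */

	return NETDEV_TX_OK;
}

/* End-of-burst path: one MMIO tail write covers the whole batch. */
static void sketch_xmit_flush(struct sketch_tx_ring *ring)
{
	if (ring->tail_write_pending) {
		ring->tail_write_pending = false;
		writel(ring->next_to_use, ring->tail);
	}
}
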
The weird bit is how this issue was showing up. I don't know if you
recall my presentation from Plumbers last year, but one of the things I
had brought up was the qdisc spinlock being an issue. However, it was
actually this MMIO write that was causing the problem: it was posting a
write to non-coherent memory, and the spinlock was then stalling behind
that write and couldn't complete until the write had finished. With this
change in place and the dispatch_limit set to
something like 31 I see the CPU utilization for spinlocks drop from 15%
(90% sch_direct_xmit / 10% dev_queue_xmit) to 5% (66% sch_direct_xmit /
33% dev_queue_xmit). Makes me wonder what other hotspots we have in the
drivers that can be resolved by avoiding MMIO followed by locked
operations.
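To make that ordering concrete, the per-packet pattern today versus the
batched pattern looks roughly like this, at least as I understand the
stall (schematic only; the lock and register names are placeholders):

#include <linux/io.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(sketch_txq_lock);

/* Today: each packet does an uncached, posted MMIO write while holding
 * the Tx queue lock, so the lock operations that follow end up waiting
 * for that write to complete.
 */
static void sketch_xmit_one(void __iomem *tail_reg, u32 tail_val)
{
	spin_lock(&sketch_txq_lock);
	/* ... descriptor setup in coherent memory ... */
	writel(tail_val, tail_reg);	/* posted write to UC space */
	spin_unlock(&sketch_txq_lock);	/* stalls behind the MMIO write */
}

/* Batched: the tail write moves out of the per-packet loop, so none of
 * the lock operations in the hot path sit behind an MMIO write.
 */
static void sketch_xmit_burst(void __iomem *tail_reg, u32 tail_val, int npkts)
{
	int i;

	for (i = 0; i < npkts; i++) {
		spin_lock(&sketch_txq_lock);
		/* ... descriptor setup only, no MMIO ... */
		spin_unlock(&sketch_txq_lock);
	}
	writel(tail_val, tail_reg);	/* one doorbell for the burst */
}
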
Thanks,
Alex