Message-ID: <4E52DEEF.40504@intel.com>
Date: Mon, 22 Aug 2011 15:57:51 -0700
From: Alexander Duyck <alexander.h.duyck@...el.com>
To: David Miller <davem@...emloft.net>
CC: bhutchings@...arflare.com, jeffrey.t.kirsher@...el.com,
netdev@...r.kernel.org, gospo@...hat.com
Subject: Re: [net-next 03/10] ixgbe: Drop the TX work limit and instead just
leave it to budget
On 08/22/2011 01:56 PM, David Miller wrote:
> From: Alexander Duyck <alexander.h.duyck@...el.com>
> Date: Mon, 22 Aug 2011 10:29:51 -0700
>
>> The only problem I was seeing with that was that in certain cases it
>> seemed like the TX cleanup could consume enough CPU time to cause
>> pretty significant delays in processing the RX cleanup. This in turn
>> was causing single queue bi-directional routing tests to come out
>> pretty unbalanced, since what seemed to happen is that one CPU's RX work
>> would overwhelm the other CPU with the TX processing resulting in an
>> unbalanced flow that was something like a 60/40 split between the
>> upstream and downstream throughput.
> But the problem is that now you're applying the budget to two operations
> that have much differing costs. Freeing up a TX ring packet is probably
> on the order of 1/10th the cost of processing an incoming RX ring frame.
>
> I've advocated to not apply the budget at all to TX ring processing.
I fully understand that the TX path is much cheaper than the RX path.
One step I have taken in all of this code is that the TX path only
counts SKBs cleaned, not descriptors. So a single-descriptor 60-byte
transmit costs the same as a 64K, 18-descriptor TSO. All I am really
counting is the number of times I have called dev_kfree_skb_any().
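Just to illustrate the counting scheme (this is only a rough sketch, not
the actual ixgbe_clean_tx_irq(); the example_* structures and helpers are
made-up stand-ins):

#include <linux/netdevice.h>
#include <linux/skbuff.h>

struct example_tx_buffer {
        struct sk_buff *skb;
};

struct example_tx_ring {
        struct example_tx_buffer *buffer;
        unsigned int count;             /* ring size, power of two */
        unsigned int next_to_clean;
};

/* hypothetical: returns true once hardware has written back descriptor i */
bool example_tx_desc_done(struct example_tx_ring *ring, unsigned int i);

static bool example_clean_tx_irq(struct example_tx_ring *ring, int budget)
{
        unsigned int i = ring->next_to_clean;
        int skbs_cleaned = 0;

        while (skbs_cleaned < budget && example_tx_desc_done(ring, i)) {
                struct example_tx_buffer *buf = &ring->buffer[i];

                if (buf->skb) {
                        /* one unit of work per SKB freed, no matter how
                         * many descriptors the frame used (1 or 18) */
                        dev_kfree_skb_any(buf->skb);
                        buf->skb = NULL;
                        skbs_cleaned++;
                }

                /* context/data descriptors advance the ring but are
                 * not charged against the budget */
                i = (i + 1) & (ring->count - 1);
        }

        ring->next_to_clean = i;

        /* false means we hit the budget and more work may be pending */
        return skbs_cleaned < budget;
}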
> I can see your dilemma with respect to RX ring processing being delayed,
> but if that's really happening you can consider whether the TX ring is
> simply too large.
The problem was occurring even without large rings; I was seeing issues
with rings just 256 descriptors in size. The problem seemed to be that
letting the TX cleanup run to a multiple of the budget allowed one CPU to
overwhelm the other, and because the TX work was essentially unbounded
the issue could feed back on itself.
In the routing test case I was actually seeing significant advantages to
this approach, as we were essentially cleaning just the right number of
buffers to make room for the next set of transmits by the time the RX
cleanup came through. In addition, since the RX and TX workloads were
balanced, both stayed locked into polling while the CPU was saturated
instead of the TX dropping back to being interrupt driven. Finally, since
the TX was working with the same budget as the RX, the number of SKBs
freed in the TX path matched the number consumed when buffers were
reallocated on the RX path.
> In any event can you try something like dampening the cost applied to
> budget for TX work (1/2, 1/4, etc.)? Because as far as I can tell, if
> you are really hitting the budget limit on TX then you won't be doing
> any RX work on that device until a future NAPI round that depletes the
> TX ring work without going over the budget.
The problem seemed to be present as long as I allowed the TX budget to
be a multiple of the RX budget. The easiest way to keep things balanced
and avoid allowing the TX from one CPU to overwhelm the RX on another
was just to keep the budgets equal.
I'm a bit confused by this last comment. The full budget is used for both
TX and RX; it isn't divided. I do a budget's worth of TX cleanup and a
budget's worth of RX cleanup within the ixgbe_poll routine, and if either
of them consumes its full budget then I return the budget value as the
work done.
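Roughly, the structure I'm describing looks like this (a simplified
sketch with made-up example_* names and helpers, not the real
ixgbe_poll()):

#include <linux/netdevice.h>

struct example_q_vector {
        struct napi_struct napi;
        struct example_tx_ring *tx_ring;
        struct example_rx_ring *rx_ring;
};

/* hypothetical cleanup helpers; each returns true if it finished under budget */
bool example_clean_tx_irq(struct example_tx_ring *ring, int budget);
bool example_clean_rx_irq(struct example_rx_ring *ring, int budget);
void example_enable_queue_irq(struct example_q_vector *q);

static int example_poll(struct napi_struct *napi, int budget)
{
        struct example_q_vector *q =
                container_of(napi, struct example_q_vector, napi);
        bool tx_done, rx_done;

        tx_done = example_clean_tx_irq(q->tx_ring, budget);    /* full budget */
        rx_done = example_clean_rx_irq(q->rx_ring, budget);    /* full budget */

        /* if either cleanup used its whole budget, report a full budget of
         * work so NAPI keeps the device in polling mode */
        if (!tx_done || !rx_done)
                return budget;

        /* both finished under budget: leave polling, re-enable the IRQ */
        napi_complete(napi);
        example_enable_queue_irq(q);
        return 0;
}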
If you are referring to the case where two devices are sharing the CPU
then I would suspect this might lead to faster consumption of the
netdev_budget, but other than that I don't see any starvation issues for
RX or TX.
Thanks,
Alex