Message-ID: <1324064044.2621.20.camel@edumazet-laptop>
Date: Fri, 16 Dec 2011 20:34:04 +0100
From: Eric Dumazet <eric.dumazet@...il.com>
To: Jesse Brandeburg <jesse.brandeburg@...il.com>
Cc: Rick Jones <rick.jones2@...com>,
Stephen Hemminger <shemminger@...tta.com>,
Vijay Subramanian <subramanian.vijay@...il.com>,
tcpdump-workers@...ts.tcpdump.org, netdev@...r.kernel.org,
Matthew Vick <matthew.vick@...el.com>,
Jeff Kirsher <jeffrey.t.kirsher@...el.com>
Subject: Re: twice past the taps, thence out to net?
On Friday 16 December 2011 at 10:28 -0800, Jesse Brandeburg wrote:
> On Thu, Dec 15, 2011 at 8:27 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> > On Thursday 15 December 2011 at 14:22 -0800, Rick Jones wrote:
> >> On 12/15/2011 11:00 AM, Eric Dumazet wrote:
> >> >> Devices work better if the driver proactively manages stop_queue/wake_queue.
> >> >> Old devices used TX_BUSY, but newer devices tend to manage the queue
> >> >> themselves.
> >> >>
> >> >
> >> > Some 'new' drivers like igb can be fooled when the skb is GSO
> >> > segmented?
> >> >
> >> > Because igb_xmit_frame_ring() needs skb_shinfo(skb)->nr_frags + 4
> >> > descriptors, igb should stop its queue not at MAX_SKB_FRAGS + 4 but
> >> > at MAX_SKB_FRAGS * 4.
>
> Can you please help me understand the need for MAX_SKB_FRAGS * 4 as
> the requirement? Currently the driver uses logic like this:
>
> In hard_start_tx: hey, I just finished a tx; I should stop the qdisc if
> I don't have room (in tx descriptors) for a worst-case transmit skb
> (MAX_SKB_FRAGS + 4) the next time I'm called.
> When cleaning from interrupt: my cleanup is done; do I have enough
> free tx descriptors (should be MAX_SKB_FRAGS + 4) for a worst-case
> transmit? If yes, restart the qdisc.
>
> I'm missing the jump from the above logic to using MAX_SKB_FRAGS * 4
> (== (18 * 4) == 72) as the minimum number of descriptors I need for a
> worst case TSO. Each descriptor can point to up to 16kB of contiguous
> memory; typically we use 1 for offload context setup, 1 for skb->data,
> and 1 for each page. I think we may be overestimating with
> MAX_SKB_FRAGS + 4, but that should be no big deal.
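
For reference, the stop/wake pattern described above looks roughly like
this. This is a minimal sketch only, not the actual igb code: my_ring,
desc_unused() and the other my_* names are made-up stand-ins.

#include <linux/netdevice.h>    /* netif_{stop,wake}_queue(), netdev_tx_t */
#include <linux/skbuff.h>       /* struct sk_buff, MAX_SKB_FRAGS */

/* Worst-case descriptor count for one skb, as described above. */
#define DESC_NEEDED (MAX_SKB_FRAGS + 4)

struct my_ring {
        struct net_device *netdev;
        /* ... descriptor bookkeeping ... */
};

static unsigned int desc_unused(struct my_ring *tx_ring); /* stand-in */

static netdev_tx_t my_xmit_frame(struct sk_buff *skb, struct net_device *dev)
{
        struct my_ring *tx_ring = netdev_priv(dev); /* stand-in lookup */

        /* ... map the skb and post its descriptors here ... */

        /* Stop the queue if another worst-case skb would not fit. */
        if (desc_unused(tx_ring) < DESC_NEEDED)
                netif_stop_queue(dev);

        return NETDEV_TX_OK;
}

static void my_clean_tx_irq(struct my_ring *tx_ring)
{
        /* ... reclaim completed descriptors here ... */

        /* Wake the queue once a worst-case skb fits again. */
        if (netif_queue_stopped(tx_ring->netdev) &&
            desc_unused(tx_ring) >= DESC_NEEDED)
                netif_wake_queue(tx_ring->netdev);
}
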
Did you read my second patch?

The problem is that you wake up the queue too soon (16 available
descriptors, while a full TSO packet needs more than that). How would
you explain the high 'requeues' number if that was not the problem?

Also, it's suboptimal to wake up the queue when the available space is
very low, since only _one_ packet may then be dequeued from the qdisc
(you pay a high cost in cache line bouncing).
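
In other words, the wake threshold wants headroom beyond a single
worst-case skb, so the qdisc can hand over a burst in one go. Reusing
the stand-ins from the sketch above (the factor of 2 is made up here;
the actual patch picks the real threshold), the wake test in
my_clean_tx_irq() would become:

/* Waking at exactly DESC_NEEDED lets only one packet through before
 * the queue stops again; waking with headroom lets a burst drain. */
#define WAKE_HEADROOM (2 * DESC_NEEDED)

        if (netif_queue_stopped(tx_ring->netdev) &&
            desc_unused(tx_ring) >= WAKE_HEADROOM)
                netif_wake_queue(tx_ring->netdev);
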
My first patch was about a very rare event: a full TSO packet is
segmented in gso_segment() [say, if you dynamically disable sg on the
eth device and an old tcp buffer is retransmitted]. You end up with 16
skbs delivered to the NIC; in that case we can hit the tx ring limit at
the 4th or 5th skb, and Rick complains that tcpdump outputs some packets
several times ;)

Since igb needs 4 descriptors for a linear skb, I said 4 *
MAX_SKB_FRAGS, but the real problem is addressed in my second patch, I
believe?
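
(For the arithmetic: 16 segmented skbs at 4 descriptors per linear skb
is 16 * 4 = 64 descriptors for the whole burst, which a MAX_SKB_FRAGS *
4 = 18 * 4 = 72 reserve covers.)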