netdev - Re: twice past the taps, thence out to net?


Message-ID: <1324064044.2621.20.camel@edumazet-laptop>
Date:	Fri, 16 Dec 2011 20:34:04 +0100
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Jesse Brandeburg <jesse.brandeburg@...il.com>
Cc:	Rick Jones <rick.jones2@...com>,
	Stephen Hemminger <shemminger@...tta.com>,
	Vijay Subramanian <subramanian.vijay@...il.com>,
	tcpdump-workers@...ts.tcpdump.org, netdev@...r.kernel.org,
	Matthew Vick <matthew.vick@...el.com>,
	Jeff Kirsher <jeffrey.t.kirsher@...el.com>
Subject: Re: twice past the taps, thence out to net?

On Friday, 16 December 2011 at 10:28 -0800, Jesse Brandeburg wrote:
> On Thu, Dec 15, 2011 at 8:27 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> > On Thursday, 15 December 2011 at 14:22 -0800, Rick Jones wrote:
> >> On 12/15/2011 11:00 AM, Eric Dumazet wrote:
> >> >> Devices work better if the driver proactively manages stop_queue/wake_queue.
> >> >> Old devices used TX_BUSY, but newer devices tend to manage the queue
> >> >> themselves.
> >> >>
> >> >
> >> > Some 'new' drivers like igb can be fooled if the skb is gso segmented?
> >> >
> >> > Because igb_xmit_frame_ring() needs skb_shinfo(skb)->nr_frags + 4
> >> > descriptors, igb should stop its queue not at MAX_SKB_FRAGS + 4, but
> >> > MAX_SKB_FRAGS*4
> 
> Can you please help me understand the need for MAX_SKB_FRAGS * 4 as
> the requirement?  Currently the driver uses logic like
> 
> in hard_start_tx: hey I just finished a tx, I should stop the qdisc if
> I don't have room (in tx descriptors) for a worst case transmit skb
> (MAX_SKB_FRAGS + 4) the next time I'm called.
> when cleaning from interrupt: My cleanup is done, do I have enough
> free tx descriptors (should be MAX_SKB_FRAGS + 4) for a worst case
> transmit?  If yes, restart qdisc.
> 
> I'm missing the jump from the above logic to using MAX_SKB_FRAGS * 4
> (== (18 * 4) == 72) as the minimum number of descriptors I need for a
> worst case TSO.  Each descriptor can point to up to 16kB of contiguous
> memory, typically we use 1 for offload context setup, 1 for skb->data,
> and 1 for each page.  I think we may be overestimating with
> MAX_SKB_FRAGS + 4, but that should be no big deal.

Did you read my second patch?

The problem is that you wake up the queue too soon (at 16 available
descriptors, while a full TSO packet needs more than that).

How would you explain the high 'requeues' count if this were not the
problem?

Also, it's suboptimal to wake up the queue when available space is very
low, since only _one_ packet may be dequeued from the qdisc (you pay a
high cost in cache line bouncing).
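
To make the wake-threshold argument concrete, here is a small userspace
model, not igb code. Every name in it (tx_ring_model, model_xmit,
model_clean, model_run) is hypothetical. It assumes a worst-case packet
needs MAX_SKB_FRAGS + 4 descriptors, with MAX_SKB_FRAGS == 18 as quoted
above, and that completions return descriptors in fixed-size chunks. It
only counts how often the stack is handed a packet back.

```c
#include <stdbool.h>

#define MAX_SKB_FRAGS     18                   /* value quoted in this thread */
#define DESC_NEEDED_WORST (MAX_SKB_FRAGS + 4)  /* worst-case packet, per the thread */

/* Hypothetical model of one TX ring; this is not the igb structure. */
struct tx_ring_model {
	unsigned int free_desc;    /* descriptors currently unused */
	unsigned int wake_thresh;  /* free descriptors required to wake the queue */
	bool stopped;              /* models a stopped netdev queue */
	unsigned int requeues;     /* packets handed back to the qdisc */
};

/* Try to send a packet needing `needed` descriptors; false means requeue. */
static bool model_xmit(struct tx_ring_model *r, unsigned int needed)
{
	if (r->free_desc < needed) {
		r->stopped = true;
		r->requeues++;
		return false;
	}
	r->free_desc -= needed;
	/* Stop proactively if the next worst-case packet would not fit. */
	if (r->free_desc < DESC_NEEDED_WORST)
		r->stopped = true;
	return true;
}

/* TX completion: reclaim descriptors, wake the queue once the threshold is met. */
static void model_clean(struct tx_ring_model *r, unsigned int reclaimed)
{
	r->free_desc += reclaimed;
	if (r->stopped && r->free_desc >= r->wake_thresh)
		r->stopped = false;
}

/*
 * Start with a full, stopped ring. Each round reclaims `chunk` descriptors
 * and, if the queue is awake, offers one worst-case packet. For simplicity
 * the model does not cap free_desc at a ring size.
 */
static unsigned int model_run(unsigned int wake_thresh, unsigned int chunk,
			      unsigned int rounds)
{
	struct tx_ring_model r = {
		.free_desc = 0, .wake_thresh = wake_thresh,
		.stopped = true, .requeues = 0,
	};

	for (unsigned int i = 0; i < rounds; i++) {
		model_clean(&r, chunk);
		if (!r.stopped)
			model_xmit(&r, DESC_NEEDED_WORST);
	}
	return r.requeues;
}
```

In this model, model_run(16, 8, 100) reports requeues because the queue
is woken with 16 free descriptors while the packet needs 22, whereas
model_run(DESC_NEEDED_WORST, 8, 100) reports none.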

My first patch was about a very rare event: a full TSO packet is
segmented in gso_segment() [say, if you dynamically disable sg on the
eth device and an old tcp buffer is retransmitted]. You end up with 16
skbs delivered to the NIC. In this case we can hit the tx ring limit at
the 4th or 5th skb, and Rick complains that tcpdump outputs some packets
several times ;)

Since igb needs 4 descriptors for a linear skb, I said 4 *
MAX_SKB_FRAGS, but the real problem is addressed in my second patch, I
believe?
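
The arithmetic behind the rare case can be sketched as follows. This is
an illustration only: segments_accepted is a hypothetical helper, and it
takes the pessimistic reading that each linear segment consumes all 4
descriptors mentioned above.

```c
#define DESC_PER_LINEAR_SKB 4  /* from this message: 4 descriptors per linear skb */

/*
 * Hypothetical helper: how many of `nr_segments` linear skbs fit into
 * `free_desc` descriptors before the ring is exhausted, assuming no
 * completions arrive in between.
 */
static unsigned int segments_accepted(unsigned int free_desc,
				      unsigned int nr_segments)
{
	unsigned int sent = 0;

	while (sent < nr_segments && free_desc >= DESC_PER_LINEAR_SKB) {
		free_desc -= DESC_PER_LINEAR_SKB;
		sent++;
	}
	return sent;
}
```

Under that assumption, 16 free descriptors accept 4 of the 16 segments
and MAX_SKB_FRAGS + 4 == 22 accept 5, so the rest are requeued and pass
the taps again; 4 * MAX_SKB_FRAGS == 72 free descriptors accept all 16.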


