Date:	Tue, 1 Sep 2015 09:22:00 -0700
From:	Alexander Duyck <alexander.duyck@...il.com>
To:	yzhu1 <Yanjun.Zhu@...driver.com>,
	Daniel Borkmann <daniel@...earbox.net>,
	David Miller <davem@...emloft.net>
Cc:	netdev@...r.kernel.org, therbert@...gle.com, jhs@...atatu.com,
	hannes@...essinduktion.org, edumazet@...gle.com,
	jeffrey.t.kirsher@...el.com, rusty@...tcorp.com.au,
	brouer@...hat.com
Subject: Re: [PATCH 1/2] net: Remove ndo_xmit_flush netdev operation, use
 signalling instead.

On 09/01/2015 02:21 AM, yzhu1 wrote:
> On 09/01/2015 04:23 PM, Daniel Borkmann wrote:
>> On 09/01/2015 09:10 AM, yzhu1 wrote:
>>> On 09/01/2015 03:00 PM, David Miller wrote:
>>>> From: yzhu1 <Yanjun.Zhu@...driver.com>
>>>> Date: Tue, 1 Sep 2015 14:46:38 +0800
>>>>
>>>>> After I applied this patch, the skb->xmit_more is not always zero.
>>>> There have been thousands upon thousands of commits since that
>>>> change.
>>>>
>>>> You should be testing the tree as it currently stands, to see
>>>> if xmit_more behaves correctly or not.
>>>>
>>>> If xmit_more were incorrectly set to 1 in the current tree, it
>>>> would stall the TX queue of the networking device and we would
>>>> be seeing lots of reports of this.
>>>>
>>> Thanks for your reply.
>>> Yes. After running for several days, the following messages will 
>>> appear.
>>
>> Your below trace says 3.14.29ltsi-WR7.0.0.0 ...
>>
>> As Dave said, please retest with something up to date, like 4.2 kernel,
>> or latest -net git tree.
>>
>> Besides, the *upstream* xmit_more changes first went into 3.18 ...
>> nearest git describe is at:
>>
>>   $ git describe 0b725a2ca61bedc33a2a63d0451d528b268cf975
>>   v3.17-rc1-251-g0b725a2
>>
>> So, that only tells me, that you are reporting a possible bug based on
>> some non-upstream kernel ... ? Thus, it's not even possible to verify
>> if the actual backport was correct ?
>
> Sorry. There is something wrong with backporting this patch.
>
> Thanks for your help.
>
> Zhu Yanjun

You may very well be missing the code that forces a tail write when
the Tx descriptor ring is full.  Check your igb driver code against
the upstream kernel and verify that you backported commit
6f19e12f62306 "igb: flush when in xmit_more mode and under descriptor
pressure".

Both conditions need to be checked before you skip the tail write:
xmit_more, and whether the Tx queue has been stopped.  The original code
only checked one, which can result in Tx hangs if the ring fills without
ever notifying the hardware.
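
To illustrate the failure mode, here is a minimal user-space sketch.  It
is a toy model, not the igb code: tx_ring_model, model_xmit(),
model_fill_ring() and the check_stopped flag are all made-up names for
illustration.  It assumes the queue is stopped as soon as the last free
descriptor is consumed, and that the stack sets xmit_more on every packet
of the burst because it cannot know the ring is about to fill.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified Tx ring; not the igb driver's structures. */
struct tx_ring_model {
	unsigned int size;	/* total descriptors in the ring */
	unsigned int queued;	/* descriptors filled in by the driver */
	unsigned int tail;	/* last position reported to the "hardware" */
	bool stopped;		/* queue stopped because the ring is full */
};

/*
 * Queue one packet.  Returns true if the tail was written, i.e. the
 * hardware was told about the pending descriptors.
 *
 * check_stopped == false: only xmit_more is consulted.
 * check_stopped == true:  the tail is also written when the queue has
 *                         just been stopped.
 */
static bool model_xmit(struct tx_ring_model *r, bool xmit_more,
		       bool check_stopped)
{
	/* The caller must not transmit on a full (stopped) ring. */
	assert(r->queued < r->size);

	r->queued++;

	/* No free descriptor left: stop the queue. */
	if (r->queued == r->size)
		r->stopped = true;

	/* Skip the tail write only if more packets can really follow. */
	if (!xmit_more || (check_stopped && r->stopped)) {
		r->tail = r->queued;
		return true;
	}
	return false;
}

/*
 * Fill the whole ring with xmit_more set on every packet and return the
 * number of descriptors the hardware was never told about.
 */
static unsigned int model_fill_ring(unsigned int size, bool check_stopped)
{
	struct tx_ring_model r = { .size = size };
	unsigned int i;

	for (i = 0; i < size; i++)
		model_xmit(&r, true, check_stopped);

	return r.queued - r.tail;
}
```

With only the xmit_more check, model_fill_ring(4, false) leaves all four
descriptors unannounced: the queue is stopped, so the stack never sends
the promised next packet, and nothing triggers a tail write.  With both
checks, model_fill_ring(4, true) leaves none pending.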

- Alex