Date:	Mon, 25 Jul 2011 13:01:52 +0100
From:	Richard Kennedy <richard@....demon.co.uk>
To:	netdev@...r.kernel.org
CC:	Francois Romieu <romieu@...zoreil.com>
Subject: Re: v3.0-rc* intermittent network failure: Test case found!

On 21/07/11 16:18, Richard Kennedy wrote:
>> Richard Kennedy<richard@....demon.co.uk>  :
>>> I keep seeing a total network failure on v3.0.0-rc* , it is highly
>>> intermittent, anything from 1 hour to 12+, and I don't have a reliable
>>> test case.
>>> When it fails I lose all network comms, but there are no errors in the
>>> system log, no hung tasks reported, nothing. But after it fails the
>>> machine hangs during shutdown, it just never turns off. So I guess
>>> something is getting stuck but I can't find it.
>>

I have found a reliable test case: I can instantly trigger the problem by
starting two instances of rsync at the same time. [This is on x86_64, AMD X2.]

e.g.
rsync -a linux-2.6 server:t1 & rsync -a linux-2.6 server:t2 &
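For repeated runs (for example while bisecting), the reproduction above can be wrapped in a small POSIX sh function. This is only a sketch and not part of the original report: the function name `repro_two_rsyncs`, its arguments and the `RSYNC` override are hypothetical, while the source tree `linux-2.6` and the targets `server:t1` and `server:t2` come from the example. Both transfers are started in the background so they overlap, then the function waits for each and reports its exit status. If the failure triggers, the transfers may stall instead of exiting, in which case `wait` will block until interrupted.

```shell
# repro_two_rsyncs [SRC] [HOST]  (hypothetical helper, POSIX sh)
# Starts two rsync copies of SRC to HOST:t1 and HOST:t2 at the same time,
# mirroring the test case in the report, then waits for both.
# Set RSYNC (e.g. RSYNC=echo) to substitute another command for a dry run.
repro_two_rsyncs() {
    src=${1:-linux-2.6}
    host=${2:-server}
    rsync_cmd=${RSYNC:-rsync}

    # Launch both transfers in the background so they run concurrently.
    "$rsync_cmd" -a "$src" "$host:t1" &
    pid1=$!
    "$rsync_cmd" -a "$src" "$host:t2" &
    pid2=$!

    # Collect each transfer's exit status (blocks while they are running).
    wait "$pid1"; status1=$?
    wait "$pid2"; status2=$?
    echo "rsync exit statuses: $status1 $status2"

    # Succeed only if both transfers succeeded.
    [ "$status1" -eq 0 ] && [ "$status2" -eq 0 ]
}
```

Usage: `repro_two_rsyncs linux-2.6 server` runs the real test; `RSYNC=echo repro_two_rsyncs` only prints the two command lines that would be run.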


If I have a ping running when I trigger the problem, it pauses and then
fails with:

	ping: sendmsg: No buffer space available

But if I start a ping afterwards, it fails with:

...	Destination Host Unreachable

I have a serial console attached, but I don't really understand what it's
telling me. AFAICT there are no blocked tasks; SysRq-W shows:


SysRq : Show Blocked State
   task                        PC stack   pid father
Sched Debug Version: v0.10, 3.0.0 #46
ktime                                   : 7129717.783042
sched_clk                               : 7126380.221722
cpu_clk                                 : 7129711.544071
jiffies                                 : 4301797008
sched_clock_stable                      : 0
.....[lots more schedule & cpu info]

But now that I have a reliable test case, I can find a last known good
kernel and have a stab at bisecting this, unless anyone has any better
suggestions?

regards
Richard


