Date:	Sun, 15 Nov 2015 16:58:33 -0800
From:	Grant Zhang <gzhang@...tly.com>
To:	Eric Dumazet <eric.dumazet@...il.com>,
	Patrick Schaaf <kernelorg@....de>
Cc:	NETDEV <netdev@...r.kernel.org>,
	linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: Kernel 4.1 hang, apparently in __inet_lookup_established

Hi Patrick,

Have you tried the two patches Eric mentioned? One of my 4.1.11 servers
just hung with a very similar stack trace, and I am wondering whether the
aforementioned patches would help.

Thanks,

Grant

On 23/09/2015 09:31, Eric Dumazet wrote:
> On Wed, 2015-09-23 at 10:25 +0200, Patrick Schaaf wrote:
>> Dear kernel developers,
>>
>> I recently started to upgrade my production hosts and VMs from the 3.14 series
>> to 4.1 kernels, starting with 4.1.6. Yesterday, for the second time after I
>> started these upgrades, I experienced one of our webserver VMs hanging.
>>
>> The first time this happened, the VM hung completely, all 5 virtual cores
>> spinning at 100%, ping still worked, but nothing else, including no virsh
>> console reaction - I had to destroy and restart that VM. No messages were to
>> be found.
>>
>> Yesterday, when it happened the second time, I found the VM spinning on a
>> single core only, and could still connect to it via ssh - but it stopped
>> accepting apache connections. The core it spun on showed 100% time used in
>> "si", with top, and it produced the messages appended below. The VM did not
>> shut down properly when told to, and had to be destroyed again.
>>
>> If I read that dmesg output correctly it spins in __inet_lookup_established,
>> which indeed reads like it has infinite spin potential. But that code itself
>> did not change relative to the 3.14 series we've been running for a long time
>> without the issues - so the root cause would be something else.
>>
>> For our production systems I'll revert to the 3.14 series, but maybe this
>> report may help somebody understand what's going on.
>>
>> best regards
>>    Patrick
>
>
> You could try the following commits:
>
> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=ed2e923945892a8372ab70d2f61d364b0b6d9054
>
> http://git.kernel.org/cgit/linux/kernel/git/davem/net.git/commit/?id=29c6852602e259d2c1882f320b29d5c3fec0de04
>