Message-ID: <2116485.usCtNS6d1V@rofl>
Date:	Tue, 24 May 2016 17:44:13 +0200
From:	Patrick Schaaf <netdev@....de>
To:	NETDEV <netdev@...r.kernel.org>, lvs-users@...uxvirtualserver.org
Subject: kernel panic with kernel 3.14.70, LVS on keepalived restart

Dear LVS users / netdev readers,

today I've got a pretty peculiar problem.

I've been running 3.14.48 (and some earlier 3.14 kernels) for a long time now 
in an LVS / keepalived driven loadbalancing cluster. See below for more detail 
on the setup.

Today I started to upgrade to the current 3.14.70 kernel. At first glance 
everything seems fine: I can fail over _to_ the box with the new kernel, and 
traffic flows without problems.

However, when I then switch BACK to a different box, the 3.14.70 kernel 
crashes. I've got an incomplete console dump (IPMI avi capture, single 
frame...) you can see here:

 https://plus.google.com/u/0/photos/photo/114613285248943487324/6288270822391242162

I usually fail over manually: some automation adjusts the VRRP priority 
settings in keepalived.conf and then restarts the daemon.

The issue / reboot only manifests when I RESTART the ACTIVE keepalived. With 
3.14.70 it then happens every time; it never happened with the earlier kernels.

On the other hand just reloading keepalived, with the prio-modified config, 
works fine!

As I don't remember why I restarted instead of reloading, I can for now change 
my automation easily - but the issue is weird anyway.
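
For illustration, a minimal sketch of what a reload-based variant of such 
automation could look like. This is not my actual tooling: the function names, 
the regular expression and the pid file path are assumptions made for the 
example. The only keepalived behaviour relied upon is that it re-reads its 
configuration on SIGHUP.

```python
import os
import re
import signal

# Matches a "priority <number>" line, as found inside a vrrp_instance block.
PRIORITY_RE = re.compile(r"^([ \t]*priority[ \t]+)\d+([ \t]*)$", re.MULTILINE)


def set_vrrp_priority(conf_text: str, new_priority: int) -> str:
    """Return conf_text with every VRRP 'priority N' line set to new_priority."""
    # A callable replacement avoids ambiguity between the group reference
    # and the digits of the new priority.
    return PRIORITY_RE.sub(
        lambda m: f"{m.group(1)}{new_priority}{m.group(2)}", conf_text
    )


def reload_keepalived(pid_file: str = "/var/run/keepalived.pid") -> None:
    """Send SIGHUP so keepalived reloads its config instead of restarting.

    The pid file path is an assumption; adjust it to the local setup.
    """
    with open(pid_file) as f:
        pid = int(f.read().strip())
    os.kill(pid, signal.SIGHUP)
```

The automation would read keepalived.conf, pass its contents through 
set_vrrp_priority(), write the result back, and then call reload_keepalived() 
rather than restarting the daemon.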

More info on the setup:

1) kernel is vanilla 3.14.70 (and was vanilla 3.14.48 without the issue), with 
a single (self written) patch to bonding applied (see 
http://permalink.gmane.org/gmane.linux.network/316758). Unfortunately I cannot 
live without that patch, i.e. I can't try to reproduce with a pure vanilla 
kernel.

2) keepalived is 1.2.13

3) config uses "use_vmac" / "vmac_xmit_base", on multiple interfaces, i.e. 
MACVLAN interfaces on top of:

4) "normal" interfaces are both bridge-over-VLAN-over-LACP-bond-over-eth and 
ARP-bond-over-VLAN-over-eth

5) there is active conntracking including conntrackd (but excluding LVS 
state), LVS loadbalancing of some 15k pps, LVS sync, and heavy iptables use 
including ipset matching, going on. Just for completeness.

Anybody got any idea what the root cause might be?

best regards
  Patrick
