Date:	Fri, 16 Dec 2011 15:43:55 -0800 (PST)
From:	d.stussy@...oo.com
To:	linux-kernel@...r.kernel.org
Subject: Kernel non-fatal(?) bug - IPsec - Holding atomic when calling scheduler.

I'm getting a whole bunch of these (hostname deleted):

Dec  5 11:09:27 - kernel: BUG: scheduling while atomic: named/909/0x00000200
Dec  5 11:09:27 - kernel: Modules linked in: xt_geoip ipt_set ip_set_nethash ip_set xt_recent xt_TARPIT compat_xtables
Dec  5 11:09:27 - kernel: Pid: 909, comm: named Not tainted 3.1.4 #6
Dec  5 11:09:27 - kernel: Call Trace:
Dec  5 11:09:27 - kernel:  [<ffffffff8142b633>] ? __schedule+0x5d3/0x7a0
Dec  5 11:09:27 - kernel:  [<ffffffff8102ff3d>] ? select_task_rq_fair+0x3ad/0x800
Dec  5 11:09:28 - kernel:  [<ffffffff8102f680>] ? check_preempt_wakeup+0xe0/0x140
Dec  5 11:09:28 - kernel:  [<ffffffff8142bd2d>] ? schedule_timeout+0x1bd/0x220
Dec  5 11:09:28 - kernel:  [<ffffffff8142aeca>] ? wait_for_common+0xda/0x190
Dec  5 11:09:28 - kernel:  [<ffffffff81034cd0>] ? try_to_wake_up+0x260/0x260
Dec  5 11:09:28 - kernel:  [<ffffffff81306b75>] ? flow_cache_flush+0x75/0x90
Dec  5 11:09:28 - kernel:  [<ffffffff81383a8b>] ? __xfrm_garbage_collect+0xb/0x90
Dec  5 11:09:28 - kernel:  [<ffffffff813c2171>] ? xfrm6_garbage_collect+0x11/0x30
Dec  5 11:09:28 - kernel:  [<ffffffff812fc79b>] ? dst_alloc+0x13b/0x170
Dec  5 11:09:28 - kernel:  [<ffffffff81387c47>] ? xfrm_bundle_lookup+0x287/0x3d0
Dec  5 11:09:28 - kernel:  [<ffffffff81306929>] ? flow_cache_lookup+0x259/0x430
Dec  5 11:09:28 - kernel:  [<ffffffff813879c0>] ? xfrm_policy_lookup_bytype.clone.42+0x250/0x250
Dec  5 11:09:28 - kernel:  [<ffffffff81386ef8>] ? xfrm_lookup+0x238/0x4d0
Dec  5 11:09:28 - kernel:  [<ffffffff813997d8>] ? ip6_sk_dst_lookup_flow+0xe8/0x170
...
After this point, the call chain varies, as does the process that triggers
it.  After about 1000 such reports, the system usually crashes.
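
Since the tail of the call chain varies between reports, it may help to reduce each report to its list of function names so the common frames can be compared. The following is a minimal sketch, not a tested tool: it assumes the syslog format shown above, tolerates frames that the mailer wrapped onto a second line, and the names FRAME_RE, extract_frames and SAMPLE are hypothetical helpers introduced only for this illustration.

```python
import re

# Matches one frame such as "[<ffffffff81306b75>] ? flow_cache_flush+0x75/0x90".
# \s* also matches a newline, so frames wrapped onto the next line still parse.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s*\??\s*([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def extract_frames(log_text):
    """Return the function names of all frames, in the order they appear."""
    return [m.group(2) for m in FRAME_RE.finditer(log_text)]


# A few lines copied from the report above, including wrapped frames.
SAMPLE = """\
Dec  5 11:09:27 - kernel:  [<ffffffff8142b633>] ? __schedule+0x5d3/0x7a0
Dec  5 11:09:28 - kernel:  [<ffffffff81306b75>] ?
flow_cache_flush+0x75/0x90
Dec  5 11:09:28 - kernel:  [<ffffffff813879c0>] ?
xfrm_policy_lookup_bytype.clone.42+0x250/0x250
"""

if __name__ == "__main__":
    for name in extract_frames(SAMPLE):
        print(name)
```

Running the same function over several reports and keeping only the frames they share would show which part of the chain is constant.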

Prior to activating IPsec, I did not see these problems.  I have both IPv4 and IPv6 stacks on an x86-64 system.  I use ipsec-tools v0.8.0 as the userspace interface to IPsec.  I am using only transport mode, although tunnel mode is available (as an unloaded module) and the user program does present a listener on UDP 4500 (as well as 500).  I have set all 3 IPsec options (ah, esp, and comp) to optional ("use" if available).

I see this ONLY when the kernel finds an IPsec policy (an SPD entry) for which there is no corresponding security association (an SAD entry), so it is presumably calling the userspace process via the PF_KEY interface to contact the remote side for IPsec key exchange (IKE) via UDP port 500 (or 4500 if I had tunnel mode defined).  The problem does NOT surface when IPsec is not loaded/compiled, or when it is but the SPD table is empty.  If the SPD has policies which define none or discard, the issue does seem to happen.

I have seen this bug with kernel releases 3.1.4 and 3.1.5.  It may exist in earlier releases, but I wasn't actively using IPsec before then.

Fix - guide to solution:  What atomic lock is being held when the scheduler is called?
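
One possible starting point for that question: the trailing 0x00000200 in the "scheduling while atomic" line is the task's preempt_count at the time of the report. The sketch below decodes it under an assumed layout for 3.x-era kernels (bits 0-7 preemption depth, bits 8-15 softirq, everything above that hardirq/NMI); the layout should be checked against the headers of the kernel actually running, and decode_preempt_count is a hypothetical helper name used only here.

```python
# Assumed preempt_count layout (3.x-era kernels); verify against your headers.
PREEMPT_BITS = 8   # bits 0-7: preempt_disable() nesting depth
SOFTIRQ_BITS = 8   # bits 8-15: softirq / bottom-half accounting

PREEMPT_SHIFT = 0
SOFTIRQ_SHIFT = PREEMPT_SHIFT + PREEMPT_BITS
UPPER_SHIFT = SOFTIRQ_SHIFT + SOFTIRQ_BITS  # hardirq and NMI bits live above


def decode_preempt_count(value):
    """Split a raw preempt_count value into its assumed fields."""
    return {
        "preempt": (value >> PREEMPT_SHIFT) & ((1 << PREEMPT_BITS) - 1),
        "softirq": (value >> SOFTIRQ_SHIFT) & ((1 << SOFTIRQ_BITS) - 1),
        "hardirq_and_nmi": value >> UPPER_SHIFT,
    }


if __name__ == "__main__":
    # Value taken from the "named/909/0x00000200" field of the report.
    print(decode_preempt_count(0x00000200))
```

Under that assumption the value has a softirq field of 2 and nothing else set. If the kernel counts one level of disabled bottom halves as 2 in that field (SOFTIRQ_DISABLE_OFFSET, to my understanding), this would point to bottom halves being disabled once rather than to a preemption-counted spinlock or a hard interrupt. Which frame in the trace disables them is not established by the report itself.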

Please fix this soon.  After a few hundred of these, the kernel seems to get sufficiently confused that it crashes and I have to hard-reset the machine.
