lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20111118192639.GA10531@elliptictech.com>
Date:	Fri, 18 Nov 2011 14:26:39 -0500
From:	Nick Bowler <nbowler@...iptictech.com>
To:	Timo Teräs <timo.teras@....fi>
Cc:	Eric Dumazet <eric.dumazet@...il.com>, netdev@...r.kernel.org,
	"David S. Miller" <davem@...emloft.net>
Subject: Re: Occasional oops with IPSec and IPv6.

On 2011-11-18 20:27 +0200, Timo Teräs wrote:
> On 11/18/2011 06:39 PM, Eric Dumazet wrote:
> > Le vendredi 18 novembre 2011 à 11:27 -0500, Nick Bowler a écrit :
> >> On 2011-11-17 14:09 -0500, Nick Bowler wrote:
> >>> One of the tests we do with IPsec involves sending and receiving UDP
> >>> datagrams of all sizes from 1 to N bytes, where N is much larger than
> >>> the MTU.  In this particular instance, the MTU is 1500 bytes and N is
> >>> 10000 bytes.  This test works fine with IPv4, but I'm getting an
> >>> occasional oops on Linus' master with IPv6 (output at end of email).  We
> >>> also run the same test where N is less than the MTU, and it does not
> >>> trigger this issue.  The resulting fallout seems to eventually lock up
> >>> the box (although it continues to work for a little while afterwards).
> >>>
> >>> The issue appears timing related, and it doesn't always occur.  This
> >>> probably also explains why I've not seen this issue before now, as we
> >>> recently upgraded all our lab systems to machines from this century
> >>> (with newfangled dual core processors).  This also makes it somewhat
> >>> hard to reproduce, but I can trigger it pretty reliably by running 'yes'
> >>> in an ssh session (which doesn't use IPsec) while running the test:
> >>> it'll usually trigger in 2 or 3 runs.  The choice of cipher suite
> >>> appears to be irrelevant.
[...]
> > Please note commit 80c802f307 added a known bug, fixed in commit
> > 0b150932197b (xfrm: avoid possible oopse in xfrm_alloc_dst)
> > 
> > Given commit 80c802f307 complexity, we can assume other bugs are to be
> > fixed as well.
[...]
> This looks quite different. And I've been trying to figure out what
> causes this. However, the OOPS happens at ip6_fragment(), indicating
> that there was not enough allocated headroom (skb underrun). My initial
> thought is ipv6 bug that just got uncovered by my commit; especially
> since ipv4 side is happy. But I haven't yet been able to figure this one
> out.
> 
> Could you also try Herbert's latest patch set:
>   [0/6] Replace LL_ALLOCATED_SPACE to allow needed_headroom adjustment
> 
> This changes how the headroom is calculated, and *might* fix this issue
> too if it's caused by the same SMP race condition which got uncovered by
> my other commit earlier.

I applied all six of those patches, but I still see a crash.  However,
the call trace seems to be slightly different.  I've appended the trace
from the run with these paches applied, just in case it's significant.

NOTE: I did not carefully look at the traces of all the crashes I've
triggered.  This particular backtrace could potentially have appeared
before applying these patches and I would not have noticed.

[   45.318137] NET: Registered protocol family 15
[  125.153082] skb_under_panic: text:c1215d1d len:1462 put:14 head:f2ff1000 data:f2ff0ffa tail:0xf2ff15b0 end:0xf2ff1780 dev:p10p1
[  125.165124] ------------[ cut here ]------------
[  125.166001] kernel BUG at net/core/skbuff.c:147!
[  125.166001] invalid opcode: 0000 [#1] PREEMPT SMP 
[  125.166001] Modules linked in: authenc esp6 xfrm6_mode_transport deflate zlib_deflate ctr twofish_generic twofish_common camellia serpent blowfish_generic blowfish_common cast5 des_generic cbc xcbc rmd160 sha512_generic sha256_generic sha1_generic md5 hmac crypto_null af_key nfs lockd auth_rpcgss sunrpc rng_core iptable_filter ip_tables ip6table_filter ip6_tables x_tables psmouse sg r8169 mii evdev button ipv6 autofs4 usbhid ohci_hcd ehci_hcd usbcore usb_common sd_mod radeon ttm drm_kms_helper drm backlight i2c_algo_bit cfbcopyarea cfbimgblt cfbfillrect [last unloaded: scsi_wait_scan]
[  125.196579] 
[  125.196579] Pid: 2792, comm: udp_burst Not tainted 3.2.0-rc2-00115-g8b662f5 #54 System manufacturer System Product Name/M4A785T-M
[  125.196579] EIP: 0060:[<c11ff2af>] EFLAGS: 00010246 CPU: 0
[  125.196579] EIP is at skb_push+0x52/0x5b
[  125.196579] EAX: 00000089 EBX: f39cb000 ECX: 00000080 EDX: 00000003
[  125.196579] ESI: f39cb000 EDI: f39cb000 EBP: f29abb10 ESP: f29abae4
[  125.196579]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  125.196579] Process udp_burst (pid: 2792, ti=f29aa000 task=f3d8a2c0 task.ti=f29aa000)
[  125.196579] Stack:
[  125.196579]  c13a2756 c1215d1d 000005b6 0000000e f2ff1000 f2ff0ffa f2ff15b0 f2ff1780
[  125.196579]  f39cb000 00000000 f4fcdec0 f29abb28 c1215d1d 000086dd c1215d01 f29a5600
[  125.196579]  00000002 f29abb44 c120ee6d f4fcdec0 00000000 000005a8 f4fcde00 f29a5600
[  125.196579] Call Trace:
[  125.196579]  [<c1215d1d>] ? eth_header+0x1c/0x8b
[  125.196579]  [<c1215d1d>] eth_header+0x1c/0x8b
[  125.196579]  [<c1215d01>] ? eth_rebuild_header+0x53/0x53
[  125.196579]  [<c120ee6d>] dev_hard_header.constprop.12+0x28/0x32
[  125.196579]  [<c120ef74>] neigh_resolve_output+0xfd/0x138
[  125.196579]  [<f838af19>] ip6_finish_output2+0x280/0x31a [ipv6]
[  125.196579]  [<f838bf61>] ip6_fragment+0x3bd/0x939 [ipv6]
[  125.196579]  [<f838ac99>] ? NF_HOOK.constprop.4+0x30/0x30 [ipv6]
[  125.196579]  [<f838c51c>] ip6_finish_output+0x3f/0x4c [ipv6]
[  125.196579]  [<f838c5e1>] ip6_output+0xb8/0xc0 [ipv6]
[  125.196579]  [<c12520f1>] xfrm_output_resume+0x75/0x2c5
[  125.196579]  [<c125234e>] xfrm_output2+0xd/0xf
[  125.196579]  [<c12523e3>] xfrm_output+0x93/0x9c
[  125.196579]  [<f83a8b32>] xfrm6_output_finish+0x13/0x15 [ipv6]
[  125.196579]  [<f83a8a1f>] __xfrm6_output+0x108/0x10d [ipv6]
[  125.196579]  [<f83a8b7b>] xfrm6_output+0x47/0x4c [ipv6]
[  125.196579]  [<f838a7b4>] dst_output+0x12/0x15 [ipv6]
[  125.196579]  [<f838b36a>] ip6_local_out+0x17/0x1a [ipv6]
[  125.196579]  [<f838d27b>] ip6_push_pending_frames+0x2a4/0x346 [ipv6]
[  125.196579]  [<f839a035>] udp_v6_push_pending_frames+0x213/0x271 [ipv6]
[  125.196579]  [<f839ae84>] ? udpv6_sendmsg+0x68d/0x832 [ipv6]
[  125.196579]  [<f839aea6>] udpv6_sendmsg+0x6af/0x832 [ipv6]
[  125.196579]  [<c123fe84>] ? ip_fast_csum+0x30/0x30
[  125.196579]  [<c12403c0>] inet_sendmsg+0x4e/0x57
[  125.196579]  [<c11f8de6>] sock_sendmsg+0xbe/0xd9
[  125.196579]  [<c1052d64>] ? trace_hardirqs_off+0xb/0xd
[  125.196579]  [<c1270f48>] ? restore_all+0xf/0xf
[  125.196579]  [<c1055715>] ? trace_hardirqs_on_caller+0x10e/0x13f
[  125.196579]  [<c10542df>] ? mark_lock+0x26/0x1ea
[  125.196579]  [<c10acdbb>] ? fget_light+0x28/0x7c
[  125.196579]  [<c11fa23a>] sys_sendto+0xb1/0xcd
[  125.196579]  [<c10548e7>] ? __lock_acquire+0x444/0xb17
[  125.196579]  [<c1270bb1>] ? _raw_spin_unlock_irq+0x39/0x45
[  125.196579]  [<c1055038>] ? lock_release_non_nested+0x7e/0x1bb
[  125.196579]  [<c11fa26e>] sys_send+0x18/0x1a
[  125.196579]  [<c11fa877>] sys_socketcall+0xce/0x19a
[  125.196579]  [<c11507c0>] ? trace_hardirqs_on_thunk+0xc/0x10
[  125.196579]  [<c1271650>] sysenter_do_call+0x12/0x36
[  125.196579] Code: c1 85 f6 0f 45 de 53 ff b1 98 00 00 00 ff b1 94 00 00 00 50 ff b1 9c 00 00 00 52 ff 71 50 ff 75 04 68 56 27 3a c1 e8 5a c7 06 00 <0f> 0b 8d 65 f8 5b 5e 5d c3 55 89 c1 89 e5 56 53 83 79 54 00 8b 
[  125.196579] EIP: [<c11ff2af>] skb_push+0x52/0x5b SS:ESP 0068:f29abae4
[  125.544777] ---[ end trace 3ca7fd586035bfb5 ]---
[  125.549588] BUG: sleeping function called from invalid context at kernel/rwsem.c:21
[  125.557655] in_atomic(): 0, irqs_disabled(): 0, pid: 2792, name: udp_burst
[  125.565415] INFO: lockdep is turned off.
[  125.569682] Pid: 2792, comm: udp_burst Tainted: G      D      3.2.0-rc2-00115-g8b662f5 #54
[  125.578640] Call Trace:
[  125.581476]  [<c10307b1>] ? console_unlock+0x1b6/0x1c9
[  125.587209]  [<c1024dbd>] __might_sleep+0xe2/0xe9
[  125.592457]  [<c126ff47>] down_read+0x17/0x3b
[  125.597311]  [<c105fc85>] acct_collect+0x39/0x134
[  125.602749]  [<c1032c08>] do_exit+0x188/0x5de
[  125.607604]  [<c1031464>] ? kmsg_dump+0xdf/0xe7
[  125.612710]  [<c1004737>] oops_end+0x92/0x9a
[  125.617647]  [<c1004868>] die+0x51/0x59
[  125.622008]  [<c1002626>] do_trap+0x89/0xa2
[  125.626665]  [<c1002776>] ? do_bounds+0x52/0x52
[  125.631781]  [<c10027e7>] do_invalid_op+0x71/0x7b
[  125.637157]  [<c11ff2af>] ? skb_push+0x52/0x5b
[  125.642175]  [<c1270f48>] ? restore_all+0xf/0xf
[  125.647256]  [<c10307b1>] ? console_unlock+0x1b6/0x1c9
[  125.653106]  [<c102369b>] ? need_resched+0x14/0x1e
[  125.658517]  [<c126f1f7>] ? preempt_schedule+0x40/0x46
[  125.664271]  [<c1030c19>] ? vprintk+0x390/0x3ae
[  125.669417]  [<c1052d01>] ? trace_hardirqs_off_caller+0x2e/0x86
[  125.675999]  [<c11507d0>] ? trace_hardirqs_off_thunk+0xc/0x10
[  125.682561]  [<c127140b>] error_code+0x5f/0x64
[  125.687553]  [<c1002776>] ? do_bounds+0x52/0x52
[  125.692621]  [<c11ff2af>] ? skb_push+0x52/0x5b
[  125.697723]  [<c1215d1d>] ? eth_header+0x1c/0x8b
[  125.702905]  [<c1215d1d>] eth_header+0x1c/0x8b
[  125.707963]  [<c1215d01>] ? eth_rebuild_header+0x53/0x53
[  125.713945]  [<c120ee6d>] dev_hard_header.constprop.12+0x28/0x32
[  125.720617]  [<c120ef74>] neigh_resolve_output+0xfd/0x138
[  125.726714]  [<f838af19>] ip6_finish_output2+0x280/0x31a [ipv6]
[  125.733397]  [<f838bf61>] ip6_fragment+0x3bd/0x939 [ipv6]
[  125.739483]  [<f838ac99>] ? NF_HOOK.constprop.4+0x30/0x30 [ipv6]
[  125.746261]  [<f838c51c>] ip6_finish_output+0x3f/0x4c [ipv6]
[  125.752772]  [<f838c5e1>] ip6_output+0xb8/0xc0 [ipv6]
[  125.758684]  [<c12520f1>] xfrm_output_resume+0x75/0x2c5
[  125.764729]  [<c125234e>] xfrm_output2+0xd/0xf
[  125.769960]  [<c12523e3>] xfrm_output+0x93/0x9c
[  125.775292]  [<f83a8b32>] xfrm6_output_finish+0x13/0x15 [ipv6]
[  125.781988]  [<f83a8a1f>] __xfrm6_output+0x108/0x10d [ipv6]
[  125.788515]  [<f83a8b7b>] xfrm6_output+0x47/0x4c [ipv6]
[  125.794659]  [<f838a7b4>] dst_output+0x12/0x15 [ipv6]
[  125.800633]  [<f838b36a>] ip6_local_out+0x17/0x1a [ipv6]
[  125.806889]  [<f838d27b>] ip6_push_pending_frames+0x2a4/0x346 [ipv6]
[  125.814176]  [<f839a035>] udp_v6_push_pending_frames+0x213/0x271 [ipv6]
[  125.821792]  [<f839ae84>] ? udpv6_sendmsg+0x68d/0x832 [ipv6]
[  125.828447]  [<f839aea6>] udpv6_sendmsg+0x6af/0x832 [ipv6]
[  125.834931]  [<c123fe84>] ? ip_fast_csum+0x30/0x30
[  125.840522]  [<c12403c0>] inet_sendmsg+0x4e/0x57
[  125.845919]  [<c11f8de6>] sock_sendmsg+0xbe/0xd9
[  125.851343]  [<c1052d64>] ? trace_hardirqs_off+0xb/0xd
[  125.857271]  [<c1270f48>] ? restore_all+0xf/0xf
[  125.862642]  [<c1055715>] ? trace_hardirqs_on_caller+0x10e/0x13f
[  125.869618]  [<c10542df>] ? mark_lock+0x26/0x1ea
[  125.875028]  [<c10acdbb>] ? fget_light+0x28/0x7c
[  125.880431]  [<c11fa23a>] sys_sendto+0xb1/0xcd
[  125.885688]  [<c10548e7>] ? __lock_acquire+0x444/0xb17
[  125.891665]  [<c1270bb1>] ? _raw_spin_unlock_irq+0x39/0x45
[  125.898057]  [<c1055038>] ? lock_release_non_nested+0x7e/0x1bb
[  125.904803]  [<c11fa26e>] sys_send+0x18/0x1a
[  125.909815]  [<c11fa877>] sys_socketcall+0xce/0x19a
[  125.915539]  [<c11507c0>] ? trace_hardirqs_on_thunk+0xc/0x10
[  125.922127]  [<c1271650>] sysenter_do_call+0x12/0x36
[  185.166028] INFO: rcu_preempt detected stalls on CPUs/tasks: {} (detected by 1, t=60002 jiffies)
[  185.167017] INFO: Stall ended before state dump start

-- 
Nick Bowler, Elliptic Technologies (http://www.elliptictech.com/)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ