Message-ID: <web-5939356@patton.com>
Date:	Wed, 15 Dec 2010 14:14:41 -0500
From:	"Stephen Ochs" <stochs@...ton.com>
To:	netdev@...r.kernel.org
Subject: 

I observed the following oops:

Oops: Kernel access of bad area, sig: 11 [#1]
ngvideo
Modules linked in: xt_MARK xfrm4_mode_beet xfrm4_mode_transport cls_u32 sch_htb nf_conntrack_ipv4 xt_state nf_conntrack iptable_mangle ip_tables x_tables option usbserial snd_usb_audio snd_pcm snd_timer snd_page_alloc snd_usb_lib snd_rawmidi snd_seq_device snd_hwdep snd soundcore ngvideo_plat_video_7712 esp4 ah4 af_key
NIP: c027b50c LR: c027b500 CTR: c02247b0
REGS: c3517a20 TRAP: 0300   Not tainted  (2.6.25.5-00031-g5974727)
MSR: 00029030 <EE,ME,IR,DR>  CR: 48222082  XER: 00000000
DEAR: 6b6b6b9f, ESR: 00000000
TASK = c6fed080[3317] 'python' THREAD: c3516000
GPR00: c027b500 c3517ad0 c6fed080 00000001 c03bc550 c03bc550 00000001 6b6b6b6b
GPR08: 00000094 00000200 c6da7ed8 00008000 08e980d2 10019cec 1005f750 1005f734
GPR16: 00000000 4809c230 00000000 48028030 00000001 00000018 00000000 00000000
GPR24: c3480c40 c353bbd4 c027b634 c03de014 c3534480 00000000 c7bbe1c0 00000000
NIP [c027b50c] xfrm_output_resume+0x2a0/0x3c8
LR [c027b500] xfrm_output_resume+0x294/0x3c8
Call Trace:
[c3517ad0] [c027b500] xfrm_output_resume+0x294/0x3c8 (unreliable)
[c3517b00] [c02730d0] xfrm4_output_finish+0x5c/0x6c
[c3517b10] [c0273174] xfrm4_output+0x94/0xb0
[c3517b20] [c02383d8] ip_local_out+0x38/0x54
[c3517b30] [c026fad0] ipgre_tunnel_xmit+0x56c/0x784
[c3517c10] [c0211e34] dev_hard_start_xmit+0x1a4/0x2bc
[c3517c30] [c021214c] dev_queue_xmit+0x200/0x2d4
[c3517c50] [c023966c] ip_finish_output+0x11c/0x2f4
[c3517c80] [c0239b24] ip_mc_output+0x1f4/0x214
[c3517ca0] [c026caa0] ipmr_queue_xmit+0x3ac/0x560
[c3517db0] [c026ce40] ip_mr_forward+0x1ac/0x224
[c3517de0] [c026cf1c] ip_mr_input+0x64/0x200
[c3517e00] [c0235644] ip_rcv_finish+0x68/0x384
[c3517e30] [c0235b08] ip_rcv+0x1a8/0x294
[c3517e50] [c0212858] netif_receive_skb+0x344/0x4b4
[c3517e80] [c0212a64] process_backlog+0x9c/0x154
[c3517eb0] [c0212bf4] net_rx_action+0xd8/0x1b0
[c3517ee0] [c00257ac] __do_softirq+0x80/0xe4
[c3517f10] [c0004404] do_softirq+0x58/0x60
[c3517f20] [c002586c] irq_exit+0x48/0x58
[c3517f30] [c000435c] do_IRQ+0x78/0xc8
[c3517f40] [c000e6b8] ret_from_except+0x0/0x18
Instruction dump:
419e00c0 817e001c 38000000 901e0078 812b0040 7fc3f378 81290024 7d2903a6
4e800421 2f830001 409e0064 80fe001c <80070034> 2f800000 419e00d4 81270040
Kernel panic - not syncing: Fatal exception in interrupt
Rebooting in 180 seconds..


My board has an AMCC PowerPC 405EX processor. I am running kernel
version 2.6.25.5.

I have an IPsec tunnel with NAT traversal. Inside the IPsec tunnel I
have a GRE tunnel. I am using tc to rate-limit the outgoing traffic on
eth0.

The oops is triggered in function xfrm_output_resume. It calls
ip_local_out (via skb->dst->ops->local_out), which calls dst_output,
which calls ip_output (via skb->dst->output), which calls
ip_finish_output, which calls ip_finish_output2, which calls
neigh_connected_output (via dst->neighbour->output), which calls
dev_queue_xmit (via neigh->ops->queue_xmit), which calls pfifo_enqueue
(via q->enqueue), which calls qdisc_reshape_fail, which frees the skb
and returns NET_XMIT_DROP (1). This return value is finally returned to
xfrm_output_resume, which then dereferences skb->dst->xfrm. Because the
skb has been freed, this access is invalid.
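
The ownership problem described above can be sketched in user space.
This is a toy model, not kernel code: toy_skb, toy_dst, toy_xmit and
toy_output_resume are hypothetical names, and the model assumes the
transmit path always takes ownership of the buffer and only reports the
outcome through its return value. Under that assumption the caller has
to read anything it needs from the buffer before handing it over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins (hypothetical) for struct dst_entry and struct sk_buff. */
struct toy_dst { bool has_xfrm; };
struct toy_skb { struct toy_dst *dst; };

enum { TOY_XMIT_SUCCESS = 0, TOY_XMIT_DROP = 1 };

int toy_consumed; /* number of buffers the transmit path has released */

struct toy_skb *toy_alloc(struct toy_dst *dst)
{
    struct toy_skb *skb = malloc(sizeof(*skb));

    if (skb)
        skb->dst = dst;
    return skb;
}

/*
 * Models the transmit path: it always takes ownership of the buffer and
 * releases it. The return value only reports whether it was dropped.
 */
int toy_xmit(struct toy_skb *skb, bool queue_full)
{
    free(skb);
    toy_consumed++;
    return queue_full ? TOY_XMIT_DROP : TOY_XMIT_SUCCESS;
}

/* Caller that reads what it needs first, then hands the buffer over. */
int toy_output_resume(struct toy_skb *skb, bool queue_full, bool *had_xfrm)
{
    assert(skb != NULL && skb->dst != NULL);
    *had_xfrm = skb->dst->has_xfrm;   /* read while skb is still ours */
    return toy_xmit(skb, queue_full); /* skb must not be touched after this */
}
```

Reading skb->dst after toy_xmit returns, whatever the return value,
would be the same kind of use-after-free as in the trace above.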

I am using kernel boot arg 'slub_debug=FZPU' to poison freed memory.
Without this, the oops still occurs, but it happens at other, less
predictable locations.
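
The poisoning can be cross-checked against the register dump. The
sketch below decodes the faulting word <80070034> from the instruction
dump as a PowerPC D-form load and recomputes the faulting address from
GPR7. The helper names are hypothetical; 0x6b is the byte SLUB writes
into freed objects when poisoning is enabled. Treating offsets 0x1c and
0x34 as skb->dst and dst->xfrm is an assumption based on the
description above, not something verified against this kernel build.

```c
#include <assert.h>
#include <stdint.h>

/* Four bytes of the 0x6b pattern SLUB writes into freed objects. */
#define POISON_FREE_WORD 0x6b6b6b6bu

/* Fields of a PowerPC D-form instruction such as lwz RT,D(RA). */
struct dform {
    unsigned opcode;
    unsigned rt;
    unsigned ra;
    int16_t d;
};

struct dform decode_dform(uint32_t insn)
{
    struct dform f;

    f.opcode = insn >> 26;           /* primary opcode, 32 is lwz */
    f.rt = (insn >> 21) & 0x1f;      /* target register */
    f.ra = (insn >> 16) & 0x1f;      /* base register */
    f.d = (int16_t)(insn & 0xffff);  /* signed 16-bit displacement */
    return f;
}

uint32_t effective_address(uint32_t base, int16_t d)
{
    return base + (uint32_t)(int32_t)d;
}

/* Checks the values quoted in the oops against each other. */
void check_against_report(void)
{
    struct dform f = decode_dform(0x80070034u);

    assert(f.opcode == 32 && f.rt == 0 && f.ra == 7 && f.d == 0x34);
    /* GPR7 holds 6b6b6b6b in the dump; DEAR reports 6b6b6b9f. */
    assert(effective_address(POISON_FREE_WORD, f.d) == 0x6b6b6b9fu);
}
```

The preceding word in the dump, 80fe001c, decodes the same way to a
load into r7 from offset 0x1c of r30, which fits the pointer in r7
having been read out of an already poisoned object.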

I have searched for patches that may have addressed this issue, but
didn't come up with anything. If anyone is aware of one, I would
appreciate being pointed to it. It is not really feasible for me to try
the latest kernel, so I have not.

If there is not already a patch to address this, I am not sure what the
right fix would be, as I am not very familiar with this code.

I am wondering:
  1. Why is xfrm_output_resume checking for a return value of 1? Is it
     checking whether the skb was dropped?
  2. If the skb is dropped, who should be responsible for freeing it?

Any other guidance would be appreciated, too. If there is any other
information that would be helpful, feel free to ask.

Please CC me on any replies to this message.


Stephen Ochs
stochs@...ton.com