lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 24 Dec 2014 21:42:12 +0800
From:	Chris Ruehl <chris.ruehl@...ys.com.hk>
To:	netdev@...r.kernel.org
CC:	davem@...emloft.net, steffen.klassert@...unet.com
Subject: Re: ipv6: oops in datagram.c line 260

On Wednesday, December 10, 2014 01:43 PM, Chris Ruehl wrote:
> Hi all,
>
> We running a Dell server which crash frequently with (dell crash video snapshot)
> vanilla 3.14.25
>
> Capture viewed here: http://www.gtsys.com.hk/~chris/datagram_c_line260.png
>
> The capture sadly don't show the full trace, so we lack on information.
> 1st line I can see in the crash video from the idrac : tcp_transmit_skb+0x461
>
> RIP [<ffffffff815da587>] ipv6_local_error+0x17/0x140
>
> The null pointer happen:
>   Type "apropos word" to search for commands related to "word"...
> Reading symbols from net/ipv6/datagram.o...done.
> (gdb) list *(ipv6_local_error+0x17)
> 0xae7 is in ipv6_local_error (net/ipv6/datagram.c:260).
> 255        struct ipv6_pinfo *np = inet6_sk(sk);
> 256        struct sock_exterr_skb *serr;
> 257        struct ipv6hdr *iph;
> 258        struct sk_buff *skb;
> 259
> 260        if (!np->recverr)
> 261            return;
> 262
> 263        skb = alloc_skb(sizeof(struct ipv6hdr), GFP_ATOMIC);
> 264        if (!skb)
> (gdb) quit
>
>
> We running a 6in4 with ipsec tunnel on the 6. I found a pull request from
> Steffen Klassert
> here:
>      http://article.gmane.org/gmane.linux.network/281469
>
> Which might be relevant to this problem.
>
> For time being I add a
>
>          if (np == NULL){
>                  LIMIT_NETDEBUG(KERN_DEBUG "ipv6_pinfo is NULL\n");
>                  return;
>          }
>
> as work around to stop the server crashing
>
>
> With kind regards
> Chris
>

Catch it!

Update the kernel to 3.14.27 and add a WARN_ON() to the function and catch the 
OOPS after 5 Days.

As mentioned we running a IPv6 in IPv4 with a couple of IPSec tunnels on the v6.

Code change:
void ipv6_local_error(struct sock *sk, int err, struct flowi6 *fl6, u32 info)
{
         struct ipv6_pinfo *np = inet6_sk(sk);
         struct sock_exterr_skb *serr;
         struct ipv6hdr *iph;
         struct sk_buff *skb;

         if (np == NULL){
                 LIMIT_NETDEBUG(KERN_CRIT "ipv6_pinfo is NULL\n");
                 WARN_ON(1);
                 return;
         }



[447604.244357] ipv6_pinfo is NULL
[447604.273733] ------------[ cut here ]------------
[447604.303628] WARNING: CPU: 7 PID: 0 at net/ipv6/datagram.c:262 
ipv6_local_error+0x16b/0x1a0()
[447604.366173] Modules linked in: ipmi_si vhost_net vhost macvtap macvlan 
xt_policy authenc esp6 xfrm4_mode_tunnel xfrm6_mode_tunnel mpt3sas mpt2sas 
raid_class scsi_transport_sas mptctl mptbase ipt_MASQUERADE iptable_nat 
nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack 
ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp ipmi_devintf dell_rbu 
ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables 
x_tables xfrm_user xfrm4_tunnel ipcomp xfrm_ipcomp esp4 ah4 deflate ctr 
twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 
twofish_common camellia_generic camellia_aesni_avx_x86_64 camellia_x86_64 
serpent_avx_x86_64 serpent_sse2_x86_64 xts serpent_generic blowfish_generic 
blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common 
des_generic cmac xcbc rmd160 crypto_null af_key xfrm_algo sit ip_tunnel tunnel4 
bridge stp llc xfs libcrc32c intel_rapl x86_pkg_temp_thermal intel_powerclamp 
coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul gpio_ich 
ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul joydev glue_helper 
ablk_helper cryptd dcdbas shpchp wmi mei_me mei acpi_power_meter lpc_ich dummy 
lp parport hid_generic tg3 usbhid hid ahci megaraid_sas ptp libahci pps_core 
[last unloaded: ipmi_si]
[447605.087999] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.14.27 #11
[447605.139687] Hardware name: Dell Inc. PowerEdge R420/0CN7CM, BIOS 2.3.3 
07/10/2014
[447605.242931]  0000000000000009 ffff8806172e3b48 ffffffff815ffd58 0000000000000000
[447605.349130]  ffff8806172e3b80 ffffffff81043c23 ffff8800a16322e8 ffff880037daa1c0
[447605.459659]  ffff88000b026800 0000000000000000 ffff880037daa4b8 ffff8806172e3b90
[447605.576385] Call Trace:
[447605.634243]  <IRQ>  [<ffffffff815ffd58>] dump_stack+0x45/0x56
[447605.692870]  [<ffffffff81043c23>] warn_slowpath_common+0x73/0x90
[447605.751097]  [<ffffffff81043cf5>] warn_slowpath_null+0x15/0x20
[447605.808000]  [<ffffffff815da6db>] ipv6_local_error+0x16b/0x1a0
[447605.863821]  [<ffffffff815e29d0>] xfrm6_local_error+0x60/0x90
[447605.918493]  [<ffffffff8150b485>] ? skb_dequeue+0x15/0x70
[447605.971871]  [<ffffffff815a6cc1>] xfrm_local_error+0x51/0x70
[447606.024218]  [<ffffffff8159ca15>] xfrm4_extract_output+0x75/0xb0
[447606.075630]  [<ffffffff815a6c5a>] xfrm_inner_extract_output+0x6a/0x80
[447606.126055]  [<ffffffff815e27a2>] xfrm6_prepare_output+0x12/0x60
[447606.175310]  [<ffffffff815a6ed0>] xfrm_output_resume+0x1f0/0x370
[447606.223406]  [<ffffffff8151a486>] ? skb_checksum_help+0x76/0x190
[447606.270572]  [<ffffffff815a709b>] xfrm_output+0x3b/0xf0
[447606.316454]  [<ffffffff815e2ae0>] ? xfrm6_extract_output+0xe0/0xe0
[447606.361803]  [<ffffffff815e2af7>] xfrm6_output_finish+0x17/0x20
[447606.406053]  [<ffffffff8159cad6>] xfrm4_output+0x46/0x80
[447606.448694]  [<ffffffff81550a80>] ip_local_out+0x20/0x30
[447606.489952]  [<ffffffff81550dd5>] ip_queue_xmit+0x135/0x3c0
[447606.530017]  [<ffffffff815672e1>] tcp_transmit_skb+0x461/0x8c0
[447606.569362]  [<ffffffff8156786e>] tcp_write_xmit+0x12e/0xb20
[447606.607876]  [<ffffffff815669ff>] ? tcp_current_mss+0x4f/0x70
[447606.645723]  [<ffffffff8156b320>] ? tcp_write_timer_handler+0x1b0/0x1b0
[447606.682837]  [<ffffffff81569487>] tcp_send_loss_probe+0x37/0x1f0
[447606.719000]  [<ffffffff8156b320>] ? tcp_write_timer_handler+0x1b0/0x1b0
[447606.754537]  [<ffffffff8156b1bb>] tcp_write_timer_handler+0x4b/0x1b0
[447606.789266]  [<ffffffff8156b320>] ? tcp_write_timer_handler+0x1b0/0x1b0
[447606.823242]  [<ffffffff8156b378>] tcp_write_timer+0x58/0x60
[447606.856047]  [<ffffffff8104e848>] call_timer_fn.isra.32+0x18/0x80
[447606.888029]  [<ffffffff8104ea1a>] run_timer_softirq+0x16a/0x200
[447606.920224]  [<ffffffff81047efc>] __do_softirq+0xec/0x250
[447606.951850]  [<ffffffff810482f5>] irq_exit+0xf5/0x100
[447606.982665]  [<ffffffff8102bc6f>] smp_apic_timer_interrupt+0x3f/0x50
[447607.014382]  [<ffffffff8160d98a>] apic_timer_interrupt+0x6a/0x70
[447607.046175]  <EOI>  [<ffffffff8104f336>] ? get_next_timer_interrupt+0x1d6/0x250
[447607.111311]  [<ffffffff814d45a7>] ? cpuidle_enter_state+0x47/0xc0
[447607.145850]  [<ffffffff814d45a3>] ? cpuidle_enter_state+0x43/0xc0
[447607.179625]  [<ffffffff814d46b6>] cpuidle_idle_call+0x96/0x130
[447607.213531]  [<ffffffff8100b909>] arch_cpu_idle+0x9/0x20
[447607.247052]  [<ffffffff810925ba>] cpu_startup_entry+0xda/0x1d0
[447607.280775]  [<ffffffff81029d22>] start_secondary+0x212/0x2c0
[447607.314555] ---[ end trace 6ff3826b6e4fdf67 ]---


Can someone have a closer look into this problem?

Regards
Chris
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ