[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1470290142.13693.11.camel@edumazet-glaptop3.roam.corp.google.com>
Date: Thu, 04 Aug 2016 07:55:42 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: Marcelo Ricardo Leitner <marcelo.leitner@...il.com>
Cc: netdev@...r.kernel.org, Neal Cardwell <ncardwell@...gle.com>
Subject: Re: Hitting WARN_ON_ONCE on skb_try_coalesce
On Wed, 2016-08-03 at 20:36 -0300, Marcelo Ricardo Leitner wrote:
> Hi,
>
> I have two namespaces linked with a veth and netem in one of them adding 20ms
> latency, and doing netperf from one to another. I'm on commit
> 7cf210dc267861360cf6968b69bf512475aca985, net updated today, and I'm hitting
> this:
>
> [ 1043.024555] WARNING: CPU: 3 PID: 19902 at ...../linux/net/core/skbuff.c:4289 skb_try_coalesce+0x414/0x430
> [ 1043.024556] Kernel panic - not syncing: panic_on_warn set ...
>
> [ 1043.024601] CPU: 3 PID: 19902 Comm: netserver Not tainted 4.7.0-standard+ #20
> [ 1043.024631] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
> [ 1043.024667] 0000000000000086 00000000db033d46 ffff880035733ad8 ffffffff813d630d
> [ 1043.024705] 0000000000000000 ffffffff81a35655 ffff880035733b60 ffffffff811b5e3d
> [ 1043.024741] ffff880000000008 ffff880035733b70 ffff880035733b08 00000000db033d46
> [ 1043.024777] Call Trace:
> [ 1043.024794] [<ffffffff813d630d>] dump_stack+0x63/0x86
> [ 1043.024818] [<ffffffff811b5e3d>] panic+0xe4/0x226
> [ 1043.024842] [<ffffffff816ae244>] ? skb_try_coalesce+0x414/0x430
> [ 1043.024870] [<ffffffff810a5f13>] __warn+0xe3/0xf0
> [ 1043.024892] [<ffffffff810a602d>] warn_slowpath_null+0x1d/0x20
> [ 1043.024919] [<ffffffff816ae244>] skb_try_coalesce+0x414/0x430
> [ 1043.024945] [<ffffffff817165f3>] tcp_try_coalesce+0x63/0xd0
> [ 1043.024977] [<ffffffff81718497>] tcp_queue_rcv+0x57/0x140
> [ 1043.025003] [<ffffffff8171e132>] tcp_rcv_established+0x3f2/0x6f0
> [ 1043.025029] [<ffffffff81728815>] tcp_v4_do_rcv+0x145/0x200
> [ 1043.025054] [<ffffffff816a4387>] __release_sock+0x87/0xf0
> [ 1043.025078] [<ffffffff816a4420>] release_sock+0x30/0xa0
> [ 1043.025102] [<ffffffff81711d8c>] tcp_recvmsg+0x5fc/0xb70
> [ 1043.025126] [<ffffffff8174051e>] inet_recvmsg+0x7e/0xb0
> [ 1043.025151] [<ffffffff8169f61d>] sock_recvmsg+0x3d/0x50
> [ 1043.025175] [<ffffffff8169f85a>] SYSC_recvfrom+0xda/0x150
> [ 1043.025201] [<ffffffff811157df>] ? do_setitimer+0x1bf/0x220
> [ 1043.025226] [<ffffffff81115891>] ? alarm_setitimer+0x51/0x90
> [ 1043.025252] [<ffffffff816a0bbe>] SyS_recvfrom+0xe/0x10
> [ 1043.025277] [<ffffffff817df132>] entry_SYSCALL_64_fastpath+0x1a/0xa4
>
> That's:
> WARN_ON_ONCE(delta < len);
>
> Netperf command line is:
> netperf -cC -l 10 -t TCP_STREAM \
> -H 192.168.10.1 -L 192.168.10.2
>
> netem is added on netperf/client side.
>
> It triggers it right at the beginning of the test.
>
> I can reproduce it 100% of the times with above setup.
> - Removing netem made it stop reproducing.
> - Reducing netem latency to 10ms, still warns
> - Reducing netem latency to 1ms, still warns
> - Reducing netem latency to 0ms, it stops reproducing it
>
> This system is not generating a kdump, I'm looking into it.
This is caused by the skb_orphan_partial() call from netem.
This breaks if packet is re-injected instead of being sent to a physical
NIC.
Powered by blists - more mailing lists