netdev - Re: 4.6.3, pppoe + shaper workload, skb_panic / skb_push / ppp_start


Message-ID: <20160728110948.GA3046@alphalink.fr>
Date:	Thu, 28 Jul 2016 13:09:48 +0200
From:	Guillaume Nault <g.nault@...halink.fr>
To:	Cong Wang <xiyou.wangcong@...il.com>
Cc:	nuclearcat@...learcat.com,
	Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: 4.6.3, pppoe + shaper workload, skb_panic / skb_push /
 ppp_start_xmit

On Tue, Jul 12, 2016 at 10:31:18AM -0700, Cong Wang wrote:
> On Mon, Jul 11, 2016 at 12:45 PM,  <nuclearcat@...learcat.com> wrote:
> > Hi
> >
> > On the latest kernel I noticed a kernel panic happening 1-2 times per day.
> > It also happens on older kernels (at least 4.5.3).
> >
> ...
> >  [42916.426463] Call Trace:
> >  [42916.426658]  <IRQ>
> >
> >  [42916.426719]  [<ffffffff81843786>] skb_push+0x36/0x37
> >  [42916.427111]  [<ffffffffa00e8ce5>] ppp_start_xmit+0x10f/0x150 [ppp_generic]
> >  [42916.427314]  [<ffffffff81853467>] dev_hard_start_xmit+0x25a/0x2d3
> >  [42916.427516]  [<ffffffff818530f2>] ? validate_xmit_skb.isra.107.part.108+0x11d/0x238
> >  [42916.427858]  [<ffffffff8186dee3>] sch_direct_xmit+0x89/0x1b5
> >  [42916.428060]  [<ffffffff8186e142>] __qdisc_run+0x133/0x170
> >  [42916.428261]  [<ffffffff81850034>] net_tx_action+0xe3/0x148
> >  [42916.428462]  [<ffffffff810c401a>] __do_softirq+0xb9/0x1a9
> >  [42916.428663]  [<ffffffff810c4251>] irq_exit+0x37/0x7c
> >  [42916.428862]  [<ffffffff8102b8f7>] smp_apic_timer_interrupt+0x3d/0x48
> >  [42916.429063]  [<ffffffff818cb15c>] apic_timer_interrupt+0x7c/0x90
> 
> Interesting: we call skb_cow_head() before skb_push() in ppp_start_xmit(),
> so I have no idea how this could happen.
>
The skb is corrupted: head is at ffff8800b0bf2800 while data is at
ffa00500b0bf284c.
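
To make the mismatch concrete, here is a small sketch that compares the
two addresses quoted above. It only does arithmetic on the reported
values; the helper names (is_under_head, differing_bits, split64) are
made up for illustration and are not kernel APIs. The comparison mirrors
the "skb->data < skb->head" test that skb_push() performs before calling
skb_under_panic().

```python
# Values copied from the report above.
HEAD = 0xffff8800b0bf2800  # skb->head
DATA = 0xffa00500b0bf284c  # skb->data


def is_under_head(head, data):
    """Same comparison skb_push() makes: data must never be below head."""
    return data < head


def differing_bits(a, b):
    """Bitmask of the positions where the two 64-bit values disagree."""
    return a ^ b


def split64(value):
    """Split a 64-bit value into its (upper, lower) 32-bit halves."""
    return value >> 32, value & 0xffffffff


if __name__ == "__main__":
    upper, lower = split64(differing_bits(HEAD, DATA))
    print("data below head:", is_under_head(HEAD, DATA))
    print("upper 32 bits differ by mask: 0x%08x" % upper)
    print("lower 32 bits differ by mask: 0x%08x" % lower)
```

The lower halves differ only by 0x4c, which would be a plausible offset
of data inside the buffer, while the upper halves do not match at all.
That pattern is what suggests a corrupted pointer rather than a simple
lack of headroom, although it says nothing about where the corruption
came from.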

Figuring out how this corruption happened is going to be hard without a
way to reproduce the problem.

Denys, can you confirm you're using a vanilla kernel?
Also I guess the ppp devices and tc settings are handled by accel-ppp.
If so, can you share more info about your setup (accel-ppp.conf, radius
attributes, iptables...) so that I can try to reproduce it on my
machines?

Regards

Guillaume