Message-ID: <20151130204208.GA6046@alphalink.fr>
Date: Mon, 30 Nov 2015 21:42:08 +0100
From: Guillaume Nault <g.nault@...halink.fr>
To: Andrew <nitr0@...i.kr.ua>
Cc: Alexander Duyck <alexander.duyck@...il.com>,
netdev@...r.kernel.org, Simon Farnsworth <simon@...nz.org.uk>
Subject: Re: Kernel 4.1.12 crash
[Adding Simon to the discussion]
On Mon, Nov 30, 2015 at 04:03:37PM +0100, Guillaume Nault wrote:
> On Mon, Nov 30, 2015 at 12:05:13AM +0200, Andrew wrote:
> > 26.11.2015 18:44, Guillaume Nault пишет:
> > >On Wed, Nov 25, 2015 at 04:58:54PM +0200, Andrew wrote:
> > >>25.11.2015 16:10, Guillaume Nault wrote:
> > >>>On Wed, Nov 25, 2015 at 12:59:52AM +0200, Andrew wrote:
> > >>>>Hi.
> > >>>>
> > >>>>I tried to reproduce the errors in a virtual environment (some VMs on my
> > >>>>notebook).
> > >>>>
> > >>>>I've tried to create 1000 client PPPoE sessions from this box via script:
> > >>>>for i in `seq 1 1000`; do pppd plugin rp-pppoe.so user test password test
> > >>>>nodefaultroute maxfail 0 persist nodefaultroute holdoff 1 noauth eth0; done
> > >>>>
> > >>>I've tried to reproduce the bug with your script, but couldn't get
> > >>>anything to crash (VM is Debian Jessie i386 running on KVM with upstream
> > >>>kernel 4.1.12). Does the crash happen before all sessions get
> > >>>established?
> > >>Yes, the crash happens even before all daemon instances are started. Sessions
> > >>don't get established because the BRAS is configured to reject sessions (so a
> > >>lot of concurrent connection retries happen) - I still haven't created an
> > >>account for the test user on it.
> > >>
> > >Ok, I got the crash too. In fact I had misunderstood your previous
> > >message, crash happens when PPP sessions don't get established
> > >(authentication failures in my case).
> > >
> > >I'll investigate on that and let you know.
> >
> > > It seems the bug appears on mass removal of ppp devices (I planned to use
> > > this test environment to reproduce periodic BRAS crashes, but unexpectedly
> > > I got crashes on the test client).
> > >
> > > I've checked it with some kernels - it's present in 4.3.0, but it isn't
> > > present in 3.10.57. I'll try to build 3.14/3.18 kernels to see how they
> > > behave in this case.
>
> Yes, it most likely was introduced by 287f3a943fef ("pppoe: Use
> workqueue to die properly when a PADT is received"). I still have to
> figure out why.
I confirm the bug comes from this commit.
It happens if pppoe_connect() reinitialises po->proto.pppoe.padt_work
after pppoe_disc_rcv() has added it to the system's work queue, and
before that work got scheduled. Then when scheduling occurs, the worker
thread tries to run a corrupted structure and crashes.
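
A rough userspace sketch of the hazard described above may help. It is not
kernel code and not the eventual patch: struct node, struct work, work_init()
and the other names are hypothetical stand-ins, loosely modelled on a circular
doubly linked list. It only illustrates why re-initialising an item that is
still linked into a queue leaves that queue inconsistent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list node (hypothetical, userspace only). */
struct node {
    struct node *next, *prev;
};

/* Hypothetical stand-in for a work item; not the kernel's work structure. */
struct work {
    struct node entry;
    void (*func)(struct work *);
};

static void node_init(struct node *n)
{
    n->next = n;
    n->prev = n;
}

static void node_add_tail(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* (Re)initialise a work item: its list node is reset to point at itself. */
static void work_init(struct work *w, void (*func)(struct work *))
{
    node_init(&w->entry);
    w->func = func;
}

/* Walk the queue and check that every forward link has a matching back link. */
static bool queue_is_consistent(const struct node *head)
{
    const struct node *n = head;
    int steps = 0;

    do {
        if (n->next->prev != n)
            return false;
        n = n->next;
    } while (n != head && ++steps < 1000);
    return n == head;
}

static void noop(struct work *w) { (void)w; }

/* Returns true if re-initialising a queued item left the queue corrupted. */
static bool reinit_while_queued_corrupts(void)
{
    struct node queue;
    struct work a, b;

    node_init(&queue);
    work_init(&a, noop);
    work_init(&b, noop);
    node_add_tail(&a.entry, &queue);   /* item is queued, not yet run */
    node_add_tail(&b.entry, &queue);
    assert(queue_is_consistent(&queue));

    work_init(&a, noop);               /* re-initialised while still queued */
    return !queue_is_consistent(&queue);
}
```

In this sketch the queue head still points at the item, but the item no longer
points back, so anything that later walks or unlinks from the queue operates
on inconsistent links.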
I'm going to work on a patch.