Message-ID: <20170918194630.GH4914@intel.com>
Date: Mon, 18 Sep 2017 22:46:30 +0300
From: Ville Syrjälä <ville.syrjala@...ux.intel.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Eric Dumazet <edumazet@...gle.com>,
"David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [regression v4.11] 617f01211baf ("8139too: use
napi_complete_done()")

On Mon, Apr 10, 2017 at 03:11:02PM +0300, Ville Syrjälä wrote:
> On Fri, Apr 07, 2017 at 11:38:49AM -0700, Eric Dumazet wrote:
> > On Fri, 2017-04-07 at 21:17 +0300, Ville Syrjälä wrote:
> > > Hi,
> > >
> > > My old P3 laptop started to die on me in the middle of larger compile
> > > jobs (using distcc) after v4.11-rc<something>. I bisected the problem
> > > to 617f01211baf ("8139too: use napi_complete_done()").
> > >
> > > Unfortunately I wasn't able to capture a full oops as the machine doesn't
> > > have serial and ramoops failed me. I did get one partial oops on vgacon
> > > which showed rtl8139_poll() being involved (EIP was around
> > > _raw_spin_unlock_irqrestore() supposedly), so seems to agree with my
> > > bisect result.
> > >
> > > So maybe some kind of nasty thing going between the hard irq and
> > > softirq? Perhaps UP related? I tried to stare at the locking around
> > > rtl8139_poll() for a while but it looked mostly sane to me.
> > >
> >
> > Thanks a lot for the detective work, I am so sorry for this !
> >
> > Could you try the following patch ?
> >
> > I do not really see what could be wrong, the code should run just fine
> > on UP.
> >
> > Thanks.
> >
> > diff --git a/drivers/net/ethernet/realtek/8139too.c b/drivers/net/ethernet/realtek/8139too.c
> > index 89631753e79962d91456d93b71929af768917da1..cd2dbec331dd796f5296cd378561b3443f231673 100644
> > --- a/drivers/net/ethernet/realtek/8139too.c
> > +++ b/drivers/net/ethernet/realtek/8139too.c
> > @@ -2135,11 +2135,12 @@ static int rtl8139_poll(struct napi_struct *napi, int budget)
> >  	if (likely(RTL_R16(IntrStatus) & RxAckBits))
> >  		work_done += rtl8139_rx(dev, tp, budget);
> >  
> > -	if (work_done < budget && napi_complete_done(napi, work_done)) {
> > +	if (work_done < budget) {
> >  		unsigned long flags;
> >  
> >  		spin_lock_irqsave(&tp->lock, flags);
> > -		RTL_W16_F(IntrMask, rtl8139_intr_mask);
> > +		if (napi_complete_done(napi, work_done))
> > +			RTL_W16_F(IntrMask, rtl8139_intr_mask);
> >  		spin_unlock_irqrestore(&tp->lock, flags);
> >  	}
> >  	spin_unlock(&tp->rx_lock);
> >
> >
>
> Yep, that patch does appear to make it stable again.
>
> Tested-by: Ville Syrjälä <ville.syrjala@...ux.intel.com>

And five months later I'm still waiting for this patch to land...
--
Ville Syrjälä
Intel OTC