netdev - Re: [PATCH 0/3] URGENT for 3.9: net: fec: revert NAPI introduction

Date:	Mon, 22 Apr 2013 10:56:38 +0200
From:	Lucas Stach <l.stach@...gutronix.de>
To:	Frank Li <lznuaa@...il.com>
Cc:	Fabio Estevam <festevam@...il.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	David Miller <davem@...emloft.net>,
	Frank Li <Frank.Li@...escale.com>,
	Shawn Guo <shawn.guo@...aro.org>
Subject: Re: [PATCH 0/3] URGENT for 3.9: net: fec: revert NAPI introduction

Hi all,

On Saturday, 20.04.2013, at 20:35 +0800, Frank Li wrote:
> 2013/4/20 Fabio Estevam <festevam@...il.com>
> >
> > Lucas,
> >
> > On Fri, Apr 19, 2013 at 11:36 AM, Lucas Stach <l.stach@...gutronix.de> wrote:
> > > Those patches introduce instability to the point of kernel OOPSes with
> > > NULL-ptr dereferences.
> > >
> > > The patches drop locks from the code without justifying why this would
> > > be safe at all. In fact it isn't safe as now the controller restart can
> > > happily free the RX and TX ring buffers while the NAPI poll function is
> > > still accessing them. So with a heavily loaded but slightly unstable
> 
> I think a possible solution is to disable NAPI in the restart function,
> so that only one thread can reset the BD queue.
> 
> The BD queue is a lockless design.
> 
It doesn't matter at all that the hardware BD queue is designed to be
operated without locks. You still have to synchronize the driver
functions with each other, and explicit locks are a far better way to
achieve this than implicit serialization by funneling everything through
a single thread or similar tricks.

Let us please try and concentrate on making things safe and easy to
understand and not introduce possibilities for breakage in the future,
when the next change goes into the driver.

> Can you provide a test case?
> 
The test case is already described in my original mail: a heavily loaded
link, so that NAPI has to spin in the receive function for a while,
combined with a flapping link.

> > > link we regularly end up with OOPSes because link change restarts
> > > the FEC and bombs away buffers still in use.
> > >
> > > Also the NAPI enabled interrupt handler ACKs the INT and only later
> > > masks it, this way introducing a window where new interrupts could sneak
> > > in while we are already in polling mode.
> > >
> > > As it's way too late in the cycle to try and fix this up just revert the
> > > relevant patches for now.
> >
> > What about restoring the spinlocks and masking the int first?
> >
While reintroducing the spinlocks might fix the problem (I'll retest
that today), we would then be holding one big lock for extended periods
of time: while we are spinning in the receive poll function we cannot
enqueue new TX buffers. This is also a problem with the original
patches, as they mash the TX and RX interrupts together.
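
On the "masking the int first" suggestion quoted above, the window I
described can be shown with a toy model. This is a sketch under stated
assumptions, not driver code: struct irq_sketch, hw_frame_arrives() and
both handlers are hypothetical names, and the model simply assumes that
an event bit is latched on arrival and that the interrupt line fires
only while the source is unmasked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one interrupt source; not the FEC registers. */
struct irq_sketch {
    bool rx_event;      /* latched event bit, cleared by an ACK */
    bool rx_unmasked;   /* true while the RX interrupt is enabled */
    unsigned int irqs;  /* how often the interrupt line fired */
};

/* Hardware side: a frame arrives. The line fires only while unmasked. */
static void hw_frame_arrives(struct irq_sketch *hw)
{
    hw->rx_event = true;
    if (hw->rx_unmasked)
        hw->irqs++;
}

/* Order criticised in this thread: ACK first, mask only later. */
static void handler_ack_then_mask(struct irq_sketch *hw, bool frame_in_window)
{
    assert(hw->rx_unmasked);    /* the handler only runs while unmasked */
    hw->rx_event = false;       /* ACK */
    if (frame_in_window)
        hw_frame_arrives(hw);   /* still unmasked: the line fires again */
    hw->rx_unmasked = false;    /* mask, but the extra interrupt is out */
}

/* Suggested order: mask first, then ACK, then leave the rest to polling. */
static void handler_mask_then_ack(struct irq_sketch *hw, bool frame_in_window)
{
    assert(hw->rx_unmasked);    /* the handler only runs while unmasked */
    hw->rx_unmasked = false;    /* mask first */
    hw->rx_event = false;       /* ACK */
    if (frame_in_window)
        hw_frame_arrives(hw);   /* only latched; the poll loop will see it */
}
```

In this model the first handler records an extra interrupt when a frame
lands between the ACK and the mask, while the second one only leaves the
event bit set for the poll function to pick up.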

Dave, even if the reverts are intrusive I'm still not convinced that we
should try and fix this up in the short period of time we have left
until the 3.9 final release.

To fix all this properly we would have to fix at least the following
things:
1. Split up the spinlock into two independent locks for RX and TX.
Interrupt handlers should only take their respective lock; paths such
as the FEC restart, which touch both queues, have to take both locks.
2. Move the locking to the right places. There is zero reason why the
adjust_link PHY callback has to take the locks; FEC restart should take
them instead.
3. Introduce separate NAPI contexts for RX and TX, so that neither can
block the other.
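
To illustrate points 1 and 2, here is a minimal userspace sketch of the
locking scheme, with pthread mutexes standing in for kernel spinlocks.
Everything in it (struct fec_sketch, the sketch_* functions, RING_SIZE)
is hypothetical and not taken from the fec driver. It only shows the
shape: the RX and TX paths each take their own lock, and the restart
path takes both, always in the same order, before touching the rings.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define RING_SIZE 8

/* Hypothetical stand-in for the driver's private data. */
struct fec_sketch {
    pthread_mutex_t rx_lock;    /* protects rx_ring and rx_pending */
    pthread_mutex_t tx_lock;    /* protects tx_ring and tx_pending */
    int rx_ring[RING_SIZE];
    int tx_ring[RING_SIZE];
    unsigned int rx_pending;    /* simplified: a count, not head/tail */
    unsigned int tx_pending;
};

static void sketch_init(struct fec_sketch *fep)
{
    memset(fep, 0, sizeof(*fep));
    pthread_mutex_init(&fep->rx_lock, NULL);
    pthread_mutex_init(&fep->tx_lock, NULL);
}

/* Simulated frame arrival: RX-side state, so RX lock only. */
static int sketch_rx_arrive(struct fec_sketch *fep, int frame)
{
    int ret = -1;

    pthread_mutex_lock(&fep->rx_lock);
    if (fep->rx_pending < RING_SIZE) {
        fep->rx_ring[fep->rx_pending++] = frame;
        ret = 0;
    }
    pthread_mutex_unlock(&fep->rx_lock);
    return ret;
}

/* RX poll: consumes up to budget frames under the RX lock only. */
static unsigned int sketch_rx_poll(struct fec_sketch *fep, unsigned int budget)
{
    unsigned int done = 0;

    pthread_mutex_lock(&fep->rx_lock);
    assert(fep->rx_pending <= RING_SIZE);
    while (done < budget && fep->rx_pending > 0) {
        fep->rx_pending--;
        done++;
    }
    pthread_mutex_unlock(&fep->rx_lock);
    return done;
}

/* TX enqueue: takes the TX lock only, so a busy RX poll cannot block it. */
static int sketch_xmit(struct fec_sketch *fep, int frame)
{
    int ret = -1;

    pthread_mutex_lock(&fep->tx_lock);
    if (fep->tx_pending < RING_SIZE) {
        fep->tx_ring[fep->tx_pending++] = frame;
        ret = 0;
    }
    pthread_mutex_unlock(&fep->tx_lock);
    return ret;
}

/* Restart: touches both rings, so it takes both locks, RX before TX. */
static void sketch_restart(struct fec_sketch *fep)
{
    pthread_mutex_lock(&fep->rx_lock);
    pthread_mutex_lock(&fep->tx_lock);
    memset(fep->rx_ring, 0, sizeof(fep->rx_ring));
    memset(fep->tx_ring, 0, sizeof(fep->tx_ring));
    fep->rx_pending = 0;
    fep->tx_pending = 0;
    pthread_mutex_unlock(&fep->tx_lock);
    pthread_mutex_unlock(&fep->rx_lock);
}
```

The link-change callback would then just call the restart function
without taking any lock itself. The fixed RX-then-TX order only matters
once a second path also needs both locks, but it is cheap to establish
from the start.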

I doubt this will be less intrusive than reverting the offending patches
for now and taking a new stab at NAPI support in the next cycle.

I also suspect that the patch "net: fec: put tx to napi poll function to
fix dead lock" introduces a more subtle problem in the ring buffer
accounting (why does this patch even change the way ring buffers are
tracked?), which triggers on rarer occasions. I still have to test
whether it is still there with the lock added back.

Regards,
Lucas
-- 
Pengutronix e.K.                           | Lucas Stach                 |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-5076 |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |