Message-Id: <20170227.210810.197044715013755200.davem@davemloft.net>
Date:   Mon, 27 Feb 2017 21:08:10 -0500 (EST)
From:   David Miller <davem@...emloft.net>
To:     eric.dumazet@...il.com
Cc:     netdev@...r.kernel.org, tariqt@...lanox.com, saeedm@...lanox.com
Subject: Re: [PATCH v2 net] net: solve a NAPI race

From: Eric Dumazet <eric.dumazet@...il.com>
Date: Mon, 27 Feb 2017 08:44:14 -0800

> Any point doing a napi_schedule() not from device hard irq handler
> is subject to the race for NIC using some kind of edge trigger
> interrupts.
> 
> Since we do not provide a ndo to disable device interrupts, the
> following can happen.

Ok, now I understand.

I think even without considering the race you are trying to solve,
this situation is really dangerous.

I am sure that every ->poll() handler out there was written by an
author who completely assumed that if they are executing then the
device's interrupts for that NAPI instance are disabled.  And this is
with very few, if any, exceptions.

So if we saw a driver doing something like:

	reg->irq_enable ^= value;

after napi_complete_done(), it would be quite understandable.
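
To make the hazard concrete, here is a minimal userspace sketch of why a
toggle like the one above only works when ->poll() runs with the device
interrupt masked. It is not real driver or kernel code: every name in it
(fake_regs, fake_napi, FAKE_IRQ_RX and the fake_*() helpers) is hypothetical
and exists only for illustration, and clearing the "scheduled" flag merely
stands in for napi_complete_done().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model: one bit of a fake register gates RX irqs. */
#define FAKE_IRQ_RX 0x1u

struct fake_regs {
	uint32_t irq_enable;
};

struct fake_napi {
	struct fake_regs *reg;
	bool scheduled;
};

/* Hard irq path: mask the device interrupt, then schedule polling. */
static void fake_hard_irq(struct fake_napi *n)
{
	n->reg->irq_enable &= ~FAKE_IRQ_RX;
	n->scheduled = true;
}

/* Scheduling from any other context: nothing masks the interrupt. */
static void fake_schedule_outside_irq(struct fake_napi *n)
{
	n->scheduled = true;
}

/*
 * End of a ->poll() written under the assumption that the interrupt is
 * masked whenever poll runs, so a toggle is expected to re-enable it.
 */
static void fake_poll_complete(struct fake_napi *n)
{
	assert(n->scheduled);
	n->scheduled = false;			/* stand-in for napi_complete_done() */
	n->reg->irq_enable ^= FAKE_IRQ_RX;	/* toggle, not an explicit set */
}

static bool fake_irq_enabled(const struct fake_regs *reg)
{
	return (reg->irq_enable & FAKE_IRQ_RX) != 0;
}
```

In this model, fake_hard_irq() followed by fake_poll_complete() leaves the
interrupt enabled again, as the driver author intended. Going through
fake_schedule_outside_irq() instead means the toggle runs while the bit is
still set, so fake_poll_complete() ends up masking the interrupt and the
device goes quiet.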

We really made a mistake taking the napi_schedule() call out of
the domain of the driver so that it could manage the interrupt
state properly.

I'm not against your missed bit fix as a short-term cure; it's
just that somewhere down the road we need to manage the interrupt
state properly.