Message-ID: <BY3PR18MB4721AF7DFD7F5F0384C37B84C7052@BY3PR18MB4721.namprd18.prod.outlook.com>
Date: Wed, 18 Dec 2024 13:58:14 +0000
From: Shinas Rasheed <srasheed@...vell.com>
To: Larysa Zaremba <larysa.zaremba@...el.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 Haseeb Gani <hgani@...vell.com>,
 Sathesh B Edara <sedara@...vell.com>,
 Vimlesh Kumar <vimleshk@...vell.com>,
 "thaller@...hat.com" <thaller@...hat.com>,
 "wizhao@...hat.com" <wizhao@...hat.com>,
 "kheib@...hat.com" <kheib@...hat.com>,
 "konguyen@...hat.com" <konguyen@...hat.com>,
 "horms@...nel.org" <horms@...nel.org>,
 "einstein.xue@...axg.com" <einstein.xue@...axg.com>,
 Veerasenareddy Burru <vburru@...vell.com>,
 Andrew Lunn <andrew+netdev@...n.ch>,
 "David S. Miller" <davem@...emloft.net>,
 Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>,
 Abhijit Ayarekar <aayarekar@...vell.com>,
 Satananda Burla <sburla@...vell.com>
Subject: RE: [EXTERNAL] Re: [PATCH net v2 1/4] octeon_ep: fix race conditions in ndo_get_stats64
Hi Larysa,
> > > > > On Mon, Dec 16, 2024 at 03:30:12PM +0100, Larysa Zaremba wrote:
> > > > > > On Sun, Dec 15, 2024 at 11:58:39PM -0800, Shinas Rasheed wrote:
> > > > > > > ndo_get_stats64() can race with ndo_stop(), which frees input and
> > > > > > > output queue resources. Call synchronize_net() to avoid such
> > > > > > > races.
> > > > > > >
> > > > > > > Fixes: 6a610a46bad1 ("octeon_ep: add support for ndo ops")
> > > > > > > Signed-off-by: Shinas Rasheed <srasheed@...vell.com>
> > > > > > > ---
> > > > > > > V2:
> > > > > > > - Changed sync mechanism to fix race conditions from using an
> > > > > > > atomic set_bit ops to a much simpler synchronize_net()
> > > > > > >
> > > > > > > drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 1 +
> > > > > > > 1 file changed, 1 insertion(+)
> > > > > > >
> > > > > > > diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > > > > index 549436efc204..941bbaaa67b5 100644
> > > > > > > --- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > > > > +++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
> > > > > > > @@ -757,6 +757,7 @@ static int octep_stop(struct net_device *netdev)
> > > > > > > {
> > > > > > > struct octep_device *oct = netdev_priv(netdev);
> > > > > > >
> > > > > > > + synchronize_net();
> > > > > >
> > > > > > You should have elaborated on the fact that this synchronize_net() is for
> > > > > > __LINK_STATE_START flag in the commit message, this is not obvious. Also, is
> > > > > > octep_get_stats64() called from RCU-safe context?
> > > > > >
> > > > >
> > > > > Now I see that in case !netif_running(), you do not bail out of
> > > > > octep_get_stats64() fully (or at all after the second patch). So, could you
> > > > > explain, how are you utilizing RCU here?
> > > > >
> > > >
> > > > The understanding is that octep_get_stats64() (.ndo_get_stats64() in turn) is
> > > > called from RCU safe contexts, and that the netdev op is never called after
> > > > the ndo_stop().
> > >
> > > As I now see, in net/core/net-sysfs.c, yes there is an rcu read lock around the
> > > thing, but there are a lot more callers and for example veth_get_stats64()
> > > explicitly calls rcu_read_lock().
> > >
> > > Also, even with an RCU-protected section, I am not sure it prevents
> > > octep_get_stats64() from being called after synchronize_net() finishes. Again, the
> > > callers seem too diverse to definitely say that we can rely on built-in flags
> > > for this to not happen :/
> >
> > Usually, the understanding is that ndo_get_stats won't be called by the
> > network stack after the interface is put down. As long as that is the case, I
> > don't think we should keep adding checks until there is a strong reason to do
> > so. What do you think?
> >
>
> It is hard to know without testing (but testing should not be hard). I think the
> phrase "Statistics must persist across routine operations like bringing the
> interface down and up." [0] implies that bringing the interface down may not
> necessarily prevent stats calls.
>
> [0] https://docs.kernel.org/networking/statistics.html
>
Sorry, I misworded my previous statement. Of course ndo_get_stats can be called while the netdev is down. This is tested code, and there is no issue in this scenario because, for an ndo_get_stats call that happens after ndo_stop(), oct->num_oqs will be seen as 0, and hence neither octep_iq nor octep_oq is accessed in the for loop that follows. octep_iq and octep_oq are the resources that we're trying to protect from race conditions. Hope that clarifies things.
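
To illustrate the point above, here is a minimal userspace sketch in C. It is not the driver code: the names fake_dev, queue_stats, totals, fake_get_stats(), fake_stop() and MAX_QUEUES are hypothetical stand-ins, and the sketch is single-threaded, so it only shows why a stats call made after stop touches no queue memory. It does not model the concurrent window that synchronize_net() is meant to cover.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUES 4 /* hypothetical limit, chosen only for this sketch */

/* Simplified per-queue counters; not the driver's real layout. */
struct queue_stats {
	uint64_t packets;
	uint64_t bytes;
};

/* Hypothetical stand-in for the device private data. */
struct fake_dev {
	int num_oqs;                        /* 0 once the stop path has run */
	struct queue_stats *iq[MAX_QUEUES]; /* dropped on stop */
	struct queue_stats *oq[MAX_QUEUES]; /* dropped on stop */
};

struct totals {
	uint64_t tx_packets, tx_bytes, rx_packets, rx_bytes;
};

/*
 * Sum the per-queue counters. When num_oqs is 0 the loop body never
 * runs, so the iq/oq pointers are never dereferenced.
 */
static struct totals fake_get_stats(const struct fake_dev *dev)
{
	struct totals t = {0};

	assert(dev != NULL);
	for (int q = 0; q < dev->num_oqs; q++) {
		t.tx_packets += dev->iq[q]->packets;
		t.tx_bytes   += dev->iq[q]->bytes;
		t.rx_packets += dev->oq[q]->packets;
		t.rx_bytes   += dev->oq[q]->bytes;
	}
	return t;
}

/* Model of the stop path: clear the count, then drop the queue pointers. */
static void fake_stop(struct fake_dev *dev)
{
	assert(dev != NULL);
	dev->num_oqs = 0;
	for (int q = 0; q < MAX_QUEUES; q++) {
		dev->iq[q] = NULL;
		dev->oq[q] = NULL;
	}
}
```

In this model, calling fake_get_stats() after fake_stop() returns all-zero totals without following any queue pointer, which corresponds to the "down" case described above.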
> > > > Thanks for the comments