[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAEA6p_DcMsVqr9yacK-HYnS9trK7wKBTGFYEMyqG4vgC2k-H=w@mail.gmail.com>
Date: Mon, 18 Jan 2021 11:58:29 -0800
From: Wei Wang <weiwan@...gle.com>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: David Miller <davem@...emloft.net>,
Netdev <netdev@...r.kernel.org>,
Jakub Kicinski <kuba@...nel.org>,
Eric Dumazet <edumazet@...gle.com>,
Paolo Abeni <pabeni@...hat.com>,
Hannes Frederic Sowa <hannes@...essinduktion.org>,
Felix Fietkau <nbd@....name>
Subject: Re: [PATCH net-next v6 3/3] net: add sysfs attribute to control napi
threaded mode
On Fri, Jan 15, 2021 at 6:18 PM Alexander Duyck
<alexander.duyck@...il.com> wrote:
>
> On Fri, Jan 15, 2021 at 4:44 PM Wei Wang <weiwan@...gle.com> wrote:
> >
> > On Fri, Jan 15, 2021 at 3:08 PM Alexander Duyck
> > <alexander.duyck@...il.com> wrote:
> > >
> > > On Fri, Jan 15, 2021 at 1:54 PM Wei Wang <weiwan@...gle.com> wrote:
> > > >
> > > > On Thu, Jan 14, 2021 at 7:34 PM Alexander Duyck
> > > > <alexander.duyck@...il.com> wrote:
> > > > >
> > > > > On Thu, Jan 14, 2021 at 4:34 PM Wei Wang <weiwan@...gle.com> wrote:
> > > > > >
> > > > > > This patch adds a new sysfs attribute to the network device class.
> > > > > > Said attribute provides a per-device control to enable/disable the
> > > > > > threaded mode for all the napi instances of the given network device.
> > > > > >
> > > > > > Co-developed-by: Paolo Abeni <pabeni@...hat.com>
> > > > > > Signed-off-by: Paolo Abeni <pabeni@...hat.com>
> > > > > > Co-developed-by: Hannes Frederic Sowa <hannes@...essinduktion.org>
> > > > > > Signed-off-by: Hannes Frederic Sowa <hannes@...essinduktion.org>
> > > > > > Co-developed-by: Felix Fietkau <nbd@....name>
> > > > > > Signed-off-by: Felix Fietkau <nbd@....name>
> > > > > > Signed-off-by: Wei Wang <weiwan@...gle.com>
> > > > > > ---
> > > > > > include/linux/netdevice.h | 2 ++
> > > > > > net/core/dev.c | 28 +++++++++++++++++
> > > > > > net/core/net-sysfs.c | 63 +++++++++++++++++++++++++++++++++++++++
> > > > > > 3 files changed, 93 insertions(+)
> > > > > >
> > > > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > > > > index c24ed232c746..11ae0c9b9350 100644
> > > > > > --- a/include/linux/netdevice.h
> > > > > > +++ b/include/linux/netdevice.h
> > > > > > @@ -497,6 +497,8 @@ static inline bool napi_complete(struct napi_struct *n)
> > > > > > return napi_complete_done(n, 0);
> > > > > > }
> > > > > >
> > > > > > +int dev_set_threaded(struct net_device *dev, bool threaded);
> > > > > > +
> > > > > > /**
> > > > > > * napi_disable - prevent NAPI from scheduling
> > > > > > * @n: NAPI context
> > > > > > diff --git a/net/core/dev.c b/net/core/dev.c
> > > > > > index edcfec1361e9..d5fb95316ea8 100644
> > > > > > --- a/net/core/dev.c
> > > > > > +++ b/net/core/dev.c
> > > > > > @@ -6754,6 +6754,34 @@ static int napi_set_threaded(struct napi_struct *n, bool threaded)
> > > > > > return err;
> > > > > > }
> > > > > >
> > > > > > +static void dev_disable_threaded_all(struct net_device *dev)
> > > > > > +{
> > > > > > + struct napi_struct *napi;
> > > > > > +
> > > > > > + list_for_each_entry(napi, &dev->napi_list, dev_list)
> > > > > > + napi_set_threaded(napi, false);
> > > > > > +}
> > > > > > +
> > > > > > +int dev_set_threaded(struct net_device *dev, bool threaded)
> > > > > > +{
> > > > > > + struct napi_struct *napi;
> > > > > > + int ret;
> > > > > > +
> > > > > > + dev->threaded = threaded;
> > > > > > + list_for_each_entry(napi, &dev->napi_list, dev_list) {
> > > > > > + ret = napi_set_threaded(napi, threaded);
> > > > > > + if (ret) {
> > > > > > + /* Error occurred on one of the napi,
> > > > > > + * reset threaded mode on all napi.
> > > > > > + */
> > > > > > + dev_disable_threaded_all(dev);
> > > > > > + break;
> > > > > > + }
> > > > > > + }
> > > > > > +
> > > > > > + return ret;
> > > > >
> > > > > This doesn't seem right. The NAPI instances can be active while this
> > > > > is occuring can they not? I would think at a minimum you need to go
> > > > > through a napi_disable/napi_enable in order to toggle this value for
> > > > > each NAPI instance. Otherwise aren't you at risk for racing and having
> > > > > a napi_schedule attempt to wake up the thread before it has been
> > > > > allocated?
> > > >
> > > >
> > > > Yes. The napi instance could be active when this occurs. And I think
> > > > it is OK. It is cause napi_set_threaded() only sets
> > > > NAPI_STATE_THREADED bit after successfully created the kthread. And
> > > > ___napi_schedule() only tries to wake up the kthread after testing the
> > > > THREADED bit.
> > >
> > > But what do you have guaranteeing that the kthread has been written to
> > > memory? That is what I was getting at. Just because you have written
> > > the value doesn't mean it is in memory yet so you would probably need
> > > an smb_mb__before_atomic() barrier call before you set the bit.
> > >
> > Noted. Will look into this.
> >
> > > Also I am not sure it is entirely safe to have the instance polling
> > > while you are doing this. That is why I am thinking if the instance is
> > > enabled then a napi_disable/napi_enable would be preferable.
> > When the napi is actively being polled in threaded mode, we will keep
> > rescheduling the kthread and calling __napi_poll() until
> > NAPI_SCHED_STATE is cleared by napi_complete_done(). And during the
> > next time napi_schedule() is called, we re-evaluate
> > NAPI_STATE_THREADED bit to see if we should wake up kthread, or
> > generate softirq.
> > And for the other way around, if napi is being handled during
> > net_rx_action(), toggling the bit won't cause immediate wake-up of the
> > kthread, but will wait for NAPI_SCHED_STATE to be cleared, and the
> > next time napi_schedule() is called.
> > I think it is OK. WDYT?
>
> It is hard to say. The one spot that gives me a bit of concern is the
> NAPIF_STATE_MISSED case in napi_complete_done. It is essentially would
> become a switchover point between the two while we are actively
> polling inside the driver. You end up with NAPI_SCHED_STATE not being
> toggled but jumping from one to the other.
Hmm.. Right. That is the one case where NAPI_SCHED_STATE will not be
toggled, but could potentially change the processing mode.
But still, I don't see any race in this case. The napi instance will
still either be processed in softirq mode by net_rx_action(), or in
the kthread mode, after napi_complete_done() calls __napi_schedule().
Powered by blists - more mailing lists