[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAEA6p_B3zqviLc15ppSQQqUPx2OeU2h+gnLx=NT5HUzxb_gs7Q@mail.gmail.com>
Date: Thu, 4 Feb 2021 16:42:29 -0800
From: Wei Wang <weiwan@...gle.com>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: David Miller <davem@...emloft.net>,
Netdev <netdev@...r.kernel.org>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Hannes Frederic Sowa <hannes@...essinduktion.org>,
Eric Dumazet <edumazet@...gle.com>,
Felix Fietkau <nbd@....name>,
Alexander Duyck <alexanderduyck@...com>
Subject: Re: [PATCH net-next v10 3/3] net: add sysfs attribute to control napi
threaded mode
On Thu, Feb 4, 2021 at 2:17 PM Alexander Duyck
<alexander.duyck@...il.com> wrote:
>
> On Thu, Feb 4, 2021 at 1:35 PM Wei Wang <weiwan@...gle.com> wrote:
> >
> > This patch adds a new sysfs attribute to the network device class.
> > Said attribute provides a per-device control to enable/disable the
> > threaded mode for all the napi instances of the given network device,
> > without the need for a device up/down.
> > User sets it to 1 or 0 to enable or disable threaded mode.
> > Note: when switching between threaded and the current softirq based mode
> > for a napi instance, it will not immediately take effect if the napi is
> > currently being polled. The mode switch will happen for the next time
> > napi_schedule() is called.
> >
> > Co-developed-by: Paolo Abeni <pabeni@...hat.com>
> > Signed-off-by: Paolo Abeni <pabeni@...hat.com>
> > Co-developed-by: Hannes Frederic Sowa <hannes@...essinduktion.org>
> > Signed-off-by: Hannes Frederic Sowa <hannes@...essinduktion.org>
> > Co-developed-by: Felix Fietkau <nbd@....name>
> > Signed-off-by: Felix Fietkau <nbd@....name>
> > Signed-off-by: Wei Wang <weiwan@...gle.com>
> > ---
> > Documentation/ABI/testing/sysfs-class-net | 15 +++++
> > include/linux/netdevice.h | 2 +
> > net/core/dev.c | 67 ++++++++++++++++++++++-
> > net/core/net-sysfs.c | 45 +++++++++++++++
> > 4 files changed, 127 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-class-net b/Documentation/ABI/testing/sysfs-class-net
> > index 1f2002df5ba2..1419103d11f9 100644
> > --- a/Documentation/ABI/testing/sysfs-class-net
> > +++ b/Documentation/ABI/testing/sysfs-class-net
> > @@ -337,3 +337,18 @@ Contact: netdev@...r.kernel.org
> > Description:
> > 32-bit unsigned integer counting the number of times the link has
> > been down
> > +
> > +What: /sys/class/net/<iface>/threaded
> > +Date: Jan 2021
> > +KernelVersion: 5.12
> > +Contact: netdev@...r.kernel.org
> > +Description:
> > + Boolean value to control the threaded mode per device. User could
> > + set this value to enable/disable threaded mode for all napi
> > + belonging to this device, without the need to do device up/down.
> > +
> > + Possible values:
> > + == ==================================
> > + 0 threaded mode disabled for this dev
> > + 1 threaded mode enabled for this dev
> > + == ==================================
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 99fb4ec9573e..1340327f7abf 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -497,6 +497,8 @@ static inline bool napi_complete(struct napi_struct *n)
> > return napi_complete_done(n, 0);
> > }
> >
> > +int dev_set_threaded(struct net_device *dev, bool threaded);
> > +
> > /**
> > * napi_disable - prevent NAPI from scheduling
> > * @n: NAPI context
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index a8c5eca17074..9cc9b245419e 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -4290,8 +4290,9 @@ static inline void ____napi_schedule(struct softnet_data *sd,
> >
> > if (test_bit(NAPI_STATE_THREADED, &napi->state)) {
> > /* Paired with smp_mb__before_atomic() in
> > - * napi_enable(). Use READ_ONCE() to guarantee
> > - * a complete read on napi->thread. Only call
> > + * napi_enable()/napi_set_threaded().
> > + * Use READ_ONCE() to guarantee a complete
> > + * read on napi->thread. Only call
> > * wake_up_process() when it's not NULL.
> > */
> > thread = READ_ONCE(napi->thread);
> > @@ -6743,6 +6744,68 @@ static void init_gro_hash(struct napi_struct *napi)
> > napi->gro_bitmask = 0;
> > }
> >
> > +/* Setting/unsetting threaded mode on a napi might not immediately
> > + * take effect, if the current napi instance is actively being
> > + * polled. In this case, the switch between threaded mode and
> > + * softirq mode will happen in the next round of napi_schedule().
> > + * This should not cause hiccups/stalls to the live traffic.
> > + */
> > +static int napi_set_threaded(struct napi_struct *n, bool threaded)
> > +{
> > + int err = 0;
> > +
> > + if (threaded == !!test_bit(NAPI_STATE_THREADED, &n->state))
> > + return 0;
> > +
> > + if (!threaded) {
> > + clear_bit(NAPI_STATE_THREADED, &n->state);
> > + return 0;
> > + }
>
>
> > +
> > + if (!n->thread) {
> > + err = napi_kthread_create(n);
> > + if (err)
> > + return err;
> > + }
>
> This piece needs to be broken out similar to what we did for the
> napi_add and napi enable. In the case where we are enabling the
> threaded NAPI you should first go through and allocate all the
> threads. Then once all the threads are allocated you then enable them
> by setting the NAPI_STATE_THREADED bit.
>
> I would pull this section out and place it in a loop in
> dev_set_threaded to handle creating the threads before you set
> dev->threaded and then set the threaded flags in the napi instances.
>
> > +
> > + /* Make sure kthread is created before THREADED bit
> > + * is set.
> > + */
> > + smp_mb__before_atomic();
>
> If we split off the allocation like I mentioned earlier we can
> probably pull out the barrier and place it after the allocation of the
> kthreads.
>
> > + set_bit(NAPI_STATE_THREADED, &n->state);
> > +
> > + return 0;
> > +}
> > +
>
> With the change I suggested above you could also drop the return type
> from this because it should always succeed.
>
> > +static void dev_disable_threaded_all(struct net_device *dev)
> > +{
> > + struct napi_struct *napi;
> > +
> > + list_for_each_entry(napi, &dev->napi_list, dev_list)
> > + napi_set_threaded(napi, false);
> > + dev->threaded = 0;
> > +}
> > +
>
> You might consider renaming this to dev_set_threaded_all and passing a
> boolean "threaded" argument. Then you could reuse this code for both
> enabling and disabling of the threaded setup. Another thing I might
> change would be how we go about toggling the value. Maybe something
> more like:
>
> dev->threaded = 0;
> list_for_each_entry(napi, &dev->napi_list, dev_list)
> napi_set_threaded(napi, threaded);
> dev->threaded = threaded;
>
> The advantage of doing it this way is that we will be
> enabling/disabling all instances at the same time since we are
> overwriting the dev->threaded first and then taking care of the
> individual flags. If you are enabling then after we have set the bits
> we set dev->threaded which will switch them all on, and when we clear
> dev->threaded it will switch them all off so they should stop checking
> the threaded flags in the napi structs.
>
> > +int dev_set_threaded(struct net_device *dev, bool threaded)
> > +{
> > + struct napi_struct *napi;
> > + int ret;
> > +
>
> There are probably a couple changes that need to be made. First I
> would initialize ret to 0 here. More on that below.
>
> Second we need to add a check for if we are actually changing
> anything. If not then we shouldn't be bothering. So something like the
> following at the start should take care of that:
>
> if (!dev->threaded == !threaded)
> return 0;
>
> I would add a list_for_each_entry loop that does the thread allocation
> I called out above and will break if an error is encountered. That way
> you aren't toggling dev->threaded unless you know all of the instances
> can succeed. Something like:
>
> if (!dev->threaded) {
> list_for_each_entry(napi, &dev->napi_list, dev_list) {
> ret = napi_kthread_create(napi);
> if (ret)
> break;
> }
> }
>
> /* Make sure kthread is created before THREADED bit is set. */
> smp_mb__before_atomic();
>
> > + dev->threaded = threaded;
> > + list_for_each_entry(napi, &dev->napi_list, dev_list) {
> > + ret = napi_set_threaded(napi, threaded);
> > + if (ret) {
> > + /* Error occurred on one of the napi,
> > + * reset threaded mode on all napi.
> > + */
> > + dev_disable_threaded_all(dev);
> > + break;
> > + }
> > + }
>
> With my suggestion above you could look at just making the
> dev_disable_threaded_all accept a boolean argument for enable/disable
> and then you could just do something like:
>
> dev_set_threaded_all(struct net_device *dev, !ret && threaded);
>
Ack for the above suggestions. I think I used a combined function
napi_set_threaded() to create kthread + set threaded bit here because
I have the knowledge of all the napi instances belonging to dev here.
And if anyone fails, I would just disable them all afterwards. But I
think that might introduce an inconsistent stage where some napi are
in threaded mode, and some are in softirq mode.
I agree it is cleaner to first create kthreads for all napi, then set
the threaded bit for all.
> > +
> > + return ret;
> > +}
> > +
> > void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
> > int (*poll)(struct napi_struct *, int), int weight)
> > {
> > diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> > index daf502c13d6d..969743567257 100644
> > --- a/net/core/net-sysfs.c
> > +++ b/net/core/net-sysfs.c
> > @@ -538,6 +538,50 @@ static ssize_t phys_switch_id_show(struct device *dev,
> > }
> > static DEVICE_ATTR_RO(phys_switch_id);
> >
> > +static ssize_t threaded_show(struct device *dev,
> > + struct device_attribute *attr, char *buf)
> > +{
> > + struct net_device *netdev = to_net_dev(dev);
> > + int ret;
> > +
> > + if (!rtnl_trylock())
> > + return restart_syscall();
> > +
> > + if (!dev_isalive(netdev)) {
> > + ret = -EINVAL;
> > + goto unlock;
> > + }
> > +
> > + ret = sprintf(buf, fmt_dec, netdev->threaded);
> > +
>
> It seems like the more standard pattern is to initialize ret to
> -EVINAL and just call the sprintf if dev_isalive is true. That would
> allow you to drop the unlock jump label you had to add below.
>
Ack.
> > +unlock:
> > + rtnl_unlock();
> > + return ret;
> > +}
> > +
> > +static int modify_napi_threaded(struct net_device *dev, unsigned long val)
> > +{
> > + int ret;
> > +
> > + if (list_empty(&dev->napi_list))
> > + return -EOPNOTSUPP;
> > +
> > + if (val != 0 && val != 1)
> > + return -EOPNOTSUPP;
> > +
> > + ret = dev_set_threaded(dev, val);
> > +
> > + return ret;
> > +}
> > +
> > +static ssize_t threaded_store(struct device *dev,
> > + struct device_attribute *attr,
> > + const char *buf, size_t len)
> > +{
> > + return netdev_store(dev, attr, buf, len, modify_napi_threaded);
> > +}
> > +static DEVICE_ATTR_RW(threaded);
> > +
> > static struct attribute *net_class_attrs[] __ro_after_init = {
> > &dev_attr_netdev_group.attr,
> > &dev_attr_type.attr,
> > @@ -570,6 +614,7 @@ static struct attribute *net_class_attrs[] __ro_after_init = {
> > &dev_attr_proto_down.attr,
> > &dev_attr_carrier_up_count.attr,
> > &dev_attr_carrier_down_count.attr,
> > + &dev_attr_threaded.attr,
> > NULL,
> > };
> > ATTRIBUTE_GROUPS(net_class);
> > --
> > 2.30.0.365.g02bc693789-goog
> >
Powered by blists - more mailing lists