Message-ID: <CAEA6p_DtTG6ryiG3GkxaySJeNcYF=RfkgCYTc-T-mHqMwL2-Gw@mail.gmail.com>
Date:   Sat, 27 Feb 2021 15:23:56 -0800
From:   Wei Wang <weiwan@...gle.com>
To:     Jakub Kicinski <kuba@...nel.org>,
        Alexander Duyck <alexanderduyck@...com>,
        Eric Dumazet <edumazet@...gle.com>
Cc:     "David S . Miller" <davem@...emloft.net>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Martin Zaharinov <micron10@...il.com>,
        Paolo Abeni <pabeni@...hat.com>,
        Hannes Frederic Sowa <hannes@...essinduktion.org>
Subject: Re: [PATCH net v2] net: fix race between napi kthread mode and busy poll

On Sat, Feb 27, 2021 at 11:00 AM Wei Wang <weiwan@...gle.com> wrote:
>
> On Fri, Feb 26, 2021 at 6:08 PM Jakub Kicinski <kuba@...nel.org> wrote:
> >
> > On Fri, 26 Feb 2021 17:35:21 -0800 Wei Wang wrote:
> > > On Fri, Feb 26, 2021 at 5:22 PM Jakub Kicinski <kuba@...nel.org> wrote:
> > > >
> > > > On Fri, 26 Feb 2021 17:02:17 -0800 Wei Wang wrote:
> > > > >  static int napi_thread_wait(struct napi_struct *napi)
> > > > >  {
> > > > > +       bool woken = false;
> > > > > +
> > > > >         set_current_state(TASK_INTERRUPTIBLE);
> > > > >
> > > > >         while (!kthread_should_stop() && !napi_disable_pending(napi)) {
> > > > > -               if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
> > > > > +               unsigned long state = READ_ONCE(napi->state);
> > > > > +
> > > > > +               if ((state & NAPIF_STATE_SCHED) &&
> > > > > +                   ((state & NAPIF_STATE_SCHED_THREAD) || woken)) {
> > > > >                         WARN_ON(!list_empty(&napi->poll_list));
> > > > >                         __set_current_state(TASK_RUNNING);
> > > > >                         return 0;
> > > > > +               } else {
> > > > > +                       WARN_ON(woken);
> > > > >                 }
> > > > >
> > > > >                 schedule();
> > > > > +               woken = true;
> > > > >                 set_current_state(TASK_INTERRUPTIBLE);
> > > > >         }
> > > > >         __set_current_state(TASK_RUNNING);
> > > > >
> > > > > I don't think it is sufficient to only set SCHED_THREADED bit when the
> > > > > thread is in RUNNING state.
> > > > > In fact, the thread is most likely NOT in RUNNING mode before we call
> > > > > wake_up_process() in ____napi_schedule(), because it has finished the
> > > > > previous round of napi->poll() and SCHED bit was cleared, so
> > > > > napi_thread_wait() sets the state to INTERRUPTIBLE and schedule() call
> > > > > should already put it in sleep.
> > > >
> > > > That's why the check says "|| woken":
> > > >
> > > >         ((state & NAPIF_STATE_SCHED_THREAD) ||  woken))
> > > >
> > > > thread knows it owns the NAPI if:
> > > >
> > > >   (a) the NAPI has the explicit flag set
> > > > or
> > > >   (b) it was just woken up and !kthread_should_stop(), since only
> > > >       someone who just claimed the normal SCHED on thread's behalf
> > > >       will wake it up
> > >
> > > The 'woken' is set after schedule(). If it is the first time
> > > napi_thread_wait() is called, and SCHED_THREADED is not set, and
> > > woken is not set either, this thread will be put to sleep when it
> > > reaches schedule(), even though there is work waiting to be done on
> > > that napi. And I think this kthread will not be woken up again
> > > afterwards, since the SCHED bit is already grabbed.
> >
> > Indeed, looks like the task will be in WAKING state until it runs?
> > We can switch the check in ____napi_schedule() from
> >
> >         if (thread->state == TASK_RUNNING)
> >
> > to
> >
> >         if (!(thread->state & TASK_INTERRUPTIBLE))
> >
> > ?
>
> Hmm... I am not very sure what state the thread will be put in after
> kthread_create(). Could it be in TASK_INTERRUPTIBLE?

I added a printk and confirmed that thread->state is
TASK_UNINTERRUPTIBLE after kthread_create() is called.
So I think if we change the above check to:
          if (thread->state != TASK_INTERRUPTIBLE)
                  set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
It should work.
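
The cases this check is meant to cover can be sketched as a small
userspace model. This is only an illustration, not kernel code: the
model_* names and the enum values are hypothetical, and the model
ignores the WAKING state and any memory-ordering concerns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel task states; the values are
 * illustrative and are NOT the kernel's TASK_* constants.
 */
enum model_task_state {
	MODEL_TASK_RUNNING,
	MODEL_TASK_INTERRUPTIBLE,
	MODEL_TASK_UNINTERRUPTIBLE,
};

/* Models the proposed check in ____napi_schedule(): the scheduling
 * side sets SCHED_THREADED unless the kthread is already sleeping in
 * the interruptible state (in which case the wakeup alone is the signal).
 */
bool model_sets_sched_threaded(enum model_task_state thread_state)
{
	return thread_state != MODEL_TASK_INTERRUPTIBLE;
}

/* Models the ownership test in napi_thread_wait(): the kthread may
 * poll if the explicit bit is set or if it was just woken up.
 */
bool model_thread_owns_napi(bool sched_threaded, bool woken)
{
	return sched_threaded || woken;
}

void model_self_check(void)
{
	/* Freshly created kthread (uninterruptible, never slept): the bit
	 * is set, so the first pass through the wait loop proceeds even
	 * though woken is still false.
	 */
	assert(model_thread_owns_napi(
		model_sets_sched_threaded(MODEL_TASK_UNINTERRUPTIBLE), false));

	/* Kthread asleep in the interruptible state: the set_bit() is
	 * skipped, and the wakeup itself (woken == true) grants ownership.
	 */
	assert(!model_sets_sched_threaded(MODEL_TASK_INTERRUPTIBLE));
	assert(model_thread_owns_napi(false, true));

	/* SCHED taken by busy poll or napi_disable(): nobody set the bit
	 * and nobody woke the kthread, so it must not poll.
	 */
	assert(!model_thread_owns_napi(false, false));
}
```

Calling model_self_check() from a small test program exercises the
three cases: a fresh kthread, a sleeping kthread, and SCHED held by
someone else.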

I tested the following patch on my setup and saw no issues:
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ddf4cfc12615..682908707c1a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -360,6 +360,7 @@ enum {
        NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
        NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
        NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
+       NAPI_STATE_SCHED_THREADED,      /* Napi is currently scheduled in threaded mode */
 };

 enum {
@@ -372,6 +373,7 @@ enum {
        NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
        NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
        NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
+       NAPIF_STATE_SCHED_THREADED      = BIT(NAPI_STATE_SCHED_THREADED),
 };

 enum gro_result {
diff --git a/net/core/dev.c b/net/core/dev.c
index 6c5967e80132..43607523ee99 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1501,17 +1501,18 @@ static int napi_kthread_create(struct napi_struct *n)
 {
        int err = 0;

-       /* Create and wake up the kthread once to put it in
-        * TASK_INTERRUPTIBLE mode to avoid the blocked task
-        * warning and work with loadavg.
+       /* Avoid waking up the kthread during creation to prevent
+        * potential race.
         */
-       n->thread = kthread_run(napi_threaded_poll, n, "napi/%s-%d",
-                               n->dev->name, n->napi_id);
+       n->thread = kthread_create(napi_threaded_poll, n, "napi/%s-%d",
+                                  n->dev->name, n->napi_id);
        if (IS_ERR(n->thread)) {
                err = PTR_ERR(n->thread);
-               pr_err("kthread_run failed with err %d\n", err);
+               pr_err("kthread_create failed with err %d\n", err);
                n->thread = NULL;
        }
@@ -4294,6 +4295,8 @@ static inline void ____napi_schedule(struct softnet_data *sd,
                 */
                thread = READ_ONCE(napi->thread);
                if (thread) {
+                       if (thread->state != TASK_INTERRUPTIBLE)
+                               set_bit(NAPI_STATE_SCHED_THREADED, &napi->state);
                        wake_up_process(thread);
                        return;
                }
@@ -6486,6 +6489,7 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
                WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));

                new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
+                             NAPIF_STATE_SCHED_THREADED |
                              NAPIF_STATE_PREFER_BUSY_POLL);

                /* If STATE_MISSED was set, leave STATE_SCHED set,
@@ -6968,16 +6972,24 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)

 static int napi_thread_wait(struct napi_struct *napi)
 {
+       bool woken = false;
+
        set_current_state(TASK_INTERRUPTIBLE);

        while (!kthread_should_stop() && !napi_disable_pending(napi)) {
-               if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
+               /* Testing SCHED_THREADED bit here to make sure the current
+                * kthread owns this napi and could poll on this napi.
+                * Testing SCHED bit is not enough because SCHED bit might be
+                * set by some other busy poll thread or by napi_disable().
+                */
+               if (test_bit(NAPI_STATE_SCHED_THREADED, &napi->state) || woken) {
                        WARN_ON(!list_empty(&napi->poll_list));
                        __set_current_state(TASK_RUNNING);
                        return 0;
                }

                schedule();
+               /* woken being true indicates this thread owns this napi. */
+               woken = true;
                set_current_state(TASK_INTERRUPTIBLE);
        }
        __set_current_state(TASK_RUNNING);

Jakub, Eric and Alexander,
What do you think of the above patch?
To me, the logic here seems more complicated than in the original v2
patch, but it saves quite a few set_bit() calls in ____napi_schedule().
So it may be worthwhile?
