lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 25 Feb 2021 00:21:15 -0800
From:   Jakub Kicinski <kuba@...nel.org>
To:     Wei Wang <weiwan@...gle.com>
Cc:     Alexander Duyck <alexanderduyck@...com>,
        Eric Dumazet <edumazet@...gle.com>,
        "David S . Miller" <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>, Paolo Abeni <pabeni@...hat.com>,
        Hannes Frederic Sowa <hannes@...essinduktion.org>,
        Martin Zaharinov <micron10@...il.com>
Subject: Re: [PATCH net] net: fix race between napi kthread mode and busy
 poll

On Wed, 24 Feb 2021 18:31:55 -0800 Wei Wang wrote:
> On Wed, Feb 24, 2021 at 6:03 PM Jakub Kicinski <kuba@...nel.org> wrote:
> >
> > On Thu, 25 Feb 2021 01:22:08 +0000 Alexander Duyck wrote:  
> > > Yeah, that was the patch Wei had done earlier. Eric complained about the extra set_bit atomic operation in the threaded path. That is when I came up with the idea of just adding a bit to the busy poll logic so that the only extra cost in the threaded path was having to check 2 bits instead of 1.  
> >
> > Maybe we can set the bit only if the thread is running? When thread
> > comes out of schedule() it can be sure that it has an NAPI to service.
> > But when it enters napi_thread_wait() and before it hits schedule()
> > it must be careful to make sure the NAPI is still (or already in the
> > very first run after creation) owned by it.  
> 
> Are you suggesting setting the SCHED_THREAD bit in napi_thread_wait()
> somewhere instead of in ____napi_schedule() as you previously plotted?
> What does it help? I think if we have to do an extra set_bit(), it
> seems cleaner to set it in ____napi_schedule(). This would solve the
> warning issue as well.

I was thinking of something roughly like this:

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ddf4cfc12615..3bce94e8c110 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -360,6 +360,7 @@ enum {
        NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
        NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over softirq processing*/
        NAPI_STATE_THREADED,            /* The poll is performed inside its own thread*/
+       NAPI_STATE_SCHED_THREAD,        /* Thread owns the NAPI and will poll */
 };
 
 enum {
@@ -372,6 +373,7 @@ enum {
        NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
        NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
        NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
+       NAPIF_STATE_SCHED_THREAD        = BIT(NAPI_STATE_SCHED_THREAD),
 };
 
 enum gro_result {
diff --git a/net/core/dev.c b/net/core/dev.c
index 6c5967e80132..852b992d0ebb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4294,6 +4294,8 @@ static inline void ____napi_schedule(struct softnet_data *sd,
                 */
                thread = READ_ONCE(napi->thread);
                if (thread) {
+                       if (thread->state == TASK_RUNNING)
+                               set_bit(NAPIF_STATE_SCHED_THREAD, &napi->state);
                        wake_up_process(thread);
                        return;
                }
@@ -6486,7 +6488,8 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
                WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));
 
                new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
-                             NAPIF_STATE_PREFER_BUSY_POLL);
+                             NAPIF_STATE_PREFER_BUSY_POLL |
+                             NAPIF_STATE_SCHED_THREAD);
 
                /* If STATE_MISSED was set, leave STATE_SCHED set,
                 * because we will call napi->poll() one more time.
@@ -6968,16 +6971,24 @@ static int napi_poll(struct napi_struct *n, struct list_head *repoll)
 
 static int napi_thread_wait(struct napi_struct *napi)
 {
+       bool woken = false;
+
        set_current_state(TASK_INTERRUPTIBLE);
 
        while (!kthread_should_stop() && !napi_disable_pending(napi)) {
-               if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
+               unsigned long state = READ_ONCE(napi->state);
+
+               if ((state & NAPIF_STATE_SCHED) &&
+                   ((state & NAPIF_STATE_SCHED_THREAD) || woken)) {
                        WARN_ON(!list_empty(&napi->poll_list));
                        __set_current_state(TASK_RUNNING);
                        return 0;
+               } else {
+                       WARN_ON(woken);
                }
 
                schedule();
+               woken = true;
                set_current_state(TASK_INTERRUPTIBLE);
        }
        __set_current_state(TASK_RUNNING);


Extra set_bit() is only done if napi_schedule() comes early enough to
see the thread still running. When the thread is woken we continue to
assume ownership.

It's just an idea (but it may solve the first run and the disable case).

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ