lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEA6p_AJYBPMQY2DEy_vhRwrq5fnZR3z0A_-_HN0+S4yc45enQ@mail.gmail.com>
Date:   Fri, 26 Feb 2021 10:28:00 -0800
From:   Wei Wang <weiwan@...gle.com>
To:     Martin Zaharinov <micron10@...il.com>
Cc:     Jakub Kicinski <kuba@...nel.org>,
        Alexander Duyck <alexanderduyck@...com>,
        Eric Dumazet <edumazet@...gle.com>,
        "David S . Miller" <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>, Paolo Abeni <pabeni@...hat.com>,
        Hannes Frederic Sowa <hannes@...essinduktion.org>,
        Alexander Duyck <alexander.duyck@...il.com>
Subject: Re: [PATCH net] net: fix race between napi kthread mode and busy poll

Hi Martin,
Could you help try the following new patch on your setup and let me
know if there are still issues?

Thanks.
Wei

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ddf4cfc12615..9ed0f89ccdd5 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -357,9 +357,10 @@ enum {
        NAPI_STATE_NPSVC,               /* Netpoll - don't dequeue
from poll_list */
        NAPI_STATE_LISTED,              /* NAPI added to system lists */
        NAPI_STATE_NO_BUSY_POLL,        /* Do not add in napi_hash, no
busy polling */
-       NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() owns this NAPI */
+       NAPI_STATE_IN_BUSY_POLL,        /* sk_busy_loop() grabs SHED
bit and could busy poll */
        NAPI_STATE_PREFER_BUSY_POLL,    /* prefer busy-polling over
softirq processing*/
        NAPI_STATE_THREADED,            /* The poll is performed
inside its own thread*/
+       NAPI_STATE_SCHED_BUSY_POLL,     /* Napi is currently scheduled
in busy poll mode */
 };

 enum {
@@ -372,6 +373,7 @@ enum {
        NAPIF_STATE_IN_BUSY_POLL        = BIT(NAPI_STATE_IN_BUSY_POLL),
        NAPIF_STATE_PREFER_BUSY_POLL    = BIT(NAPI_STATE_PREFER_BUSY_POLL),
        NAPIF_STATE_THREADED            = BIT(NAPI_STATE_THREADED),
+       NAPIF_STATE_SCHED_BUSY_POLL     = BIT(NAPI_STATE_SCHED_BUSY_POLL),
 };

 enum gro_result {
diff --git a/net/core/dev.c b/net/core/dev.c
index 6c5967e80132..c717b67ce137 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1501,15 +1501,14 @@ static int napi_kthread_create(struct napi_struct *n)
 {
        int err = 0;

-       /* Create and wake up the kthread once to put it in
-        * TASK_INTERRUPTIBLE mode to avoid the blocked task
-        * warning and work with loadavg.
+       /* Avoid using  kthread_run() here to prevent race
+        * between softirq and kthread polling.
         */
-       n->thread = kthread_run(napi_threaded_poll, n, "napi/%s-%d",
-                               n->dev->name, n->napi_id);
+       n->thread = kthread_create(napi_threaded_poll, n, "napi/%s-%d",
+                                  n->dev->name, n->napi_id);
        if (IS_ERR(n->thread)) {
                err = PTR_ERR(n->thread);
-               pr_err("kthread_run failed with err %d\n", err);
+               pr_err("kthread_create failed with err %d\n", err);
                n->thread = NULL;
        }

@@ -6486,6 +6485,7 @@ bool napi_complete_done(struct napi_struct *n,
int work_done)
                WARN_ON_ONCE(!(val & NAPIF_STATE_SCHED));

                new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
+                             NAPIF_STATE_SCHED_BUSY_POLL |
                              NAPIF_STATE_PREFER_BUSY_POLL);

                /* If STATE_MISSED was set, leave STATE_SCHED set,
@@ -6525,6 +6525,7 @@ static struct napi_struct *napi_by_id(unsigned
int napi_id)

 static void __busy_poll_stop(struct napi_struct *napi, bool skip_schedule)
 {
+       clear_bit(NAPI_STATE_SCHED_BUSY_POLL, &napi->state);
        if (!skip_schedule) {
                gro_normal_list(napi);
                __napi_schedule(napi);
@@ -6624,7 +6625,8 @@ void napi_busy_loop(unsigned int napi_id,
                        }
                        if (cmpxchg(&napi->state, val,
                                    val | NAPIF_STATE_IN_BUSY_POLL |
-                                         NAPIF_STATE_SCHED) != val) {
+                                         NAPIF_STATE_SCHED |
+                                         NAPIF_STATE_SCHED_BUSY_POLL) != val) {
                                if (prefer_busy_poll)

set_bit(NAPI_STATE_PREFER_BUSY_POLL, &napi->state);
                                goto count;
@@ -6971,7 +6973,10 @@ static int napi_thread_wait(struct napi_struct *napi)
        set_current_state(TASK_INTERRUPTIBLE);

        while (!kthread_should_stop() && !napi_disable_pending(napi)) {
-               if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
+               unsigned long val = READ_ONCE(napi->state);
+
+               if (val & NAPIF_STATE_SCHED &&
+                   !(val & NAPIF_STATE_SCHED_BUSY_POLL)) {
                        WARN_ON(!list_empty(&napi->poll_list));
                        __set_current_state(TASK_RUNNING);
                        return 0;

On Thu, Feb 25, 2021 at 7:52 PM Alexander Duyck
<alexander.duyck@...il.com> wrote:
>
> On Thu, Feb 25, 2021 at 5:20 PM Jakub Kicinski <kuba@...nel.org> wrote:
> >
> > On Thu, 25 Feb 2021 16:16:20 -0800 Wei Wang wrote:
> > > On Thu, Feb 25, 2021 at 3:00 PM Jakub Kicinski <kuba@...nel.org> wrote:
> > > > On Thu, 25 Feb 2021 10:29:47 -0800 Wei Wang wrote:
> > > > > Hmm... I don't think the above patch would work. Consider a situation that:
> > > > > 1. At first, the kthread is in sleep mode.
> > > > > 2. Then someone calls napi_schedule() to schedule work on this napi.
> > > > > So ____napi_schedule() is called. But at this moment, the kthread is
> > > > > not yet in RUNNING state. So this function does not set SCHED_THREAD
> > > > > bit.
> > > > > 3. Then wake_up_process() is called to wake up the thread.
> > > > > 4. Then napi_threaded_poll() calls napi_thread_wait().
> > > >
> > > > But how is the task not in running state outside of napi_thread_wait()?
> > > >
> > > > My scheduler knowledge is rudimentary, but AFAIU off CPU tasks which
> > > > were not put to sleep are still in RUNNING state, so unless we set
> > > > INTERRUPTIBLE the task will be running, even if it's stuck in cond_resched().
> > >
> > > I think the thread is only in RUNNING state after wake_up_process() is
> > > called on the thread in ____napi_schedule(). Before that, it should be
> > > in INTERRUPTIBLE state. napi_thread_wait() explicitly calls
> > > set_current_state(TASK_INTERRUPTIBLE) when it finishes 1 round of
> > > polling.
> >
> > Are you concerned about it not being in RUNNING state after it's
> > spawned but before it's first parked?
> >
> > > > > woken is false
> > > > > and SCHED_THREAD bit is not set. So the kthread will go to sleep again
> > > > > (in INTERRUPTIBLE mode) when schedule() is called, and waits to be
> > > > > woken up by the next napi_schedule().
> > > > > That will introduce arbitrary delay for the napi->poll() to be called.
> > > > > Isn't it? Please enlighten me if I did not understand it correctly.
> > > >
> > > > Probably just me not understanding the scheduler :)
> > > >
> > > > > I personally prefer to directly set SCHED_THREAD bit in ____napi_schedule().
> > > > > Or stick with SCHED_BUSY_POLL solution and replace kthread_run() with
> > > > > kthread_create().
> > > >
> > > > Well, I'm fine with that too, no point arguing further if I'm not
> > > > convincing anyone. But we need a fix which fixes the issue completely,
> > > > not just one of three incarnations.
> > >
> > > Alexander and Eric,
> > > Do you guys have preference on which approach to take?
> > > If we keep the current SCHED_BUSY_POLL patch, I think we need to
> > > change kthread_run() to kthread_create() to address the warning Martin
> > > reported.
> > > Or if we choose to set SCHED_THREADED, we could keep kthread_run().
> > > But there is 1 extra set_bit() operation.
> >
> > To be clear extra set_bit() only if thread is running, which if IRQ
> > coalescing works should be rather rare.
>
> I was good with either approach. My preference would be to probably
> use kthread_create regardless as it doesn't make much sense to have
> the thread running until we really need it anyway.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ