Message-ID: <CAEA6p_Crfx8_izk+GCE30a-DAwiKbNmxNKJ0=7be1Wtm8AbX8Q@mail.gmail.com>
Date:   Wed, 24 Feb 2021 14:29:21 -0800
From:   Wei Wang <weiwan@...gle.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Eric Dumazet <edumazet@...gle.com>,
        "David S . Miller" <davem@...emloft.net>,
        netdev <netdev@...r.kernel.org>, Paolo Abeni <pabeni@...hat.com>,
        Hannes Frederic Sowa <hannes@...essinduktion.org>,
        Alexander Duyck <alexanderduyck@...com>,
        Martin Zaharinov <micron10@...il.com>
Subject: Re: [PATCH net] net: fix race between napi kthread mode and busy poll

On Wed, Feb 24, 2021 at 1:30 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Wed, 24 Feb 2021 21:37:36 +0100 Eric Dumazet wrote:
> > On Wed, Feb 24, 2021 at 8:48 PM Jakub Kicinski <kuba@...nel.org> wrote:
> > > On Tue, 23 Feb 2021 15:41:30 -0800 Wei Wang wrote:
> > > > Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
> > > > determine if the kthread owns this napi and could call napi->poll() on
> > > > it. However, if socket busy poll is enabled, it is possible that the
> > > > busy poll thread grabs this SCHED bit (after the previous napi->poll()
> > > > invokes napi_complete_done() and clears SCHED bit) and tries to poll
> > > > on the same napi.
> > > > This patch tries to fix this race by adding a new bit
> > > > NAPI_STATE_SCHED_BUSY_POLL in napi->state. This bit gets set in
> > > > napi_busy_loop() together with NAPI_STATE_SCHED, and gets cleared in
> > > > napi_complete_done() together with NAPI_STATE_SCHED. This helps
> > > > distinguish the ownership of the napi between kthread and the busy poll
> > > > thread, and prevents the kthread from polling on the napi when this napi
> > > > is still owned by the busy poll thread.
> > > >
> > > > Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
> > > > Reported-by: Martin Zaharinov <micron10@...il.com>
> > > > Suggested-by: Alexander Duyck <alexanderduyck@...com>
> > > > Reviewed-by: Alexander Duyck <alexanderduyck@...com>
> > > > Reviewed-by: Eric Dumazet <edumazet@...gle.com>
> > >
> > > AFAIU sched bit controls the ownership of the poll_list
> >
> > I disagree. BUSY POLL never inserted the napi into a list,
> > because the user thread was polling one napi.
> >
> > Same for the kthread.
>
> There is no delayed execution in busy_poll. It either got the sched bit
> and it knows it, or it didn't.
>
> > wake_up_process() should be good enough.
>
> Well, if that's the direction maybe we should depend on the thread
> state more?  IOW pay less attention to SCHED and have
> napi_complete_done() set_current_state() if thread is running?
>
> I didn't think that through fully but you can't say "wake_up_process()
> should be good enough" and at the same time add another bit proving
> it's not enough.
>
> > > Can we please add a poll_list for the thread and make sure the
> > > thread polls based on the list?
> >
> > A list ? That would require a spinlock or something ?
>
> Does the softnet list require a spinlock?
>
> Obviously with current code the list would only ever have one napi
> instance per thread but I think it's worth the code simplicity.
> napi_complete_done() dels from the list / releases that ownership
> already.
>

I think what Jakub proposed here should work. But I have a similar
concern as Eric: the kthread belongs to the NAPI instance, and the
kthread only polls that specific NAPI when threaded mode is enabled.
Adding the NAPI to a list that the kthread polls seems like a reversal
of that logic, and it is unlike sd->poll_list, where multiple NAPI
instances can be added and get polled. But functionality-wise, it does
seem it would work.
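To make the ownership rule the patch proposes concrete, here is a rough
userspace model. The bit values, helper names, and C11 atomics are all
illustrative stand-ins, not the kernel code, which operates on
napi->state with test_and_set_bit() and friends:

```c
/* Userspace model of the two-bit ownership scheme from the patch:
 * SCHED says "somebody owns this NAPI", SCHED_BUSY_POLL says "the
 * owner is the busy-poll thread". Illustrative only. */
#include <stdatomic.h>
#include <stdbool.h>

#define SCHED           (1UL << 0)  /* stand-in for NAPI_STATE_SCHED */
#define SCHED_BUSY_POLL (1UL << 1)  /* stand-in for the proposed bit */

static atomic_ulong napi_state;

/* napi_busy_loop() path: claim ownership, setting both bits in one
 * atomic step. Fails if anyone (kthread, softirq) already holds SCHED. */
static bool busy_poll_acquire(void)
{
    unsigned long old = atomic_load(&napi_state);

    do {
        if (old & SCHED)
            return false;
    } while (!atomic_compare_exchange_weak(&napi_state, &old,
                                           old | SCHED | SCHED_BUSY_POLL));
    return true;
}

/* napi_thread_wait() path: the kthread may poll only if SCHED is set
 * AND the busy-poll marker is clear, i.e. the SCHED bit it sees was
 * not grabbed by a busy-polling socket. */
static bool kthread_owns(void)
{
    unsigned long s = atomic_load(&napi_state);

    return (s & SCHED) && !(s & SCHED_BUSY_POLL);
}

/* napi_complete_done() path: drop both bits together. */
static void napi_release(void)
{
    atomic_fetch_and(&napi_state, ~(SCHED | SCHED_BUSY_POLL));
}
```

The point is just that SCHED alone is ambiguous: after
busy_poll_acquire() succeeds, kthread_owns() is false even though SCHED
is set, which is exactly the distinction the kthread needs.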

> > > IMO that's far clearer than defining a forest of ownership state
> > > bits.
> >
> > Adding a bit seems simpler than adding a list.
>
> In terms of what? LoC?
>
> Just to find out what the LoC is I sketched this out:
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index ddf4cfc12615..77f09ced9ee4 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -348,6 +348,7 @@ struct napi_struct {
>         struct hlist_node       napi_hash_node;
>         unsigned int            napi_id;
>         struct task_struct      *thread;
> +       struct list_head        thread_poll_list;
>  };
>
>  enum {
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 6c5967e80132..99ff083232e9 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4294,6 +4294,8 @@ static inline void ____napi_schedule(struct softnet_data *sd,
>                  */
>                 thread = READ_ONCE(napi->thread);
>                 if (thread) {
> +                       list_add_tail(&napi->poll_list,
> +                                     &napi->thread_poll_list);
>                         wake_up_process(thread);
>                         return;
>                 }
> @@ -6777,6 +6779,7 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
>                 return;
>
>         INIT_LIST_HEAD(&napi->poll_list);
> +       INIT_LIST_HEAD(&napi->thread_poll_list);
>         INIT_HLIST_NODE(&napi->napi_hash_node);
>         hrtimer_init(&napi->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
>         napi->timer.function = napi_watchdog;
> @@ -6971,8 +6974,7 @@ static int napi_thread_wait(struct napi_struct *napi)
>         set_current_state(TASK_INTERRUPTIBLE);
>
>         while (!kthread_should_stop() && !napi_disable_pending(napi)) {
> -               if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
> -                       WARN_ON(!list_empty(&napi->poll_list));
> +               if (!list_empty(&napi->thread_poll_list)) {
>                         __set_current_state(TASK_RUNNING);
>                         return 0;
>                 }
>
> $ git diff --stat
>  include/linux/netdevice.h | 1 +
>  net/core/dev.c            | 6 ++++--
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> > > I think with just the right (wrong?) timing this patch will still
> > > not protect against disabling the NAPI.
> >
> > Maybe, but this patch is solving one issue that was easy to trigger.
> >
> > disabling the NAPI is handled already.
>
> The thread checks if NAPI is getting disabled, then time passes, then
> it checks if it's scheduled. If napi gets disabled in the "time passes"
> period thread will think that it got scheduled again.
>

I am not sure I understand correctly: when you say "then it checks if
it's scheduled", do you mean the schedule() call in napi_thread_wait()
that re-enters this function? If so, it still checks
!napi_disable_pending(napi) before it goes on to poll the napi
instance. I think that is sufficient to make sure we don't poll a
NAPI that is in DISABLE state?
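For reference, a simplified sketch of the check ordering I mean,
modeled in plain C with stub booleans in place of the kernel
predicates (this is just the control flow, not the kernel code):

```c
/* Control-flow model of one wakeup in napi_thread_wait(): the
 * stop/disable conditions are re-evaluated before the ownership
 * check, so a NAPI in DISABLE state is never handed to the poll
 * path. Stub flags stand in for kthread_should_stop(),
 * napi_disable_pending() and the SCHED ownership test. */
#include <stdbool.h>

static bool should_stop;      /* kthread_should_stop() */
static bool disable_pending;  /* napi_disable_pending(napi) */
static bool owns_napi;        /* the "may I poll?" ownership test */

/* Returns -1: exit, 0: go poll, 1: would schedule() and retry. */
static int thread_wait_step(void)
{
    if (should_stop || disable_pending)
        return -1;          /* never reaches the poll path */
    if (owns_napi)
        return 0;           /* poll the NAPI */
    return 1;               /* sleep in schedule(), then re-enter */
}
```

Even if the disable races in during the "time passes" window, the next
pass through the loop sees disable_pending before trusting the
ownership bit.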


> Sure, we can go and make special annotations in all other parts of NAPI
> infra, but isn't that an obvious sign of a bad design?
>
>
> I wanted to add that I have spent quite a bit of time hacking around
> the threaded NAPI thing before I had to maintain it, and (admittedly
> my brain is not very capable but) I had a hard time getting things
> working reliably with netpoll, busy polling, disabling, etc. IOW I'm
> not just claiming that "more bits" is not a good solution on a whim.
