Message-ID: <20210224133032.4227a60c@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date: Wed, 24 Feb 2021 13:30:32 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Wei Wang <weiwan@...gle.com>,
"David S . Miller" <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>, Paolo Abeni <pabeni@...hat.com>,
Hannes Frederic Sowa <hannes@...essinduktion.org>,
Alexander Duyck <alexanderduyck@...com>,
Martin Zaharinov <micron10@...il.com>
Subject: Re: [PATCH net] net: fix race between napi kthread mode and busy
poll
On Wed, 24 Feb 2021 21:37:36 +0100 Eric Dumazet wrote:
> On Wed, Feb 24, 2021 at 8:48 PM Jakub Kicinski <kuba@...nel.org> wrote:
> > On Tue, 23 Feb 2021 15:41:30 -0800 Wei Wang wrote:
> > > Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
> > > determine if the kthread owns this napi and could call napi->poll() on
> > > it. However, if socket busy poll is enabled, it is possible that the
> > > busy poll thread grabs this SCHED bit (after the previous napi->poll()
> > > invokes napi_complete_done() and clears SCHED bit) and tries to poll
> > > on the same napi.
> > > This patch tries to fix this race by adding a new bit
> > > NAPI_STATE_SCHED_BUSY_POLL in napi->state. This bit gets set in
> > > napi_busy_loop() together with NAPI_STATE_SCHED, and gets cleared in
> > > napi_complete_done() together with NAPI_STATE_SCHED. This helps
> > > distinguish the ownership of the napi between kthread and the busy poll
> > > thread, and prevents the kthread from polling on the napi when this napi
> > > is still owned by the busy poll thread.
> > >
> > > Fixes: 29863d41bb6e ("net: implement threaded-able napi poll loop support")
> > > Reported-by: Martin Zaharinov <micron10@...il.com>
> > > Suggested-by: Alexander Duyck <alexanderduyck@...com>
> > > Reviewed-by: Alexander Duyck <alexanderduyck@...com>
> > > Reviewed-by: Eric Dumazet <edumazet@...gle.com>
> >
> > AFAIU sched bit controls the ownership of the poll_list
>
> I disagree. BUSY POLL never inserted the napi into a list,
> because the user thread was polling one napi.
>
> Same for the kthread.
There is no delayed execution in busy_poll: it either got the SCHED bit
(and knows it) or it didn't.
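For reference, the grab is an atomic cmpxchg on napi->state; roughly
this, paraphrased from napi_busy_loop() (error paths elided, "count" is
the stats label in that function):

	unsigned long val = READ_ONCE(napi->state);

	/* If the napi is already owned or being disabled, don't touch it. */
	if (val & (NAPIF_STATE_DISABLE | NAPIF_STATE_SCHED |
		   NAPIF_STATE_IN_BUSY_POLL))
		goto count;
	/* Winning this cmpxchg means this thread owns the napi. */
	if (cmpxchg(&napi->state, val,
		    val | NAPIF_STATE_IN_BUSY_POLL |
			  NAPIF_STATE_SCHED) != val)
		goto count;

As I understand Wei's patch, NAPI_STATE_SCHED_BUSY_POLL would be ORed
into that same cmpxchg, and cleared in napi_complete_done() along with
SCHED.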
> wake_up_process() should be good enough.
Well, if that's the direction, maybe we should depend on the thread
state more? IOW pay less attention to SCHED and have
napi_complete_done() call set_current_state() if the thread is running?
I didn't think that through fully, but you can't say "wake_up_process()
should be good enough" and at the same time add another bit proving
it's not enough.
> > Can we please add a poll_list for the thread and make sure the
> > thread polls based on the list?
>
> A list ? That would require a spinlock or something ?
Does the softnet list require a spinlock?
Obviously with the current code the list would only ever have one napi
instance per thread, but I think it's worth it for the code simplicity.
napi_complete_done() dels from the list / releases that ownership
already.
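For the record, napi_complete_done() already does that removal
locklessly today, relying on the list being per-CPU and IRQs being
masked; roughly, paraphrased from net/core/dev.c:

	unsigned long flags;

	if (unlikely(!list_empty(&n->poll_list))) {
		/* The napi sits on a per-CPU softnet list; masking
		 * IRQs on this CPU is enough to delete it safely.
		 */
		local_irq_save(flags);
		list_del_init(&n->poll_list);
		local_irq_restore(flags);
	}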
> > IMO that's far clearer than defining a forest of ownership state
> > bits.
>
> Adding a bit seems simpler than adding a list.
In terms of what? LoC?
Just to find out what the LoC is I sketched this out:
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ddf4cfc12615..77f09ced9ee4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -348,6 +348,7 @@ struct napi_struct {
 	struct hlist_node	napi_hash_node;
 	unsigned int		napi_id;
 	struct task_struct	*thread;
+	struct list_head	thread_poll_list;
 };
 
 enum {
diff --git a/net/core/dev.c b/net/core/dev.c
index 6c5967e80132..99ff083232e9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4294,6 +4294,8 @@ static inline void ____napi_schedule(struct softnet_data *sd,
 	 */
 	thread = READ_ONCE(napi->thread);
 	if (thread) {
+		list_add_tail(&napi->poll_list,
+			      &napi->thread_poll_list);
 		wake_up_process(thread);
 		return;
 	}
@@ -6777,6 +6779,7 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		return;
 
 	INIT_LIST_HEAD(&napi->poll_list);
+	INIT_LIST_HEAD(&napi->thread_poll_list);
 	INIT_HLIST_NODE(&napi->napi_hash_node);
 	hrtimer_init(&napi->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
 	napi->timer.function = napi_watchdog;
@@ -6971,8 +6974,7 @@ static int napi_thread_wait(struct napi_struct *napi)
 	set_current_state(TASK_INTERRUPTIBLE);
 
 	while (!kthread_should_stop() && !napi_disable_pending(napi)) {
-		if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
-			WARN_ON(!list_empty(&napi->poll_list));
+		if (!list_empty(&napi->thread_poll_list)) {
 			__set_current_state(TASK_RUNNING);
 			return 0;
 		}
$ git diff --stat
include/linux/netdevice.h | 1 +
net/core/dev.c | 6 ++++--
2 files changed, 5 insertions(+), 2 deletions(-)
> > I think with just the right (wrong?) timing this patch will still
> > not protect against disabling the NAPI.
>
> Maybe, but this patch is solving one issue that was easy to trigger.
>
> disabling the NAPI is handled already.
The thread checks if the NAPI is getting disabled, then time passes,
then it checks if it's scheduled. If the napi gets disabled in the
"time passes" window, the thread will think that it got scheduled again.
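To make the window concrete, here is the current napi_thread_wait()
(the annotation comment is mine):

	static int napi_thread_wait(struct napi_struct *napi)
	{
		set_current_state(TASK_INTERRUPTIBLE);

		while (!kthread_should_stop() && !napi_disable_pending(napi)) {
			/* If napi_disable() runs right here, it sets
			 * DISABLE and then grabs SCHED, so the test
			 * below succeeds and the thread wrongly
			 * concludes it was scheduled to poll.
			 */
			if (test_bit(NAPI_STATE_SCHED, &napi->state)) {
				WARN_ON(!list_empty(&napi->poll_list));
				__set_current_state(TASK_RUNNING);
				return 0;
			}

			schedule();
			set_current_state(TASK_INTERRUPTIBLE);
		}
		__set_current_state(TASK_RUNNING);
		return -1;
	}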
Sure, we can go and make special annotations in all other parts of NAPI
infra, but isn't that an obvious sign of a bad design?
I wanted to add that I have spent quite a bit of time hacking around
the threaded NAPI thing before I had to maintain it, and (admittedly my
brain is not very capable, but) I had a hard time getting things working
reliably with netpoll, busy polling, disabling, etc. IOW I'm not just
claiming that "more bits" is not a good solution on a whim.