Message-ID: <20210105122328.3e5569a4@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date: Tue, 5 Jan 2021 12:23:28 -0800
From: Jakub Kicinski <kuba@...nel.org>
To: Alex Elder <elder@...aro.org>
Cc: David Miller <davem@...emloft.net>,
Network Development <netdev@...r.kernel.org>,
Eric Dumazet <edumazet@...gle.com>
Subject: Re: Missed schedule_napi()?
On Mon, 4 Jan 2021 10:46:09 -0600 Alex Elder wrote:
> I have a question about whether it's possible to effectively
> miss a napi_schedule() call when a napi_disable() is underway.
>
> I'm going to try to represent the code in question here
> in an interleaved way to explain the scenario; I hope
> it's clear.
>
> Suppose the SCHED flag is clear. And suppose two
> concurrent threads do things in the sequence below.
>
> Disabling thread                                          | Scheduling thread
> ----------------------------------------------------------+------------------------------------------------
> void napi_disable(struct napi_struct *n)                  |
> {                                                         | bool napi_schedule_prep(struct napi_struct *n)
>     might_sleep();                                        | {
>                                                           |     unsigned long val, new;
>                                                           |
>                                                           |     do {
>     set_bit(NAPI_STATE_DISABLE, &n->state);               |
>                                                           |         val = READ_ONCE(n->state);
>                                                           |         if (unlikely(val & NAPIF_STATE_DISABLE))
>                                                           |             return false;
>                                                           |         . . .
>     while (test_and_set_bit(NAPI_STATE_SCHED, &n->state)) |
>         msleep(1);                                        |
>     . . .                                                 |
>
> We start with the SCHED bit clear. The disabling thread
> sets the DISABLE bit as it begins. The scheduling thread
> checks the state and finds that it is disabled, so it
> simply returns false, and the napi_schedule() caller will
> *not* call __napi_schedule().
>
> But even though NAPI is getting disabled, the scheduling thread
> wants it recorded that a NAPI poll should be scheduled, even
> if it happens later. In other words, it seems like this
> case is essentially a MISSED schedule.
>
> The disabling thread sets the SCHED bit, having found it was
> not set previously, and thereby disables NAPI processing until
> it is re-enabled.
>
> Later, napi_enable() will clear the SCHED bit, allowing NAPI
> processing to continue, but there is no record that the
> scheduling thread indicated that a poll was needed.
>
> Am I misunderstanding this? If so, can someone please explain?
> It seems to me that the napi_schedule() call is "lost".

AFAICT your analysis is correct. At the same time, the NAPI API does
not (to the best of my knowledge) guarantee that the number of NAPI
poll invocations matches the number of __napi_schedule() calls.
The expectation is that the communication channel will be "reset"
after the napi_disable() call, processing or dropping all the events
that are still outstanding once napi_disable() has returned.