[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM_iQpXU_jAbE4TC3ezZbfbYmYeViVgea+xyqPAxc5XTL9+cVw@mail.gmail.com>
Date: Fri, 19 Mar 2021 12:40:53 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: Yunsheng Lin <linyunsheng@...wei.com>
Cc: David Miller <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Vladimir Oltean <olteanv@...il.com>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Andrii Nakryiko <andriin@...com>,
Eric Dumazet <edumazet@...gle.com>,
Wei Wang <weiwan@...gle.com>,
"Cong Wang ." <cong.wang@...edance.com>,
Taehee Yoo <ap420073@...il.com>,
Linux Kernel Network Developers <netdev@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>, linuxarm@...neuler.org,
Marc Kleine-Budde <mkl@...gutronix.de>,
linux-can@...r.kernel.org, Jamal Hadi Salim <jhs@...atatu.com>,
Jiri Pirko <jiri@...nulli.us>,
Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
John Fastabend <john.fastabend@...il.com>, kpsingh@...nel.org,
bpf <bpf@...r.kernel.org>, Jonas Bonn <jonas.bonn@...rounds.com>,
Paolo Abeni <pabeni@...hat.com>,
Michael Zhivich <mzhivich@...mai.com>,
Josh Hunt <johunt@...mai.com>, Jike Song <albcamus@...il.com>,
Kehuan Feng <kehuan.feng@...il.com>
Subject: Re: [PATCH net] net: sched: fix packet stuck problem for lockless qdisc
On Wed, Mar 17, 2021 at 11:52 PM Yunsheng Lin <linyunsheng@...wei.com> wrote:
>
> Lockless qdisc has below concurrent problem:
> cpu0 cpu1
> . .
> q->enqueue .
> . .
> qdisc_run_begin() .
> . .
> dequeue_skb() .
> . .
> sch_direct_xmit() .
> . .
> . q->enqueue
> . qdisc_run_begin()
> . return and do nothing
> . .
> qdisc_run_end() .
>
> cpu1 enqueue a skb without calling __qdisc_run() because cpu0
> has not released the lock yet and spin_trylock() return false
> for cpu1 in qdisc_run_begin(), and cpu0 do not see the skb
> enqueued by cpu1 when calling dequeue_skb() because cpu1 may
> enqueue the skb after cpu0 calling dequeue_skb() and before
> cpu0 calling qdisc_run_end().
>
> Lockless qdisc has another concurrent problem when tx_action
> is involved:
>
> cpu0(serving tx_action) cpu1 cpu2
> . . .
> . q->enqueue .
> . qdisc_run_begin() .
> . dequeue_skb() .
> . . q->enqueue
> . . .
> . sch_direct_xmit() .
> . . qdisc_run_begin()
> . . return and do nothing
> . . .
> clear __QDISC_STATE_SCHED . .
> qdisc_run_begin() . .
> return and do nothing . .
> . . .
> . qdisc_run_begin() .
>
> This patch fixes the above data race by:
> 1. Set a flag after spin_trylock() return false.
> 2. Retry a spin_trylock() in case other CPU may not see the
> new flag after it releases the lock.
> 3. reschedule if the flag is set after the lock is released
> at the end of qdisc_run_end().
>
> For tx_action case, the flags is also set when cpu1 is at the
> end if qdisc_run_begin(), so tx_action will be rescheduled
> again to dequeue the skb enqueued by cpu2.
>
> Also clear the flag before dequeuing in order to reduce the
> overhead of the above process, and aviod doing the heavy
> test_and_clear_bit() at the end of qdisc_run_end().
>
> Fixes: 6b3ba9146fe6 ("net: sched: allow qdiscs to handle locking")
> Signed-off-by: Yunsheng Lin <linyunsheng@...wei.com>
> ---
> For those who has not been following the qdsic scheduling
> discussion, there is packet stuck problem for lockless qdisc,
> see [1], and I has done some cleanup and added some enhanced
> features too, see [2] [3].
> While I was doing the optimization for lockless qdisc, it
> accurred to me that these optimization is useless if there is
> still basic bug in lockless qdisc, even the bug is not easily
> reproducible. So look through [1] again, I found that the data
> race for tx action mentioned by Michael, and thought deep about
> it and came up with this patch trying to fix it.
>
> So I am really appreciated some who still has the reproducer
> can try this patch and report back.
>
> 1. https://lore.kernel.org/netdev/d102074f-7489-e35a-98cf-e2cad7efd8a2@netrounds.com/t/#ma7013a79b8c4d8e7c49015c724e481e6d5325b32
> 2. https://patchwork.kernel.org/project/netdevbpf/patch/1615777818-13969-1-git-send-email-linyunsheng@huawei.com/
> 3. https://patchwork.kernel.org/project/netdevbpf/patch/1615800610-34700-1-git-send-email-linyunsheng@huawei.com/
> ---
> include/net/sch_generic.h | 23 ++++++++++++++++++++---
> net/sched/sch_generic.c | 1 +
> 2 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index f7a6e14..4220eab 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -36,6 +36,7 @@ struct qdisc_rate_table {
> enum qdisc_state_t {
> __QDISC_STATE_SCHED,
> __QDISC_STATE_DEACTIVATED,
> + __QDISC_STATE_NEED_RESCHEDULE,
> };
>
> struct qdisc_size_table {
> @@ -159,8 +160,17 @@ static inline bool qdisc_is_empty(const struct Qdisc *qdisc)
> static inline bool qdisc_run_begin(struct Qdisc *qdisc)
> {
> if (qdisc->flags & TCQ_F_NOLOCK) {
> - if (!spin_trylock(&qdisc->seqlock))
> - return false;
> + if (!spin_trylock(&qdisc->seqlock)) {
> + set_bit(__QDISC_STATE_NEED_RESCHEDULE,
> + &qdisc->state);
Why do we need another bit? I mean why not just call __netif_schedule()?
> +
> + /* Retry again in case other CPU may not see the
> + * new flags after it releases the lock at the
> + * end of qdisc_run_end().
> + */
> + if (!spin_trylock(&qdisc->seqlock))
> + return false;
> + }
> WRITE_ONCE(qdisc->empty, false);
> } else if (qdisc_is_running(qdisc)) {
> return false;
> @@ -176,8 +186,15 @@ static inline bool qdisc_run_begin(struct Qdisc *qdisc)
> static inline void qdisc_run_end(struct Qdisc *qdisc)
> {
> write_seqcount_end(&qdisc->running);
> - if (qdisc->flags & TCQ_F_NOLOCK)
> + if (qdisc->flags & TCQ_F_NOLOCK) {
> spin_unlock(&qdisc->seqlock);
> +
> + if (unlikely(test_bit(__QDISC_STATE_NEED_RESCHEDULE,
> + &qdisc->state) &&
> + !test_bit(__QDISC_STATE_DEACTIVATED,
> + &qdisc->state)))
Testing two bits one by one is not atomic...
> + __netif_schedule(qdisc);
> + }
> }
>
> static inline bool qdisc_may_bulk(const struct Qdisc *qdisc)
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index 44991ea..25d75d8 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -205,6 +205,7 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
> const struct netdev_queue *txq = q->dev_queue;
> struct sk_buff *skb = NULL;
>
> + clear_bit(__QDISC_STATE_NEED_RESCHEDULE, &q->state);
> *packets = 1;
> if (unlikely(!skb_queue_empty(&q->gso_skb))) {
> spinlock_t *lock = NULL;
> --
> 2.7.4
>
Powered by blists - more mailing lists