[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <64b3c3dc-e36d-45b6-4b3a-45e3d26e8315@huawei.com>
Date: Sat, 28 May 2022 08:51:22 +0800
From: Yunsheng Lin <linyunsheng@...wei.com>
To: Guoju Fang <gjfang@...ux.alibaba.com>
CC: <davem@...emloft.net>, <edumazet@...gle.com>,
<eric.dumazet@...il.com>, <guoju.fgj@...baba-inc.com>,
<kuba@...nel.org>, <netdev@...r.kernel.org>, <pabeni@...hat.com>,
<rgauguey@...rayinc.com>, <sjones@...rayinc.com>,
<vladimir.oltean@....com>, <vray@...rayinc.com>, <will@...nel.org>
Subject: Re: [PATCH v3 net] net: sched: add barrier to fix packet stuck
problem for lockless qdisc
On 2022/5/27 17:11, Guoju Fang wrote:
> In qdisc_run_end(), the spin_unlock() only has store-release semantic,
> which guarantees all earlier memory access are visible before it. But
> the subsequent test_bit() may be reordered ahead of the spin_unlock(),
> and may cause a packet stuck problem.
>
> The concurrent operations can be described as below,
> CPU 0 | CPU 1
> qdisc_run_end() | qdisc_run_begin()
> . | .
> ----> /* may be reorderd here */ | .
> | . | .
> | spin_unlock() | set_bit()
> | . | smp_mb__after_atomic()
> ---- test_bit() | spin_trylock()
> . | .
>
> Consider the following sequence of events:
> CPU 0 reorder test_bit() ahead and see MISSED = 0
> CPU 1 calls set_bit()
> CPU 1 calls spin_trylock() and return fail
> CPU 0 executes spin_unlock()
>
> At the end of the sequence, CPU 0 calls spin_unlock() and does nothing
> because it see MISSED = 0. The skb on CPU 1 has beed enqueued but no one
> take it, until the next cpu pushing to the qdisc (if ever ...) will
> notice and dequeue it.
>
> So one explicit barrier is needed between spin_unlock() and test_bit()
> to ensure the correct order.
It might be better to mention why smp_mb() is used instead of smp_rmb()
or smp_wmb():
spin_unlock() and test_bit() ordering is a store-load ordering, which
requires a full memory barrier as smp_mb().
>
> Fixes: 89837eb4b246 ("net: sched: add barrier to ensure correct ordering for lockless qdisc")
The Fixes tag should be:
Fixes: a90c57f2cedd ("net: sched: fix packet stuck problem for lockless qdisc")
Other than that, it looks good to me:
Reviewed-by: Yunsheng Lin <linyunsheng@...wei.com>
> Signed-off-by: Guoju Fang <gjfang@...ux.alibaba.com>
> ---
> V2 -> V3: Not split the Fixes tag across multiple lines
> V1 -> V2: Rewrite comments
> ---
> include/net/sch_generic.h | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 9bab396c1f3b..8a8738642ca0 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -229,6 +229,9 @@ static inline void qdisc_run_end(struct Qdisc *qdisc)
> if (qdisc->flags & TCQ_F_NOLOCK) {
> spin_unlock(&qdisc->seqlock);
>
> + /* ensure ordering between spin_unlock() and test_bit() */
> + smp_mb();
> +
> if (unlikely(test_bit(__QDISC_STATE_MISSED,
> &qdisc->state)))
> __netif_schedule(qdisc);
>
Powered by blists - more mailing lists