linux-kernel - Re: Packet gets stuck in NOLOCK pfifo

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <5f51cbad3cc2_3eceb208fc@john-XPS-13-9370.notmuch>
Date:   Thu, 03 Sep 2020 22:07:57 -0700
From:   John Fastabend <john.fastabend@...il.com>
To:     Cong Wang <xiyou.wangcong@...il.com>,
        Paolo Abeni <pabeni@...hat.com>
Cc:     Kehuan Feng <kehuan.feng@...il.com>,
        Hillf Danton <hdanton@...a.com>,
        Jike Song <albcamus@...il.com>, Josh Hunt <johunt@...mai.com>,
        Jonas Bonn <jonas.bonn@...rounds.com>,
        Michael Zhivich <mzhivich@...mai.com>,
        David Miller <davem@...emloft.net>,
        John Fastabend <john.fastabend@...il.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Netdev <netdev@...r.kernel.org>
Subject: Re: Packet gets stuck in NOLOCK pfifo_fast qdisc

Cong Wang wrote:
> On Thu, Sep 3, 2020 at 1:40 AM Paolo Abeni <pabeni@...hat.com> wrote:
> >
> > On Wed, 2020-09-02 at 22:01 -0700, Cong Wang wrote:
> > > Can you test the attached one-line fix? I think we are overthinking,
> > > probably all
> > > we need here is a busy wait.
> >
> > I think that will solve, but I also think that will kill NOLOCK
> > performances due to really increased contention.
> 
> Yeah, we somehow end up with more locks (seqlock, skb array lock)
> for lockless qdisc. What an irony... ;)

I went back to the original nolock implementation code to try and figure
out how this was working in the first place.

After initial patch series we have this in __dev_xmit_skb()

	if (q->flags & TCQ_F_NOLOCK) {
		if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
			__qdisc_drop(skb, &to_free);
			rc = NET_XMIT_DROP;
		} else {
			rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
			__qdisc_run(q);
		}

		if (unlikely(to_free))
			kfree_skb_list(to_free);
		return rc;
	}

One important piece here is we used __qdisc_run(q) instead of
what we have there now qdisc_run(q). Here is the latest code,


	if (q->flags & TCQ_F_NOLOCK) { 
		rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
		qdisc_run(q);
		...

__qdisc_run is going to always go into a qdisc_restart loop and
dequeue packets. There is no check here to see if another CPU
is running or not. Compare that to qdisc_run()

	static inline void qdisc_run(struct Qdisc *q)
	{
		if (qdisc_run_begin(q)) {
			__qdisc_run(q);
			qdisc_run_end(q);
		}
	}

Here we have all the racing around qdisc_is_running() that seems
unsolvable.

Seems we flipped __qdisc_run to qdisc_run here 32f7b44d0f566
("sched: manipulate __QDISC_STATE_RUNNING in qdisc_run_* helpers"). 
Its not clear to me from thatpatch though why it was even done
there?

Maybe this would unlock us,

diff --git a/net/core/dev.c b/net/core/dev.c
index 7df6c9617321..9b09429103f1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3749,7 +3749,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 
 	if (q->flags & TCQ_F_NOLOCK) {
 		rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK;
-		qdisc_run(q);
+		__qdisc_run(q);
 
 		if (unlikely(to_free))
 			kfree_skb_list(to_free);


Per other thread we also need the state deactivated check added
back.

> 
> >
> > At this point I fear we could consider reverting the NOLOCK stuff.
> > I personally would hate doing so, but it looks like NOLOCK benefits are
> > outweighed by its issues.
> 
> I agree, NOLOCK brings more pains than gains. There are many race
> conditions hidden in generic qdisc layer, another one is enqueue vs.
> reset which is being discussed in another thread.

Sure. Seems they crept in over time. I had some plans to write a
lockless HTB implementation. But with fq+EDT with BPF it seems that
it is no longer needed, we have a more generic/better solution.  So
I dropped it. Also most folks should really be using fq, fq_codel,
etc. by default anyways. Using pfifo_fast alone is not ideal IMO.

Thanks,
John