netdev - Re: [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code

Message-ID: <CAKgT0UeDy6B0QJt126tykUfu+cB2VK0YOoMOYcL1JQFmxtgG0A@mail.gmail.com>
Date:   Mon, 3 Apr 2023 08:18:04 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Heiner Kallweit <hkallweit1@...il.com>, davem@...emloft.net,
        netdev@...r.kernel.org, edumazet@...gle.com, pabeni@...hat.com,
        Herbert Xu <herbert@...dor.apana.org.au>
Subject: Re: [PATCH net-next 1/3] net: provide macros for commonly copied
 lockless queue stop/wake code

On Sat, Apr 1, 2023 at 11:58 AM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Sat, 1 Apr 2023 17:18:12 +0200 Heiner Kallweit wrote:
> > > +#define __netif_tx_queue_maybe_wake(txq, get_desc, start_thrs, down_cond) \
> > > +   ({                                                              \
> > > +           int _res;                                               \
> > > +                                                                   \
> > > +           _res = -1;                                              \
> >
> > One more question: Don't we need a read memory barrier here to ensure
> > get_desc is up-to-date?
>
> CC: Alex, maybe I should not be posting after 10pm, with the missing v2
> and sparse CC list.. :|
>
> I was thinking about this too yesterday. AFAICT this implementation
> could indeed result in waking even tho the queue is full on non-x86.
> That's why the drivers have an extra check at the start of .xmit? :(

The extra check at the start is more historical than anything else.
Logic like that has been there since the e1000 days. I think it
addressed things like pktgen, which as far as I recall didn't make use
of the stop/wake flags way back when. I'll add in Herbert, who was the
original author of this code, so he can add some additional history if
needed.

> I *think* that the right ordering would be:
>
> WRITE cons
> mb()  # A
> READ stopped
> rmb() # C
> READ prod, cons

What would the extra rmb() get you? The mb() will have already flushed
out any writes and if stopped is set the tail should have already been
written before setting it.

One other thing to keep in mind is that the wake gives itself a pretty
good runway. We are talking about enough to transmit at least 2
frames. So if another consumer is stopping it we aren't waking it
unless there is enough space for yet another frame after the current
consumer.
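
A minimal sketch of that runway, assuming free-running unsigned
producer/consumer indices. RING_SIZE, DESC_PER_FRAME and both
thresholds are hypothetical values picked for illustration, not
taken from any driver:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical numbers, for illustration only. */
#define RING_SIZE       256u  /* descriptors in the ring */
#define DESC_PER_FRAME  18u   /* assumed worst-case descriptors per frame */
#define STOP_THRS       DESC_PER_FRAME         /* stop below one frame of room */
#define WAKE_THRS       (2u * DESC_PER_FRAME)  /* wake at two frames of room */

/* Free descriptors, given free-running indices (unsigned wrap is fine). */
static unsigned int ring_unused(unsigned int prod, unsigned int cons)
{
    /* The ring can never hold more than RING_SIZE in-flight descriptors. */
    assert(prod - cons <= RING_SIZE);
    return RING_SIZE - (prod - cons);
}

/* Producer side: stop once a worst-case frame no longer fits. */
static bool should_stop(unsigned int prod, unsigned int cons)
{
    return ring_unused(prod, cons) < STOP_THRS;
}

/* Consumer side: wake only once two worst-case frames fit. */
static bool should_wake(unsigned int prod, unsigned int cons)
{
    return ring_unused(prod, cons) >= WAKE_THRS;
}
```

The gap between STOP_THRS and WAKE_THRS is the runway: with these
assumed numbers a queue stopped at fewer than 18 free descriptors is
not woken until at least 36 are free.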

> And on the producer side (existing):
>
> WRITE prod
> READ prod, cons
> mb()  # B
> WRITE stopped
> READ prod, cons
>
> But I'm slightly afraid to change it, it's been working for over
> a decade :D

I wouldn't change it. The code predates BQL in the e1000 driver and
has been that way since its inception, which I believe was in 2.6.19.
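
For reference, a userspace sketch of the barrier pairing under
discussion, using C11 atomics in place of the kernel primitives. The
struct, function names and thresholds are hypothetical. The sketch
puts the full fence between the "stopped" store and the recheck on
the producer side, and after the "cons" store on the consumer side;
that placement is an assumption of the sketch, not a statement about
any particular driver. It models only the A/B pairing and leaves out
the proposed rmb() # C:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sizes and thresholds, for illustration only. */
#define RING_SIZE  8u
#define STOP_THRS  2u
#define WAKE_THRS  4u

struct txq {
    atomic_uint prod;     /* free-running producer index */
    atomic_uint cons;     /* free-running consumer index */
    atomic_bool stopped;  /* stands in for the queue's stop flag */
};

static void txq_init(struct txq *q)
{
    atomic_init(&q->prod, 0u);
    atomic_init(&q->cons, 0u);
    atomic_init(&q->stopped, false);
}

static unsigned int txq_unused(struct txq *q)
{
    unsigned int prod = atomic_load_explicit(&q->prod, memory_order_relaxed);
    unsigned int cons = atomic_load_explicit(&q->cons, memory_order_relaxed);

    assert(prod - cons <= RING_SIZE);
    return RING_SIZE - (prod - cons);
}

/* Producer: WRITE prod. */
static void txq_produce(struct txq *q, unsigned int n)
{
    atomic_fetch_add_explicit(&q->prod, n, memory_order_relaxed);
}

/* Producer: stop if short on room, then recheck. Returns true if stopped. */
static bool txq_maybe_stop(struct txq *q)
{
    if (txq_unused(q) >= STOP_THRS)
        return false;

    atomic_store_explicit(&q->stopped, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* plays the role of B */

    if (txq_unused(q) < STOP_THRS)
        return true;  /* still short on room: stay stopped */

    /* The consumer freed room in the meantime: restart ourselves. */
    atomic_store_explicit(&q->stopped, false, memory_order_relaxed);
    return false;
}

/* Consumer: WRITE cons, fence, then maybe wake. Returns true if it woke. */
static bool txq_complete(struct txq *q, unsigned int n)
{
    atomic_fetch_add_explicit(&q->cons, n, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* plays the role of A */

    if (!atomic_load_explicit(&q->stopped, memory_order_relaxed))
        return false;
    /* Whether this read needs a further rmb() is not modelled here. */
    if (txq_unused(q) < WAKE_THRS)
        return false;

    atomic_store_explicit(&q->stopped, false, memory_order_relaxed);
    return true;
}
```

With a full fence between each side's store and its following load,
at least one side observes the other's store: either the producer's
recheck sees the new "cons", or the consumer sees "stopped" set. That
is the property that avoids a queue left stopped with nobody to wake
it.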

> One neat thing that I noticed, which we could potentially exploit
> if we were to touch this code is that BQL already has a smp_mb()
> on the consumer side. So on any kernel config and driver which support
> BQL we can use that instead of adding another barrier at #A.
>
> It would actually be a neat optimization because right now, AFAICT,
> completion will fire the # A -like barrier almost every time.

Yeah, the barrier in the wake path may actually be redundant if BQL is
enabled. My advice, if you want a better idea of how this was set up,
is to take a look at the e1000 driver in the 2.6.19 kernel, as that
was where this code originated, and I am pretty certain it predates
anything in any of the other Intel drivers other than maybe e100.