Date:   Mon, 3 Apr 2023 11:11:35 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Heiner Kallweit <hkallweit1@...il.com>, davem@...emloft.net,
        netdev@...r.kernel.org, edumazet@...gle.com, pabeni@...hat.com,
        Herbert Xu <herbert@...dor.apana.org.au>,
        "Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH net-next 1/3] net: provide macros for commonly copied
 lockless queue stop/wake code

On Mon, Apr 3, 2023 at 8:56 AM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Mon, 3 Apr 2023 08:18:04 -0700 Alexander Duyck wrote:
> > On Sat, Apr 1, 2023 at 11:58 AM Jakub Kicinski <kuba@...nel.org> wrote:
> > > > One more question: Don't we need a read memory barrier here to ensure
> > > > get_desc is up-to-date?
> > >
> > > CC: Alex, maybe I should not be posting after 10pm, with the missing v2
> > > and sparse CC list.. :|
> > >
> > > I was thinking about this too yesterday. AFAICT this implementation
> > > could indeed result in waking even tho the queue is full on non-x86.
> > > That's why the drivers have an extra check at the start of .xmit? :(
> >
> > The extra check at the start is more historical than anything else.
> > Logic like that has been there since the e1000 days. I think it
> > addressed items like pktgen which I think didn't make use of the
> > stop/wake flags way back when. I'll add in Herbert who was the original
> > author of this code so he can add some additional history if needed.
>
> Thanks for the pointer, you weren't kidding with the 2.6.19, that seems
> to be when the code was added to e1000 :) Looks fairly similar to the
> current code minus the BQL.
>
> > > I *think* that the right ordering would be:
> > >
> > > c1. WRITE cons
> > > c2. mb()  # A
> > > c3. READ stopped
> > > c4. rmb() # C
> > > c5. READ prod, cons
> >
> > What would the extra rmb() get you? The mb() will have already flushed
> > out any writes and if stopped is set the tail should have already been
> > written before setting it.
>
> I don't think in terms of flushes. Let me add line numbers to the
> producer and the consumer.
>
>  c1. WRITE cons
>  c2. mb()  # A
>  c3. READ stopped
>  c4. rmb() # C
>  c5. READ prod, cons
>
>  p1. WRITE prod
>  p2. READ prod, cons
>  p3. mb()  # B
>  p4. WRITE stopped
>  p5. READ prod, cons
>
> The way I think about it, the mb() orders c1 and c3 vs p2 and p4. The
> rmb() orders c3 and c5 vs p1 and p4. Let me impenitently add Paul..

So which function is supposed to be consumer vs producer here? I think
your WRITE stopped is on the wrong side of the memory barrier. It
should be writing prod and stopped both before the barrier.

The maybe/try stop should essentially be (rough C sketch after the list):
1. write tail
2. read prod/cons
3. if unused >= 1x packet
3.a. return

4. set stop
5. mb()
6. Re-read prod/cons
7. if unused >= 1x packet
7.a. test_and_clear stop
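
In driver-ish C that would look something like the sketch below. All of
the names here (my_txq, txq_unused(), MAX_DESC_PER_PKT) are made up for
illustration, not any real driver's API; only the netif_tx_*_queue()
helpers, smp_mb() and READ_ONCE()/WRITE_ONCE() are actual kernel
primitives:

#include <linux/netdevice.h>

#define MAX_DESC_PER_PKT	19	/* made up: worst-case descs/skb */

/* Hypothetical ring with free-running producer/consumer indexes. */
struct my_txq {
	struct netdev_queue *nq;	/* holds the stopped bit */
	u32 ring_size;			/* power of two */
	u32 tail;			/* producer index (prod) */
	u32 head;			/* consumer index (cons) */
};

static u32 txq_unused(const struct my_txq *q)
{
	/* number of free descriptors */
	return q->ring_size - (READ_ONCE(q->tail) - READ_ONCE(q->head));
}

static void my_txq_maybe_stop(struct my_txq *q, u32 new_tail)
{
	WRITE_ONCE(q->tail, new_tail);		/* 1. write tail */

	if (txq_unused(q) >= MAX_DESC_PER_PKT)	/* 2./3. read prod/cons */
		return;				/* 3.a. */

	netif_tx_stop_queue(q->nq);		/* 4. set stop */

	smp_mb();	/* 5. order the stop write before the re-read;
			 * pairs with the mb() on the wake side.
			 */

	if (txq_unused(q) >= MAX_DESC_PER_PKT)	/* 6./7. re-read */
		netif_tx_wake_queue(q->nq);	/* 7.a. test_and_clear */
}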

The maybe/try wake would be (again, sketch after the list):
1. write head
2. read prod/cons
3. if consumed == 0 || unused < 2x packet
3.a. return

4. mb()
5. test_and_clear stop
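
Reusing the same made-up helpers for the wake side:

static void my_txq_maybe_wake(struct my_txq *q, u32 new_head, u32 done)
{
	WRITE_ONCE(q->head, new_head);		/* 1. write head */

	if (done == 0 ||			/* 2./3. read prod/cons */
	    txq_unused(q) < 2 * MAX_DESC_PER_PKT)
		return;				/* 3.a. */

	smp_mb();	/* 4. order the head write before the stop-bit
			 * read; pairs with the mb() in the stop path.
			 */

	netif_tx_wake_queue(q->nq);	/* 5. test_and_clear stop -- the
					 * wake only does anything if the
					 * stopped bit was actually set.
					 */
}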

> > One other thing to keep in mind is that the wake gives itself a pretty
> > good runway. We are talking about enough to transmit at least 2
> > frames. So if another consumer is stopping it we aren't waking it
> > unless there is enough space for yet another frame after the current
> > consumer.
>
> Ack, the race is very unlikely, basically the completing CPU would have
> to take an expensive IRQ between checking the descriptor count and
> checking if stopped -- to let the sending CPU queue multiple frames.
>
> But in theory the race is there, right?

I don't think this is so much a race as a skid. Specifically, when we
wake the queue it will only run for one more packet in such a
scenario. I think it is being used more like a flow-control threshold
than as some sort of lock.

I think I see what you are getting at though. Basically, if the xmit
function were to cycle several times between steps 3.a and 4 in the
maybe/try wake, it could fill the queue and then trigger the wake even
though the queue is full and the unused space was already consumed.
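
Something like this interleaving, with the step numbers from the
maybe/try stop and wake sequences above:

 completion CPU (maybe/try wake)      xmit CPU (maybe/try stop)
 -------------------------------     --------------------------
 2. read prod/cons
    (sees >= 2x packet unused)
 3. check passes, no return
                                      queues frames until the ring is
                                      full, sets stop (steps 1.-5.)
 4. mb()
 5. test_and_clear stop
    -> queue is woken even though
       it is now completely full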
