Date:   Mon, 3 Apr 2023 08:56:01 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Alexander Duyck <alexander.duyck@...il.com>
Cc:     Heiner Kallweit <hkallweit1@...il.com>, davem@...emloft.net,
        netdev@...r.kernel.org, edumazet@...gle.com, pabeni@...hat.com,
        Herbert Xu <herbert@...dor.apana.org.au>,
        "Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH net-next 1/3] net: provide macros for commonly copied
 lockless queue stop/wake code

On Mon, 3 Apr 2023 08:18:04 -0700 Alexander Duyck wrote:
> On Sat, Apr 1, 2023 at 11:58 AM Jakub Kicinski <kuba@...nel.org> wrote:
> > > One more question: Don't we need a read memory barrier here to ensure
> > > get_desc is up-to-date?  
> >
> > CC: Alex, maybe I should not be posting after 10pm, with the missing v2
> > and sparse CC list.. :|
> >
> > I was thinking about this too yesterday. AFAICT this implementation
> > could indeed result in waking even though the queue is full on non-x86.
> > That's why the drivers have an extra check at the start of .xmit? :(  
> 
> The extra check at the start is more historical than anything else.
> Logic like that has been there since the e1000 days. I think it
> addressed items like pktgen which I think didn't make use of the
> stop/wake flags way back when. I'll add in Herbert who was the original
> author for this code so he can add some additional history if needed.

Thanks for the pointer, you weren't kidding about 2.6.19, that seems
to be when the code was added to e1000 :) Looks fairly similar to the
current code minus the BQL.

> > I *think* that the right ordering would be:
> >
> > c1. WRITE cons
> > c2. mb()  # A
> > c3. READ stopped
> > c4. rmb() # C
> > c5. READ prod, cons  
> 
> What would the extra rmb() get you? The mb() will have already flushed
> out any writes and if stopped is set the tail should have already been
> written before setting it.

I don't think in terms of flushes. Let me add line numbers to the
producer and the consumer.

 c1. WRITE cons
 c2. mb()  # A
 c3. READ stopped
 c4. rmb() # C
 c5. READ prod, cons  

 p1. WRITE prod
 p2. READ prod, cons
 p3. mb()  # B
 p4. WRITE stopped
 p5. READ prod, cons

The way I think about it, the mb() orders c1 and c3 vs p2 and p4, and
the rmb() orders c3 and c5 vs p1 and p4. Let me impenitently add Paul..
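For reference, the two listings above can be modeled in userspace with C11
atomics, using a seq_cst fence to stand in for mb() and an acquire fence for
rmb(). Everything here (ring size, field names, wake threshold) is
illustrative rather than taken from any driver, and the fence placement
simply mirrors the c1..c5 / p1..p5 numbering — which is exactly what is
under debate in this thread, so treat it as a model, not a verdict:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE   256  /* illustrative */
#define WAKE_THRESH 2    /* illustrative: wake only with >= 2 free slots */

struct txq {
	atomic_uint prod;     /* written by the sending CPU (p1) */
	atomic_uint cons;     /* written by the completing CPU (c1) */
	atomic_bool stopped;  /* queue state flag */
};

static unsigned int free_slots(struct txq *q)
{
	/* c5 / p2 / p5: read both indices and compute available space */
	unsigned int p = atomic_load_explicit(&q->prod, memory_order_relaxed);
	unsigned int c = atomic_load_explicit(&q->cons, memory_order_relaxed);
	return RING_SIZE - (p - c);
}

/* Completion (consumer) path, following c1..c5 */
static void txq_complete(struct txq *q, unsigned int new_cons)
{
	atomic_store_explicit(&q->cons, new_cons, memory_order_relaxed); /* c1 */
	atomic_thread_fence(memory_order_seq_cst);                       /* c2: mb() A */
	if (atomic_load_explicit(&q->stopped, memory_order_relaxed)) {   /* c3 */
		atomic_thread_fence(memory_order_acquire);               /* c4: rmb() C */
		if (free_slots(q) >= WAKE_THRESH)                        /* c5 */
			atomic_store_explicit(&q->stopped, false,
					      memory_order_relaxed);     /* wake */
	}
}

/* Transmit (producer) path, following p1..p5; returns true if the queue
 * is still awake after queuing one frame. */
static bool txq_xmit(struct txq *q)
{
	unsigned int p = atomic_load_explicit(&q->prod, memory_order_relaxed);

	atomic_store_explicit(&q->prod, p + 1, memory_order_relaxed);    /* p1 */
	if (free_slots(q) >= 1)                                          /* p2 */
		return true;
	atomic_thread_fence(memory_order_seq_cst);                       /* p3: mb() B */
	atomic_store_explicit(&q->stopped, true, memory_order_relaxed);  /* p4 */
	if (free_slots(q) >= 1) {                                        /* p5: recheck */
		atomic_store_explicit(&q->stopped, false, memory_order_relaxed);
		return true;
	}
	return false;
}
```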

> One other thing to keep in mind is that the wake gives itself a pretty
> good runway. We are talking about enough to transmit at least 2
> frames. So if another consumer is stopping it we aren't waking it
> unless there is enough space for yet another frame after the current
> consumer.

Ack, the race is very unlikely: basically the completing CPU would have
to take an expensive IRQ between checking the descriptor count and
checking if stopped, letting the sending CPU queue multiple frames.

But in theory the race is there, right?
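A deterministic, single-threaded replay of that interleaving (compressing
the two CPUs into one sequence of steps; the ring size and wake threshold
are made up) shows how a stale descriptor-count sample could turn into a
wake on a full ring:

```c
#include <stdbool.h>

#define RING_SIZE 4  /* illustrative */

/* Returns true if the replay ends with the queue woken while the ring
 * is actually full, i.e. the spurious wake occurred. */
static bool replay_race(void)
{
	unsigned int prod = 1, cons = 0;  /* one frame in flight */
	bool stopped = false;

	/* completing CPU: frees the frame, samples free space */
	cons = 1;
	unsigned int sampled_free = RING_SIZE - (prod - cons);  /* plenty */

	/* ...an expensive IRQ delays it; sending CPU fills the ring... */
	prod = cons + RING_SIZE;  /* ring now full */
	stopped = true;           /* producer stops the queue */

	/* completing CPU resumes: sees stopped, wakes on the stale sample */
	bool woke = false;
	if (stopped && sampled_free >= 2) {
		stopped = false;
		woke = true;
	}
	return woke && (RING_SIZE - (prod - cons)) == 0;
}
```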

> > And on the producer side (existing):
> >
> > p1. WRITE prod
> > p2. READ prod, cons
> > p3. mb()  # B
> > p4. WRITE stopped
> > p5. READ prod, cons
> >
> > But I'm slightly afraid to change it, it's been working for over
> > a decade :D  
> 
> I wouldn't change it. The code has predated BQL in the e1000 driver
> and has been that way since the inception of it I believe in 2.6.19.
> 
> > One neat thing that I noticed, which we could potentially exploit
> > if we were to touch this code is that BQL already has a smp_mb()
> > on the consumer side. So on any kernel config and driver which support
> > BQL we can use that instead of adding another barrier at #A.
> >
> > It would actually be a neat optimization because right now, AFAICT,
> > completion will fire the # A -like barrier almost every time.  
> 
> Yeah, the fact is the barrier in the wake path may actually be
> redundant if BQL is enabled. My advice: if you want to get a
> better idea of how this was set up, take a look at the e1000
> driver in the 2.6.19 kernel, as that is where this code originated, and
> I am pretty certain it predates anything in any of the other Intel
> drivers other than maybe e100.
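As a rough model of that optimization: since the BQL completion accounting
already executes a full barrier on the consumer side (the smp_mb() in
netdev_tx_completed_queue(), per the discussion above), the wake check could
skip issuing mb() A itself on BQL-enabled paths. Userspace sketch with C11
atomics; all names here are made up for illustration:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal model of a tx queue's wake-side state; names are illustrative. */
struct txq_model {
	atomic_bool stopped;
	atomic_uint completed;
};

/* Models the BQL completion accounting: record completed work, then
 * issue the full barrier that BQL already pays for on this path. */
static void model_bql_completed(struct txq_model *q, unsigned int done)
{
	atomic_fetch_add_explicit(&q->completed, done, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);  /* BQL's existing smp_mb() */
}

/* Wake check: only issue mb() A when no BQL barrier ran before us. */
static void model_maybe_wake(struct txq_model *q, bool bql_did_mb)
{
	if (!bql_did_mb)
		atomic_thread_fence(memory_order_seq_cst);  /* mb() A */
	if (atomic_load_explicit(&q->stopped, memory_order_relaxed))
		atomic_store_explicit(&q->stopped, false, memory_order_relaxed);
}
```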
