netdev - Re: [PATCH net-next 1/3] net: provide macros for commonly copied lockless queue stop/wake code

Message-ID: <1e9bbdde-df97-4319-a4b7-e426c4351317@paulmck-laptop>
Date:   Wed, 5 Apr 2023 15:20:39 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Alexander Duyck <alexander.duyck@...il.com>
Cc:     Jakub Kicinski <kuba@...nel.org>,
        Heiner Kallweit <hkallweit1@...il.com>, davem@...emloft.net,
        netdev@...r.kernel.org, edumazet@...gle.com, pabeni@...hat.com,
        Herbert Xu <herbert@...dor.apana.org.au>
Subject: Re: [PATCH net-next 1/3] net: provide macros for commonly copied
 lockless queue stop/wake code

On Mon, Apr 03, 2023 at 01:27:44PM -0700, Alexander Duyck wrote:
> On Mon, Apr 3, 2023 at 12:03 PM Jakub Kicinski <kuba@...nel.org> wrote:
> >
> > On Mon, 3 Apr 2023 11:11:35 -0700 Alexander Duyck wrote:
> > > On Mon, Apr 3, 2023 at 8:56 AM Jakub Kicinski <kuba@...nel.org> wrote:
> > > > I don't think in terms of flushes. Let me add line numbers to the
> > > > producer and the consumer.
> > > >
> > > >  c1. WRITE cons
> > > >  c2. mb()  # A
> > > >  c3. READ stopped
> > > >  c4. rmb() # C
> > > >  c5. READ prod, cons
> > > >
> > > >  p1. WRITE prod
> > > >  p2. READ prod, cons
> > > >  p3. mb()  # B
> > > >  p4. WRITE stopped
> > > >  p5. READ prod, cons
> > > >
> > > > The way I think of it, the mb() orders c1 and c3 vs p2 and p4. The rmb()
> > > > orders c3 and c5 vs p1 and p4. Let me impenitently add Paul..
> > >
> > > So which function is supposed to be consumer vs producer here?
> >
> > producer is xmit, consumer is NAPI
> >
> > > I think your write stopped is on the wrong side of the memory barrier.
> > > It should be writing prod and stopped both before the barrier.
> >
> > Indeed, Paul pointed out over chat that we need two barriers there
> > to be correct :( Should be fine in practice, first one is BQL,
> > second one is on the slow path.
> >
> > > The maybe/try stop should essentially be:
> > > 1. write tail
> > > 2. read prod/cons
> > > 3. if unused >= 1x packet
> > > 3.a return
> > >
> > > 4. set stop
> > > 5. mb()
> > > 6. Re-read prod/cons
> > > 7. if unused >= 1x packet
> > > 7.a. test_and_clear stop
> > >
> > > The maybe/try wake would be:
> > > 1. write head
> > > 2. read prod/cons
> > > 3. if consumed == 0 || unused < 2x packet
> > > 3.a. return
> > >
> > > 4. mb()
> > > 5. test_and_clear stop
> > >
> > > > > One other thing to keep in mind is that the wake gives itself a pretty
> > > > > good runway. We are talking about enough to transmit at least 2
> > > > > frames. So if another consumer is stopping it we aren't waking it
> > > > > unless there is enough space for yet another frame after the current
> > > > > consumer.
> > > >
> > > > Ack, the race is very unlikely, basically the completing CPU would have
> > > > to take an expensive IRQ between checking the descriptor count and
> > > > checking if stopped -- to let the sending CPU queue multiple frames.
> > > >
> > > > But in theory the race is there, right?
> > >
> > > I don't think this is so much a race as a skid. Specifically when we
> > > wake the queue it will only run for one more packet in such a
> > > scenario. I think it is being run more like a flow control threshold
> > > rather than some sort of lock.
> > >
> > > I think I see what you are getting at though. Basically if the xmit
> > > function were to cycle several times between steps 3.a and 4 in the
> > > maybe/try wake it could fill the queue and then trigger the wake even
> > > though the queue is full and the unused space was already consumed.
> >
> > Yup, exactly. So we either need to sprinkle a couple more barriers
> > and tests in, or document that the code is only 99.999999% safe
> > against false positive restarts and drivers need to check for ring
> > full at the beginning of xmit.
> >
> > I'm quite tempted to add the barriers, because on the NAPI/consumer
> > side we could use this as an opportunity to start piggy backing on
> > the BQL barrier.
> 
> The thing is the more barriers we add the more it will hurt
> performance. I'd be tempted to just increase the runway we have as we
> could afford a 1 packet skid if we had a 2 packet runway for the
> start/stop thresholds.
> 
> I suspect that is probably why we haven't seen any issues as the
> DESC_NEEDED is pretty generous since it is assuming worst case
> scenarios.

Mightn't preemption or interrupts cause further issues?  Or are preemption
and/or interrupts disabled across the relevant sections of code?

							Thanx, Paul
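
For reference, a minimal sketch of the maybe/try stop and maybe/try wake
steps quoted above, written in portable C11 rather than kernel code. All
names here are hypothetical stand-ins (struct txq, RING_SIZE, txq_init(),
txq_maybe_stop(), txq_maybe_wake()); they are not the macros the patch adds.
atomic_thread_fence(memory_order_seq_cst) is assumed as a stand-in for the
kernel's mb(), and atomic_exchange() for test_and_clear. The sketch only
transcribes the quoted steps, so it keeps the window discussed in this thread
between the wake-side space check and the test-and-clear.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 16u /* hypothetical number of descriptors in the ring */

/* Hypothetical stand-in for a driver's TX ring state. */
struct txq {
    atomic_uint prod;    /* tail: advanced by the xmit (producer) path */
    atomic_uint cons;    /* head: advanced by the NAPI (consumer) path */
    atomic_bool stopped; /* stands in for the queue "stopped" bit */
};

static void txq_init(struct txq *q)
{
    atomic_init(&q->prod, 0u);
    atomic_init(&q->cons, 0u);
    atomic_init(&q->stopped, false);
}

/* Free descriptors; unsigned subtraction stays correct across wraparound. */
static unsigned int txq_unused(struct txq *q)
{
    unsigned int prod = atomic_load_explicit(&q->prod, memory_order_relaxed);
    unsigned int cons = atomic_load_explicit(&q->cons, memory_order_relaxed);

    return RING_SIZE - (prod - cons);
}

/*
 * Producer side, after queuing `used` descriptors. `pkt` is the worst-case
 * descriptor count for one packet. Returns true if the queue is left stopped.
 */
static bool txq_maybe_stop(struct txq *q, unsigned int used, unsigned int pkt)
{
    unsigned int prod = atomic_load_explicit(&q->prod, memory_order_relaxed);

    assert(used <= txq_unused(q));
    atomic_store_explicit(&q->prod, prod + used, memory_order_relaxed); /* 1 */
    if (txq_unused(q) >= pkt)                                      /* 2, 3 */
        return false;

    atomic_store_explicit(&q->stopped, true, memory_order_relaxed);   /* 4 */
    atomic_thread_fence(memory_order_seq_cst);                        /* 5 */
    if (txq_unused(q) >= pkt) {                                    /* 6, 7 */
        atomic_exchange(&q->stopped, false);                       /* 7.a */
        return false;
    }
    return true;
}

/*
 * Consumer side, after completing `done` descriptors. Returns true if this
 * call cleared the stopped bit, i.e. the caller should restart the queue.
 */
static bool txq_maybe_wake(struct txq *q, unsigned int done, unsigned int pkt)
{
    unsigned int cons = atomic_load_explicit(&q->cons, memory_order_relaxed);

    atomic_store_explicit(&q->cons, cons + done, memory_order_relaxed); /* 1 */
    if (done == 0 || txq_unused(q) < 2 * pkt)                      /* 2, 3 */
        return false;

    /* The producer may refill the ring right here: the skid in question. */
    atomic_thread_fence(memory_order_seq_cst);                        /* 4 */
    return atomic_exchange(&q->stopped, false);                       /* 5 */
}
```

With a 16-entry ring and 4 descriptors per packet, the producer stops the
queue only once fewer than 4 descriptors remain, and the consumer restarts it
only once at least 8 are free again, which is the "runway" referred to above.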