Message-ID: <20230403120345.0c02232c@kernel.org>
Date: Mon, 3 Apr 2023 12:03:45 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: Heiner Kallweit <hkallweit1@...il.com>, davem@...emloft.net,
netdev@...r.kernel.org, edumazet@...gle.com, pabeni@...hat.com,
Herbert Xu <herbert@...dor.apana.org.au>,
"Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH net-next 1/3] net: provide macros for commonly copied
lockless queue stop/wake code

On Mon, 3 Apr 2023 11:11:35 -0700 Alexander Duyck wrote:
> On Mon, Apr 3, 2023 at 8:56 AM Jakub Kicinski <kuba@...nel.org> wrote:
> > I don't think in terms of flushes. Let me add line numbers to the
> > producer and the consumer.
> >
> > c1. WRITE cons
> > c2. mb() # A
> > c3. READ stopped
> > c4. rmb() # C
> > c5. READ prod, cons
> >
> > p1. WRITE prod
> > p2. READ prod, cons
> > p3. mb() # B
> > p4. WRITE stopped
> > p5. READ prod, cons
> >
> > The way I see it, the mb() orders c1 and c3 vs p2 and p4. The rmb()
> > orders c3 and c5 vs p1 and p4. Let me impenitently add Paul..
>
> So which function is supposed to be consumer vs producer here?
The producer is xmit, the consumer is NAPI.
> I think your write stopped is on the wrong side of the memory barrier.
> It should be writing prod and stopped both before the barrier.
Indeed, Paul pointed out over chat that we need two barriers there
to be correct :( That should be fine in practice: the first one is
the BQL barrier, and the second one is on the slow path.
> The maybe/try stop should essentially be:
> 1. write tail
> 2. read prod/cons
> 3. if unused >= 1x packet
> 3.a return
>
> 4. set stop
> 5. mb()
> 6. Re-read prod/cons
> 7. if unused >= 1x packet
> 7.a. test_and_clear stop
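FWIW, a minimal driver-side sketch of that stop sequence could look
like the below. struct my_ring, my_ring_unused(), r->txq and
MY_DESC_NEEDED are made-up names for illustration only, not anything
from the patch:

#include <linux/netdevice.h>

/* Hypothetical ring, used only for this sketch. */
struct my_ring {
        unsigned int prod;              /* written by xmit (producer) */
        unsigned int cons;              /* written by NAPI (consumer) */
        unsigned int size;              /* power of 2 */
        struct netdev_queue *txq;
};

/* Free descriptors based on a snapshot of prod and cons;
 * one slot is kept empty so prod == cons means "empty".
 */
static unsigned int my_ring_unused(const struct my_ring *r)
{
        return (READ_ONCE(r->cons) - READ_ONCE(r->prod) - 1) &
               (r->size - 1);
}

#define MY_DESC_NEEDED  (MAX_SKB_FRAGS + 1)     /* "1x packet" */

static void my_maybe_stop_tx(struct my_ring *r, unsigned int new_prod)
{
        WRITE_ONCE(r->prod, new_prod);          /* 1. write tail */

        if (likely(my_ring_unused(r) >= MY_DESC_NEEDED))
                return;                         /* 2./3. enough room */

        netif_tx_stop_queue(r->txq);            /* 4. set stop */

        /* 5. order the stop bit vs the re-read below; pairs with
         * the barrier on the completion side.
         */
        smp_mb();

        if (unlikely(my_ring_unused(r) >= MY_DESC_NEEDED))
                /* 6./7.a. room appeared meanwhile, clear the stop
                 * bit again (test_and_clear inside the wake).
                 */
                netif_tx_wake_queue(r->txq);
}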
>
> The maybe/try wake would be:
> 1. write head
> 2. read prod/cons
> 3. if consumed == 0 || unused < 2x packet
> 3.a. return
>
> 4. mb()
> 5. test_and_clear stop
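And the matching wake side, reusing the hypothetical my_ring,
my_ring_unused() and MY_DESC_NEEDED from the sketch above, with
"2x packet" written as 2 * MY_DESC_NEEDED; again purely illustrative:

static void my_maybe_wake_tx(struct my_ring *r, unsigned int new_cons,
                             unsigned int pkts_completed)
{
        WRITE_ONCE(r->cons, new_cons);          /* 1. write head */

        /* 2./3. nothing was consumed, or still not enough room */
        if (!pkts_completed ||
            my_ring_unused(r) < 2 * MY_DESC_NEEDED)
                return;

        /* 4. pairs with the barrier in the stop sequence */
        smp_mb();

        if (netif_tx_queue_stopped(r->txq))
                netif_tx_wake_queue(r->txq);    /* 5. test_and_clear stop */
}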
>
> > > One other thing to keep in mind is that the wake gives itself a pretty
> > > good runway. We are talking about enough to transmit at least 2
> > > frames. So if another consumer is stopping it we aren't waking it
> > > unless there is enough space for yet another frame after the current
> > > consumer.
> >
> > Ack, the race is very unlikely, basically the completing CPU would have
> > to take an expensive IRQ between checking the descriptor count and
> > checking if stopped -- to let the sending CPU queue multiple frames.
> >
> > But in theory the race is there, right?
>
> I don't think this is so much a race as a skid. Specifically, when we
> wake the queue it will only run for one more packet in such a
> scenario. I think it is being used more like a flow control threshold
> than some sort of lock.
>
> I think I see what you are getting at though. Basically, if the xmit
> function were to cycle several times between steps 3.a and 4 in the
> maybe/try wake, it could fill the queue and then trigger the wake even
> though the queue is full and the unused space was already consumed.
Yup, exactly. So we either need to sprinkle a couple more barriers
and tests in, or document that the code is only 99.999999% safe
against false positive restarts and that drivers need to check for
ring full at the beginning of xmit.
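The documented-workaround variant would be something like this at the
top of ndo_start_xmit() (my_get_ring() and the my_ring helpers are
made up, as in the sketches above):

static netdev_tx_t my_start_xmit(struct sk_buff *skb,
                                 struct net_device *dev)
{
        struct my_ring *r = my_get_ring(dev, skb); /* made-up lookup */

        /* Guard against a false-positive restart: if the ring is in
         * fact full, stop again and push back instead of overwriting
         * descriptors.
         */
        if (unlikely(my_ring_unused(r) < MY_DESC_NEEDED)) {
                netif_tx_stop_queue(r->txq);
                return NETDEV_TX_BUSY;
        }

        /* ... map and post the skb, then my_maybe_stop_tx() ... */
        return NETDEV_TX_OK;
}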
I'm quite tempted to add the barriers, because on the NAPI/consumer
side we could use this as an opportunity to start piggybacking on
the BQL barrier.
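On the completion side that piggybacking would roughly look like the
below: when BQL is compiled in, netdev_tx_completed_queue() already
issues an smp_mb() after updating the limits, so the cons write is
ordered against the stopped/ring re-check without an extra barrier.
Still just a sketch with the made-up my_ring helpers:

static void my_clean_tx(struct my_ring *r, unsigned int new_cons,
                        unsigned int pkts, unsigned int bytes)
{
        WRITE_ONCE(r->cons, new_cons);  /* publish new consumer index */

        /* Contains an smp_mb() when CONFIG_BQL=y -- the barrier this
         * path would piggyback on instead of issuing its own.
         */
        netdev_tx_completed_queue(r->txq, pkts, bytes);

        if (my_ring_unused(r) >= 2 * MY_DESC_NEEDED &&
            netif_tx_queue_stopped(r->txq))
                netif_tx_wake_queue(r->txq);
}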