Message-ID: <CAKgT0Ue-hEycSyYvVJt0L5Z=373MyNPbgPjFZMA5j2v0hWg0zg@mail.gmail.com>
Date:   Mon, 3 Apr 2023 13:27:44 -0700
From:   Alexander Duyck <alexander.duyck@...il.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Heiner Kallweit <hkallweit1@...il.com>, davem@...emloft.net,
        netdev@...r.kernel.org, edumazet@...gle.com, pabeni@...hat.com,
        Herbert Xu <herbert@...dor.apana.org.au>,
        "Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH net-next 1/3] net: provide macros for commonly copied
 lockless queue stop/wake code

On Mon, Apr 3, 2023 at 12:03 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Mon, 3 Apr 2023 11:11:35 -0700 Alexander Duyck wrote:
> > On Mon, Apr 3, 2023 at 8:56 AM Jakub Kicinski <kuba@...nel.org> wrote:
> > > I don't think in terms of flushes. Let me add line numbers to the
> > > producer and the consumer.
> > >
> > >  c1. WRITE cons
> > >  c2. mb()  # A
> > >  c3. READ stopped
> > >  c4. rmb() # C
> > >  c5. READ prod, cons
> > >
> > >  p1. WRITE prod
> > >  p2. READ prod, cons
> > >  p3. mb()  # B
> > >  p4. WRITE stopped
> > >  p5. READ prod, cons
> > >
> > > The way I think the mb() orders c1 and c3 vs p2 and p4. The rmb()
> > > orders c3 and c5 vs p1 and p4. Let me impenitently add Paul..
> >
> > So which function is supposed to be consumer vs producer here?
>
> producer is xmit consumer is NAPI
>
> > I think your write stopped is on the wrong side of the memory barrier.
> > It should be writing prod and stopped both before the barrier.
>
> Indeed, Paul pointed out over chat that we need two barriers there
> to be correct :( Should be fine in practice, first one is BQL,
> second one is on the slow path.
>
> > The maybe/try stop should essentially be:
> > 1. write tail
> > 2. read prod/cons
> > 3. if unused >= 1x packet
> > 3.a return
> >
> > 4. set stop
> > 5. mb()
> > 6. Re-read prod/cons
> > 7. if unused >= 1x packet
> > 7.a. test_and_clear stop
> >
> > The maybe/try wake would be:
> > 1. write head
> > 2. read prod/cons
> > 3. if consumed == 0 || unused < 2x packet
> > 3.a. return
> >
> > 4. mb()
> > 5. test_and_clear stop
> >
> > > > One other thing to keep in mind is that the wake gives itself a pretty
> > > > good runway. We are talking about enough to transmit at least 2
> > > > frames. So if another consumer is stopping it we aren't waking it
> > > > unless there is enough space for yet another frame after the current
> > > > consumer.
> > >
> > > Ack, the race is very unlikely, basically the completing CPU would have
> > > to take an expensive IRQ between checking the descriptor count and
> > > checking if stopped -- to let the sending CPU queue multiple frames.
> > >
> > > But in theory the race is there, right?
> >
> > I don't think this is so much a race as a skid. Specifically when we
> > wake the queue it will only run for one more packet in such a
> > scenario. I think it is being run more like a flow control threshold
> > rather than some sort of lock.
> >
> > I think I see what you are getting at though. Basically if the xmit
> > function were to cycle several times between steps 3.a and 4 in the
> > maybe/try wake it could fill the queue and then trigger the wake even
> > though the queue is full and the unused space was already consumed.
>
> Yup, exactly. So we either need to sprinkle a couple more barriers
> and tests in, or document that the code is only 99.999999% safe
> against false positive restarts and drivers need to check for ring
> full at the beginning of xmit.
>
> I'm quite tempted to add the barriers, because on the NAPI/consumer
> side we could use this as an opportunity to start piggy backing on
> the BQL barrier.

The thing is, the more barriers we add, the more it will hurt
performance. I'd be tempted to just increase the runway we have, as we
could afford a 1-packet skid if we had a 2-packet runway for the
start/stop thresholds.

I suspect that is why we haven't seen any issues: DESC_NEEDED is
pretty generous, since it assumes worst-case scenarios.
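
To make the runway idea concrete, here is a rough userspace sketch of the
maybe/try stop and maybe/try wake steps quoted above, with the thresholds
passed in as parameters. It is not the proposed macros and not kernel code:
struct ring, the ring_* helpers, RING_SIZE and the value given to
DESC_NEEDED are all made up for illustration, and the C11 seq_cst fence
only stands in for the mb() in the steps.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE   64u  /* hypothetical ring size, in descriptors */
#define DESC_NEEDED 8u   /* hypothetical worst-case descriptors per packet */

static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "ring size must be a power of two");

/* Hypothetical ring state; a real driver keeps this in its own structures. */
struct ring {
	atomic_uint prod;     /* advanced by xmit (producer) */
	atomic_uint cons;     /* advanced by NAPI completion (consumer) */
	atomic_bool stopped;  /* stands in for the queue's stopped state */
};

static void ring_init(struct ring *r)
{
	atomic_init(&r->prod, 0);
	atomic_init(&r->cons, 0);
	atomic_init(&r->stopped, false);
}

/* Free descriptors; unsigned subtraction keeps this correct across wrap. */
static unsigned int ring_unused(struct ring *r)
{
	unsigned int prod = atomic_load_explicit(&r->prod, memory_order_relaxed);
	unsigned int cons = atomic_load_explicit(&r->cons, memory_order_relaxed);

	return RING_SIZE - (prod - cons);
}

/* Producer side, following "maybe/try stop". Returns true if left stopped. */
static bool ring_maybe_stop(struct ring *r, unsigned int queued,
			    unsigned int stop_thrs)
{
	atomic_fetch_add_explicit(&r->prod, queued, memory_order_relaxed); /* 1 */
	if (ring_unused(r) >= stop_thrs)                                   /* 2, 3 */
		return false;

	atomic_store_explicit(&r->stopped, true, memory_order_relaxed);    /* 4 */
	atomic_thread_fence(memory_order_seq_cst);                         /* 5 */
	if (ring_unused(r) >= stop_thrs) {                                 /* 6, 7 */
		atomic_exchange(&r->stopped, false);                       /* 7.a */
		return false;
	}
	return true;
}

/* Consumer side, following "maybe/try wake". Returns true if it woke the queue. */
static bool ring_maybe_wake(struct ring *r, unsigned int completed,
			    unsigned int wake_thrs)
{
	atomic_fetch_add_explicit(&r->cons, completed, memory_order_relaxed); /* 1 */
	if (completed == 0 || ring_unused(r) < wake_thrs)                     /* 2, 3 */
		return false;

	atomic_thread_fence(memory_order_seq_cst);                            /* 4 */
	return atomic_exchange(&r->stopped, false);                           /* 5 */
}
```

With stop_thrs set to 2 * DESC_NEEDED, at least DESC_NEEDED descriptors are
still free when the queue stops, so one more worst-case packet (the skid)
still fits. A wake_thrs of 3 * DESC_NEEDED would keep the same margin on the
wake side; that value is just one reading of "2 packet runway", not
something settled in this thread.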
