netdev - Re: [PATCH net v2] net: sched: add barrier to ensure correct ordering for lockless qdisc

Message-ID: <20210619103009.GA1530@ip-172-31-30-86.us-east-2.compute.internal>
Date:   Sat, 19 Jun 2021 10:30:09 +0000
From:   Yunsheng Lin <yunshenglin0825@...il.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Yunsheng Lin <linyunsheng@...wei.com>, davem@...emloft.net,
        olteanv@...il.com, ast@...nel.org, daniel@...earbox.net,
        andriin@...com, edumazet@...gle.com, weiwan@...gle.com,
        cong.wang@...edance.com, ap420073@...il.com,
        netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
        linuxarm@...neuler.org, mkl@...gutronix.de,
        linux-can@...r.kernel.org, jhs@...atatu.com,
        xiyou.wangcong@...il.com, jiri@...nulli.us, andrii@...nel.org,
        kafai@...com, songliubraving@...com, yhs@...com,
        john.fastabend@...il.com, kpsingh@...nel.org, bpf@...r.kernel.org,
        jonas.bonn@...rounds.com, pabeni@...hat.com, mzhivich@...mai.com,
        johunt@...mai.com, albcamus@...il.com, kehuan.feng@...il.com,
        a.fatoum@...gutronix.de, atenart@...nel.org,
        alexander.duyck@...il.com, hdanton@...a.com, jgross@...e.com,
        JKosina@...e.com, mkubecek@...e.cz, bjorn@...nel.org,
        alobakin@...me
Subject: Re: [PATCH net v2] net: sched: add barrier to ensure correct
 ordering for lockless qdisc

On Fri, Jun 18, 2021 at 05:38:37PM -0700, Jakub Kicinski wrote:
> On Fri, 18 Jun 2021 17:30:47 -0700 Jakub Kicinski wrote:
> > On Thu, 17 Jun 2021 09:04:14 +0800 Yunsheng Lin wrote:
> > > The spin_trylock() was assumed to contain the implicit
> > > barrier needed to ensure the correct ordering between
> > > STATE_MISSED setting/clearing and STATE_MISSED checking
> > > in commit a90c57f2cedd ("net: sched: fix packet stuck
> > > problem for lockless qdisc").
> > > 
> > > But it turns out that spin_trylock() only has load-acquire
> > > semantic, for strongly-ordered system(like x86), the compiler
> > > barrier implicitly contained in spin_trylock() seems enough
> > > to ensure the correct ordering. But for weakly-orderly system
> > > (like arm64), the store-release semantic is needed to ensure
> > > the correct ordering as clear_bit() and test_bit() is store
> > > operation, see queued_spin_lock().
> > > 
> > > So add the explicit barrier to ensure the correct ordering
> > > for the above case.
> > > 
> > > Fixes: a90c57f2cedd ("net: sched: fix packet stuck problem for lockless qdisc")
> > > Signed-off-by: Yunsheng Lin <linyunsheng@...wei.com>  
> > 
> > Acked-by: Jakub Kicinski <kuba@...nel.org>
> 
> Actually.. do we really need the _before_atomic() barrier?
> I'd think we only need to make sure we re-check the lock 
> after we set the bit, ordering of the first check doesn't 
> matter.

When debugging pointed to the misordering between STATE_MISSED
setting/clearing and STATE_MISSED checking, only _after_atomic()
was added at first, and that did not fix the misordering problem.
Only when both _before_atomic() and _after_atomic() were added did
the misordering problem disappear.

I suppose _before_atomic() matters because the STATE_MISSED
setting and the lock rechecking are only done when the first check
of STATE_MISSED returns false. _before_atomic() is used to make sure
the first check returns the correct result; if it does not, then we
may have a misordering problem too.

     cpu0                        cpu1
                              clear MISSED
                             _after_atomic()
                                dequeue
    enqueue
 first trylock() #false
  MISSED check #*true* ?

As above, even though cpu1 has an _after_atomic() between clearing
STATE_MISSED and dequeuing, we might still need a barrier to
prevent cpu0 from doing a speculative MISSED check before cpu1
clears MISSED?

And the implicit load-acquire barrier contained in the first
trylock() does not seem to prevent the above case either.

And there is no load-acquire barrier in pfifo_fast_dequeue()
either, which possibly makes the above case more likely to happen.