linux-kernel - Re: [PATCH] blk-mq: Start to fix memory ordering...

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20170906084841.upozocusxerkpqdp@hirez.programming.kicks-ass.net>
Date:   Wed, 6 Sep 2017 10:48:41 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Andrea Parri <parri.andrea@...il.com>
Cc:     Jens Axboe <axboe@...com>, linux-kernel@...r.kernel.org,
        stern@...land.harvard.edu, will.deacon@....com,
        tom.leiming@...il.com, boqun.feng@...il.com,
        paulmck@...ux.vnet.ibm.com, hch@....de, bart.vanassche@....com
Subject: Re: [PATCH] blk-mq: Start to fix memory ordering...

On Wed, Sep 06, 2017 at 09:13:04AM +0200, Andrea Parri wrote:
> > +	smp_mb__before_atomic();
> 
> I am wondering whether we should be using smp_wmb() instead: this would
> provide the above guarantee and save a full barrier on powerpc/arm64.

Right, did that.

> > +	set_bit(REQ_ATOM_STARTED, &rq->atomic_flags);
> > +	if (test_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags)) {
> > +		/*
> > +		 * Coherence order guarantees these consequtive stores to a
> > +		 * singe variable propagate in the specified order. Thus the
> > +		 * clear_bit() is ordered _after_ the set bit. See
> > +		 * blk_mq_check_expired().
> > +		 */
> >  		clear_bit(REQ_ATOM_COMPLETE, &rq->atomic_flags);
> 
> It could be useful to stress that set_bit(), clear_bit()  must "act" on
> the same subword of the unsigned long (whatever "subword" means at this
> level...) to rely on the coherence order (c.f., alpha's implementation).

As I wrote to your initial reply (which I now saw was private) I went
through the architectures again and found h8300 to use byte ops to
implement all the bitops.

So subword here means byte :/

The last time we looked at this was for PG_waiter and back then I think
we settled on u32 (with Alpha for example using 32bit ll/sc ops). Linus
moved PG_waiters to the same byte though, so that is all fine.

  b91e1302ad9b ("mm: optimize PageWaiters bit use for unlock_page()")

> > +	if (time_after_eq(jiffies, deadline)) {
> > +		if (!blk_mark_rq_complete(rq)) {
> > +			/*
> > +			 * Relies on the implied MB from test_and_clear() to
> > +			 * order the COMPLETE load against the STARTED load.
> > +			 * Orders against the coherence order in
> > +			 * blk_mq_start_request().
> 
> I understand "from test_and_set_bit()" (in blk_mark_rq_complete()) and
> that the interested cycle is:
> 
>    /* in blk_mq_start_request() */
>    [STORE STARTED bit = 1 into atomic_flags]
>       -->co [STORE COMPLETE bit = 0 into atomic_flags]
>          /* in blk_mq_check_expired() */
>          -->rf [LOAD COMPLETE bit = 0 from atomic_flags]
>             -->po-loc [LOAD STARTED bit = 0 from atomic_flags]
>                /* in blk_mq_start_request() again */
>                -->fr [STORE STARTED bit = 1 into atomic_flags]
> 
>    (N.B. Assume all accesses happen to/from the same subword.)
> 
> This cycle being forbidden by the "coherence check", I'd say we do not
> need to rely on the MB mentioned by the comment; what am I missing?

Nothing, I forgot about the read-after-read thing but did spot the MB.
Either one suffices to guarantee the order we need. It just needs to be
documented as being relied upon.