Message-ID: <CACT4Y+a-V036akpmmOeB0qhL7W8F=vVEue9+-p_qiDgzqJ4UiQ@mail.gmail.com>
Date:   Mon, 21 Jan 2019 18:00:35 +0100
From:   Dmitry Vyukov <dvyukov@...gle.com>
To:     Alan Stern <stern@...land.harvard.edu>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Elena Reshetova <elena.reshetova@...el.com>,
        Andrea Parri <andrea.parri@...rulasolutions.com>,
        Kees Cook <keescook@...omium.org>,
        "Paul E. McKenney" <paulmck@...ux.ibm.com>,
        Will Deacon <will.deacon@....com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Andrey Ryabinin <aryabinin@...tuozzo.com>,
        Anders Roxell <anders.roxell@...aro.org>,
        Mark Rutland <mark.rutland@....com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] kcov: convert kcov.refcount to refcount_t

On Mon, Jan 21, 2019 at 5:05 PM Alan Stern <stern@...land.harvard.edu> wrote:
>
> On Mon, 21 Jan 2019, Peter Zijlstra wrote:
>
> > On Mon, Jan 21, 2019 at 10:52:37AM +0100, Dmitry Vyukov wrote:
> > > On Wed, Jan 16, 2019 at 1:51 PM Dmitry Vyukov <dvyukov@...gle.com> wrote:
> >
> > > > KCOV uses refcounts in a very simple canonical way, so no hidden
> > > > ordering is implied.
> > > >
> > > > Am I missing something, or does refcount_dec_and_test in fact not
> > > > provide ACQUIRE ordering?
> > > >
> > > > +case 5) - decrement-based RMW ops that return a value
> > > > +-----------------------------------------------------
> > > > +
> > > > +Function changes:
> > > > +                atomic_dec_and_test() --> refcount_dec_and_test()
> > > > +                atomic_sub_and_test() --> refcount_sub_and_test()
> > > > +                no atomic counterpart --> refcount_dec_if_one()
> > > > +                atomic_add_unless(&var, -1, 1) --> refcount_dec_not_one(&var)
> > > > +
> > > > +Memory ordering guarantees changes:
> > > > +                fully ordered --> RELEASE ordering + control dependency
> > > >
> > > > I think that's against the expected refcount guarantees. When I
> > > > privatize an atomic_dec_and_test I would expect that not only stores,
> > > > but also loads act on a quiescent object. But loads can hoist outside
> > > > of the control dependency.
> > > >
> > > > Consider the following example: is it the case that the BUG_ON can still fire?
> > > >
> > > > struct X {
> > > >   refcount_t rc; // == 2
> > > >   int done1, done2; // == 0
> > > > };
> > > >
> > > > // thread 1:
> > > > x->done1 = 1;
> > > > if (refcount_dec_and_test(&x->rc))
> > > >   BUG_ON(!x->done2);
> > > >
> > > > // thread 2:
> > > > x->done2 = 1;
> > > > if (refcount_dec_and_test(&x->rc))
> > > >   BUG_ON(!x->done1);
> >
> > I'm the one responsible for that refcount_t ordering.
> >
> > The rationale for REL+CTRL is that for the final put we want to ensure
> > all prior loads/stores are complete, because any later access could be a
> > UAF; consider:
> >
> >
> > P0()
> > {
> >       x->foo=0;
> >       if (refcount_dec_and_test(&x->rc))
> >               free(x);
> > }
> >
> > P1()
> > {
> >       x->bar=1;
> >       if (refcount_dec_and_test(&x->rc))
> >               free(x);
> > }
> >
> >
> > without release, it would be possible for either (foo,bar) store to
> > happen after the free().
> >
> > Additionally we also need the CTRL to ensure that the actual free()
> > happens _after_ the dec_and_test; freeing early would be bad.
> >
> > But after these two requirements, the basic refcounting works.
> >
> > > The refcount_dec_and_test guarantees look too weak to me; see the
> > > example above. Shouldn't refcount_dec_and_test returning true give the
> > > object in a fully quiescent state? Why only a control dependency? Loads
> > > can hoist across a control dependency, no?
> >
> > Yes, loads can escape like you say.
> >
> > Any additional ordering, like the one you have above, is not strictly
> > required for the proper functioning of the refcount. Rather, if you rely
> > on additional ordering you will need to provide it explicitly:
> >
> >
> >       if (refcount_dec_and_test(&x->rc)) {
> >               /*
> >                * Comment that explains what we order against....
> >                */
> >               smp_mb__after_atomic();
> >               BUG_ON(!x->done*);
> >               free(x);
> >       }
> >
> >
> > Also, these patches explicitly mention that refcount_t is weaker,
> > specifically to make people aware of this difference.
> >
> > A full smp_mb() (or two) would also be much more expensive on a number
> > of platforms, and in the vast majority of cases it is not required.
>
> How about adding smp_rmb() into refcount_dec_and_test()?  That would
> give acq+rel semantics, which seems to be what people will expect.  And
> it wouldn't be nearly as expensive as a full smp_mb().

I would expect a refcount API to provide acquire ordering on the last
reference release. Also, even if the code is correct when first
submitted and the developer has done the proper analysis, the missing
acquire will be quite fragile under subsequent changes: adding a load
to a function that is meant to "own" the object (e.g. a destructor)
can subtly break things. This does not affect x86, so it most likely
won't be noticed right away, and even when noticed it can cost
man-months to debug episodic misbehaviors.
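
A minimal sketch of that failure mode, assuming the current
RELEASE + control-dependency semantics (struct foo and foo_put are
hypothetical, not code from the patch):

#include <linux/bug.h>
#include <linux/refcount.h>
#include <linux/slab.h>

struct foo {
        refcount_t rc;  /* initialized to 2 */
        int data;       /* initialized to 0 */
};

/* Another thread runs: x->data = 1; foo_put(x); */

void foo_put(struct foo *x)
{
        if (refcount_dec_and_test(&x->rc)) {
                /*
                 * This thread now "owns" the object, but with only a
                 * control dependency the load of x->data may be
                 * satisfied before the decrement, observing a stale 0
                 * instead of the 1 written by the other thread.
                 */
                BUG_ON(!x->data);
                kfree(x);
        }
}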
From the performance perspective, the last release happens less often
than non-last releases, which already contain a barrier, and the
release of the last reference usually involves more work (multiple
kfree's, scheduler calls, etc.), so it should not be a deal breaker.
From a safe API design perspective we could do it the other way
around: provide a safe default, and an opt-in version without the
barrier and with a longer name for the cases where people do know what
they are doing and every bit of performance really matters.
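
For illustration only, the opt-in shape could look something like this
(the _no_acquire name is hypothetical, not an existing kernel API; the
barrier is the kernel's existing smp_acquire__after_ctrl_dep(), which
maps to smp_rmb()):

/* Opt-in, weaker variant: RELEASE + control dependency only. */
bool refcount_dec_and_test_no_acquire(refcount_t *r);

/* Safe default: upgrade the final decrement to ACQUIRE. */
static inline bool refcount_dec_and_test(refcount_t *r)
{
        if (refcount_dec_and_test_no_acquire(r)) {
                /* Keep later loads after the decrement. */
                smp_acquire__after_ctrl_dep();
                return true;
        }
        return false;
}

This keeps the barrier confined to the rare final put; the common
non-final decrement path stays exactly as cheap as it is today.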
