linux-kernel - Re: [PATCH 1/3] ipc: convert ipc_namespace.count from atomic_t to refcount

Message-ID: <CAGXu5jJqfB2y0z7prVRdp9O4A=XUFgb_JkcNAsXgY+6-NOL79g@mail.gmail.com>
Date:   Thu, 20 Jul 2017 08:12:42 -0700
From:   Kees Cook <keescook@...omium.org>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Davidlohr Bueso <dave@...olabs.net>,
        Elena Reshetova <elena.reshetova@...el.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Greg KH <gregkh@...uxfoundation.org>,
        Ingo Molnar <mingo@...hat.com>,
        Alexey Dobriyan <adobriyan@...il.com>,
        "Serge E. Hallyn" <serge@...lyn.com>, arozansk@...hat.com,
        Hans Liljestrand <ishkamiel@...il.com>,
        David Windsor <dwindsor@...il.com>
Subject: Re: [PATCH 1/3] ipc: convert ipc_namespace.count from atomic_t to refcount_t

On Thu, Jul 20, 2017 at 5:34 AM, Eric W. Biederman
<ebiederm@...ssion.com> wrote:
> Ingo Molnar <mingo@...nel.org> writes:
>
>> * Andrew Morton <akpm@...ux-foundation.org> wrote:
>>
>>> On Wed, 19 Jul 2017 15:54:27 -0700 Davidlohr Bueso <dave@...olabs.net> wrote:
>>>
>>> > On Wed, 19 Jul 2017, Andrew Morton wrote:
>>> >
>>> > >I do rather dislike these conversions from the point of view of
>>> > >performance overhead and general code bloat.  But I seem to have lost
>>> > >that struggle and I don't think any of these are fastpath(?).
>>> >
>>> > Well, since we now have fd25d19 (locking/refcount: Create unchecked atomic_t
>>> > implementation), performance is supposed to be ok.
>>>
>>> Sure, things are OK for people who disable the feature.
>>
>> So with the WIP fast-refcount series from Kees:
>>
>>       [PATCH v6 0/2] x86: Implement fast refcount overflow protection
>>
>> I believe the robustness difference between optimized-refcount_t and
>> full-refcount_t will be marginal.
>>
>> I.e. we'll be able to have both higher API safety _and_ performance.
>>
>>> But for people who want to enable the feature we really should minimize the cost
>>> by avoiding blindly converting sites which simply don't need it: simple, safe,
>>> old, well-tested code.  Why go and slow down such code?  Need to apply some
>>> common sense here...
>>
>> It's old, well-tested code _for existing, sane parameters_, until someone finds a
>> decade old bug in one of these with an insane parameters no-one stumbled upon so
>> far, and builds an exploit on top of it.
>>
>> Only by touching all these places do we have a chance to improve things measurably
>> in terms of reducing the probability of bugs.
>
> The more I hear people pushing the upsides of refcount_t without
> considering the downsides the more I dislike it.
>
> - refcount_t is really the wrong thing because it uses saturation
>   semantics.  So by definition it includes a bug.

This is a feature, not a bug. :) If the kernel has a refcount overflow
flaw (which, in the pantheon of exploitable kernel bugs, is
_common_[1], as I've referenced earlier), then we're downgrading an
exploitable use-after-free to a harmless memory allocation leak. Even
if you don't include malicious attackers in the consideration, this
changes a memory corruption of unknown results into a memory leak.
That's actually an _improvement_ to availability and integrity.
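
To make the saturation argument concrete, here is a minimal single-threaded
userspace sketch. It is a toy model under stated assumptions, not the kernel's
implementation: the real types are atomic, and every toy_* name below is
hypothetical and exists only for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Toy, non-atomic stand-ins for atomic_t and refcount_t (hypothetical names). */
struct toy_atomic   { unsigned int v; };
struct toy_refcount { unsigned int v; };

/* Plain counter: incrementing past UINT_MAX silently wraps to 0. */
static void toy_atomic_inc(struct toy_atomic *a)
{
    a->v++;
}

/* Returns true when the count reaches zero, i.e. the caller would free. */
static bool toy_atomic_dec_and_test(struct toy_atomic *a)
{
    return --a->v == 0;
}

/* Saturating counter: once it reaches UINT_MAX it stays pinned there. */
static void toy_refcount_inc(struct toy_refcount *r)
{
    if (r->v == UINT_MAX)
        return;             /* saturated: refuse to wrap */
    r->v++;
}

/* A saturated counter never reports zero, so the object is leaked, not freed. */
static bool toy_refcount_dec_and_test(struct toy_refcount *r)
{
    if (r->v == UINT_MAX)
        return false;       /* saturated: leak instead of use-after-free */
    return --r->v == 0;
}
```

In this model, a wrapped toy_atomic can be driven back to zero by a single
get/put pair while other references are still live, which is the
use-after-free case. The saturated toy_refcount can never reach zero again, so
the worst outcome is that the object is never freed.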

> - refcount_t will only really prevent something if there is an extra
>   increment.  That is not the kind of bug people are likely to make.

Like I've said, this is common. This is usually a mistake in error
handling which forgets (or misplaces) a "put".
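
The kind of error-handling mistake described above might look like the
following sketch. It is an illustrative toy, not code from any real subsystem:
toy_obj, toy_get, toy_put and both use_obj_* functions are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical refcounted object, single-threaded for illustration. */
struct toy_obj { unsigned int refs; };

static void toy_get(struct toy_obj *o) { o->refs++; }
static void toy_put(struct toy_obj *o) { o->refs--; }

/* Buggy: the early return on the error path skips the matching put. */
static int use_obj_buggy(struct toy_obj *o, bool fail)
{
    toy_get(o);
    if (fail)
        return -1;          /* BUG: one reference leaked per failure */
    toy_put(o);
    return 0;
}

/* Fixed: every exit path goes through the single put at "out". */
static int use_obj_fixed(struct toy_obj *o, bool fail)
{
    int ret = 0;

    toy_get(o);
    if (fail) {
        ret = -1;
        goto out;
    }
    /* ... normal work with o would go here ... */
out:
    toy_put(o);
    return ret;
}
```

If the failing path can be triggered repeatedly, each call to the buggy
variant adds one extra increment. With a wrapping counter, enough such calls
eventually overflow the count.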

> - refcount_t won't help if you have an extra decrement.  The bad
>   use-after-free will still happen.

Yes, and not having a protected refcount_t will also allow a
use-after-free. There is no change here, so it's not a "downside" of
refcount_t. In fact, having gained the implicit annotation of
refcount_t being a refcounter (rather than a simple atomic_t) means
that auditing users is easier and more focused. This could reduce the
chance people make mistakes in the first place, especially since the
API is more constrained than atomic_t.

> - refcount_t won't help if there is a memory stomp.  As with an extra
>   decrement the bad use-after-free will still happen.

A stomp of the refcount_t value itself? Sure, and this remains as
vulnerable as atomic_t. This isn't a downside to refcount_t. And
again, since there _is_ checking of the value in places, it's possible
an actionable warning will be produced (though, yes, after the
use-after-free has been exposed), which is a benefit over simple
atomic_t. I mention this in the commit log ("better to maybe produce
the warning than be universally silent").
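
As a rough illustration of "maybe produce the warning", a checked decrement
can notice that it is being asked to drop a reference that no longer exists. A
minimal single-threaded sketch follows, assuming a hypothetical toy_checked_ref
type and a simple warning counter; the real kernel code is atomic and reports
through its own warning machinery.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical checked counter, non-atomic for illustration only. */
struct toy_checked_ref { unsigned int v; };

/* Number of underflow warnings emitted so far (illustrative bookkeeping). */
static unsigned int toy_warnings;

/*
 * Returns true when the count reaches zero. A decrement from zero is
 * reported and refused rather than silently wrapping to UINT_MAX.
 */
static bool toy_checked_dec_and_test(struct toy_checked_ref *r)
{
    if (r->v == 0) {
        toy_warnings++;
        fprintf(stderr, "toy_checked_ref: underflow, possible use-after-free\n");
        return false;
    }
    return --r->v == 0;
}
```

The warning arrives only after the extra decrement has already happened, so
the use-after-free itself is not prevented. What changes is that the failure
is no longer silent.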

> So all I see is a huge amount of code churn to implement a buggy (by
> definition) refcounting API, that risks adding new bugs and only truly
> helps with bugs that are unlikely in the first place.

Given that the conversions alone have been uncovering refcount bugs
and that the implementation isn't "buggy" (it provides a specific set
of protections), I strongly disagree with your assessment.

> I really don't think this is an obvious slam dunk.

It entirely blocks a commonly exploitable flaw in the kernel. This
isn't a probabilistic mitigation, either. While I'm not sure I'd ever
describe a security protection as a slam dunk, I think this is up
there. :)

-Kees

[1] When I say "common", I'm speaking from the perspective of security
flaw frequency. The kernel sees about 1-2 high severity security flaws
a year (with an average lifetime of 5 years), and the
refcount-overflow use-after-free class of flaw is normally reliable
for attackers (and one I'd classify as high severity). With 2016 seeing
two known separate refcount-overflow use-after-free flaws, this could
be better described as an epidemic, but I'll try to be less
inflammatory and just say "common".

-- 
Kees Cook
Pixel Security