linux-kernel - Re: [PATCH] secretmem: Prevent secretmem

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202110251438.1762406A5@keescook>
Date:   Mon, 25 Oct 2021 15:30:03 -0700
From:   Kees Cook <keescook@...omium.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Matthew Wilcox <willy@...radead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux-MM <linux-mm@...ck.org>,
        Jordy Zomer <jordy@...ing.systems>,
        James Bottomley <James.Bottomley@...senpartnership.com>,
        Mike Rapoport <rppt@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] secretmem: Prevent secretmem_users from wrapping to zero

On Mon, Oct 25, 2021 at 02:17:58PM -0700, Linus Torvalds wrote:
> On Mon, Oct 25, 2021 at 2:04 PM Kees Cook <keescook@...omium.org> wrote:
> >
> > Is secretmem different? We're trying to count how many of these we have,
> > this is a common pattern in, for example, the network code which does
> > this kind of thing a lot.
> 
> Yes, secretmem is different.

Prefix: I'm fine with this being whatever; it's a corner case, so this
reply is mainly about nailing down the rationales for future decisions.

> A refcount being zero means that the data it referenced no longer exists.

I don't disagree with this definition, but I would like to understand how
some other use-cases fit into this. The case of basic allocated object memory
lifetime reference counting we all agree on. What about the case of what
I see that is more like a "shared resource usage count" where the shared
resource doesn't necessarily disappear when we reach "no users"?

i.e. there is some resource, and it starts its life with no one using it
(count = 1). Then we have users declaring their use (count = 2) and later
release (count = 1) of the resource. It's not really ever unallocated when
users == 0 (count == 1), but we might use the usage counter for other
things and don't want to let it go negative or ever increment from zero
(in case zero is used for marking the resource unavailable forever). For
example, protocols knowing if there are any sockets left open, crypto
API usage counts, kernel module usage counts, etc.

I don't see as clear a distinction between secretmem and the above
examples. The question being answered is "how many users do I have?" and
we want to make sure we don't end up with overflow or underflow given the
(unfortunately) common case of reference counting kernel bugs. The fact
that secretmem doesn't have its own allocation to free when it hits 0
seems like just an implementation detail of this particular resource
usage counter.

But, ignoring secretmem for a moment, what about a specific example
from above: the module loader's refcnt atomic_t, which is actually
maintaining an allocation (the module), and uses usage counts of 0 to mean
"I am removing this module", and 1 is "I have no users", 2 is "1 user",
etc. It seems like this should use refcount_t to me?

> Stop arguing for garbage. It was wrong, just admit it. The semantics
> for "refcount" is something else than what that code had. As a result,
> it got reverted. I've applied Willy's patch that actually makes sense.
> 
> Arguing for garbage in the name of "security" is still garbage. In
> fact, it only causes confusion, and as such is likely to result in
> problems - including security problems - later.

Sure, and reasonable people can disagree about what is garbage. :) I see
using a refcount_t here as a not unreasonable way to protect against
potential future security problems if the scope of secretmem ever grows
and more than just hibernation starts to care about this usage counter.

> Because confusion about semantics is bad. And that was what that patch was.
> 
> And I want to state - again - how dangerous this "refcounts are always
> prefereable to atomics" mental model is. Refcounts are _not_
> fundamentally preferable to atomics. They are slower, bigger, and have
> completely different semantics.

I will push back on this; I don't think that's a fair assessment at all.
For storage size, refcount_t is identical to atomic_t (refcount_t _is_
an atomic_t). But perhaps you meant code size? Yes, the refcount_t
helpers are technically larger. But like speed, where refcount_t is also
technically slower, neither size nor speed were so much changed that
we kept around the routines that made it exactly the same speed and
grew the instruction count by 1 (the original x86-specific refcount_t
implementation was just as fast as atomic_t). I don't see speed nor size
alone to be a good reason to say "don't use refcount_t".

But yes, I agree about the different semantics: there are very specific
memory ordering assumptions that tend to be more strict than atomic_t
(which IMO actually makes it more suitable than atomic_t for shared
usage counts).

> So if something isn't a refcount, it damn well shouldn't use "refcount_t".

Again, I don't disagree, but since it looked like a refcount to me, I'd
like to understand what why we don't see this case the same way. Since I
agree that secretmem is currently pretty iffy (nothing actually allocated
to track, only system state), I'll ask a slightly different question:
should the module loader use refcount_t? If not, why?

-- 
Kees Cook