[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <202410161343.D871C005@keescook>
Date: Wed, 16 Oct 2024 14:13:39 -0700
From: Kees Cook <kees@...nel.org>
To: Jan Hendrik Farr <kernel@...rr.cc>
Cc: Thorsten Blum <thorsten.blum@...lux.com>, kent.overstreet@...ux.dev,
regressions@...ts.linux.dev, linux-bcachefs@...r.kernel.org,
linux-hardening@...r.kernel.org, linux-kernel@...r.kernel.org,
ardb@...nel.org, morbo@...gle.com
Subject: Re: [REGRESSION][BISECTED] erroneous buffer overflow detected in
bch2_xattr_validate
On Mon, Oct 07, 2024 at 05:10:53PM +0200, Jan Hendrik Farr wrote:
> On 07 05:56:46, Jan Hendrik Farr wrote:
> > > I want to separate several easily confused issues. Instead of just
> > > saying __bdos, let's clearly refer to what calculation within bdos is
> > > being used. There are 3 choices currently:
> > > - alloc_size attribute
> > > - counted_by attribute
> > > - fallback to __bos (which is similar to sizeof(), except that FAMs are 0 sized)
> > >
> > > Additionally there are (for all intents and purposes) 2 size
> > > determinations to be made by __bos and __bdos, via argument 2:
> > > - containing object size (type 0) ("maximum size")
> > > - specific object size (type 1) ("minimum size")
> >
> > "maximum" vs "minimum" size would by type 0 vs type 2, but I think you
> > do mean type 0 and type 1 as those are the types currently used by
> > __struct_size and __member_size. Those are both "maximum" sizes.
> >
> > >
> > > For example, consider:
> > >
> > > struct posix_acl *acl = malloc(1024);
> > > acl->a_count = 1;
> > >
> > > what should these return:
> > >
> > > __bos(acl, 0)
> > > __bos(acl, 1)
> > > __bdos(acl, 0)
> > > __bdos(acl, 1)
> > > __bos(acl->a_entries, 0)
> > > __bos(acl->a_entries, 1)
> > > __bdos(acl->a_entries, 0)
> > > __bdos(acl->a_entries, 1)
> > >
> >
> > I gathered some data from clang and gcc on all for all these cases and
> > additionally varied whether the allocation size is a compile time known
> > constant, runtime known, or not known. I also varied whether
> > __counted_by was used.
> >
> > Source code: [1]
> >
> >
> > Abbreviations:
> >
> > FAM = flexible array member
> > -1 = SIZE_MAX
> > p->a_ent = p->a_entries
> > comp. = allocation size is compile time known
> > run. = allocation size is compile time known
> > none = allocation size is unknown
> > count = __counted_by attribute in use
> > correct = What I think the correct answers should be. In some places I
> > have two answers. In that case the second number is what the kernel
> > currently expects.
> >
> >
> > And here's the data:
> >
> > function |comp.|run.|none|count| gcc |clang |correct
> > ----------------|-----|----|----|-----|------|------|-----
> > bos(p, 0) | x | | | | 1024 | 1024 | 1024
> > bos(p, 0) | | x | | | -1 | -1 | -1
> > bos(p, 0) | | | x | | -1 | -1 | -1
> > bos(p, 0) | x | | | x | 1024 | 1024 | 1024
> > bos(p, 0) | | x | | x | -1 | -1 | -1
> > bos(p, 0) | | | x | x | -1 | -1 | -1
> > ----------------|-----|----|----|-----|------|------|-----
> > bos(p, 1) | x | | | | 1024 | 1024 | 1024
> > bos(p, 1) | | x | | | -1 | -1 | -1
> > bos(p, 1) | | | x | | -1 | -1 | -1
> > bos(p, 1) | x | | | x | 1024 | 1024 | 1024
> > bos(p, 1) | | x | | x | -1 | -1 | -1
> > bos(p, 1) | | | x | x | -1 | -1 | -1
> > ----------------|-----|----|----|-----|------|------|-----
> > bdos(p, 0) | x | | | | 1024 | 1024 | 1024
> > bdos(p, 0) | | x | | | 1024 | 1024 | 1024
> > bdos(p, 0) | | | x | | -1 | -1 | -1
> > bdos(p, 0) | x | | | x | 1024 | 36 | 43 / 40
> > bdos(p, 0) | | x | | x | 1024 | 36 | 43 / 40
> > bdos(p, 0) | | | x | x | -1 | 36 | 43 / 40
> > ----------------|-----|----|----|-----|------|------|-----
> > bdos(p, 1) | x | | | | 1024 | 1024 | 1024
> > bdos(p, 1) | | x | | | 1024 | 1024 | 1024
> > bdos(p, 1) | | | x | | -1 | -1 | -1
> > bdos(p, 1) | x | | | x | 1024 | 36 | 43 / 40
> > bdos(p, 1) | | x | | x | 1024 | 36 | 43 / 40
> > bdos(p, 1) | | | x | x | -1 | 36 | 43 / 40
> > ----------------|-----|----|----|-----|------|------|-----
> > bos(p->a_ent, 0)| x | | | | 996 | 996 | 996
> > bos(p->a_ent, 0)| | x | | | -1 | -1 | -1
> > bos(p->a_ent, 0)| | | x | | -1 | -1 | -1
> > bos(p->a_ent, 0)| x | | | x | 996 | 996 | 996
> > bos(p->a_ent, 0)| | x | | x | -1 | -1 | -1
> > bos(p->a_ent, 0)| | | x | x | -1 | -1 | -1
> > ----------------|-----|----|----|-----|------|------|-----
> > bos(p->a_ent, 1)| x | | | | 996 | 996 | 992
> > bos(p->a_ent, 1)| | x | | | -1 | -1 | -1
> > bos(p->a_ent, 1)| | | x | | -1 | -1 | -1
> > bos(p->a_ent, 1)| x | | | x | 996 | 996 | 992
> > bos(p->a_ent, 1)| | x | | x | -1 | -1 | -1
> > bos(p->a_ent, 1)| | | x | x | -1 | -1 | -1
> > ----------------|-----|----|----|-----|------|------|-----
> > bdos(p->a_ent,0)| x | | | | 996 | 996 | 996
> > bdos(p->a_ent,0)| | x | | | 996 | 996 | 996
> > bdos(p->a_ent,0)| | | x | | -1 | -1 | -1
> > bdos(p->a_ent,0)| x | | | x | 8 | 8 | 8
> > bdos(p->a_ent,0)| | x | | x | 8 | 8 | 8
> > bdos(p->a_ent,0)| | | x | x | 8 | 8 | 8
>
>
> These previous three should probably actually be like this:
> bdos(p->a_ent,0)| x | | | x | 8 | 8 | 15
> bdos(p->a_ent,0)| | x | | x | 8 | 8 | 15
> bdos(p->a_ent,0)| | | x | x | 8 | 8 | 15
>
> They should include the allowed padding after the FAM, as this is a type
> 0 bdos. Not really an issue here, as the kernel expects 8 here.
>
>
> > ----------------|-----|----|----|-----|------|------|-----
> > bdos(p->a_ent,1)| x | | | | 996 | 996 | 992
> > bdos(p->a_ent,1)| | x | | | 996 | 996 | 992
> > bdos(p->a_ent,1)| | | x | | -1 | -1 | -1
> > bdos(p->a_ent,1)| x | | | x | 8 | 8 | 8
> > bdos(p->a_ent,1)| | x | | x | 8 | 8 | 8
> > bdos(p->a_ent,1)| | | x | x | 8 | 8 | 8
> > ----------------|-----|----|----|-----|------|------|-----
Thanks for building all these tables -- I want to look at it all again
myself, but I'm pretty well convinced you've found a bunch of things we
need to sort out.
> > bos only uses the allocation size to give it's answers. It only works if
> > it is a compile time known constant. bos also does not utilize the
> > __counted_by attribute.
> >
> > bdos on the other hand allows the allocation size to be runtime known.
> > It also makes use of the __counted_by attribute if present, which always
> > takes precedence over the allocation size when the compiler supports it
> > for the particular case. So in those cases you can "lie" to the compiler
> > about the size of an object.
I am okay with counted_by winning out over alloc_size when it is present.
However, I would expect this:
void *v = malloc(1024);
struct flex *p = v;
p->counter = 4;
__bdos(p, 1) == struct_size(f, array, 4) // "p" has both size hints
__bdos(v, 1) == 1024 // "v" only has alloc_size
> >
> > clang supports the __counted_by attribute for both cases (p and
> > p->a_entries). gcc only supports it for p->a_entries cases.
I think gcc refuses to believe "p" is anything until it has been
dereferenced to a specific type. I would like it if they could fix this,
as if we have a pointer to a type and we're using __bdos to query its
size, we are explicitly treating it as the type it is pointing at (i.e.
examining the counter and evaluating the FAM size).
> >
> >
> >
> > Issue A (clang)
> > =======
> >
> > function |comp.|run.|none|count| gcc |clang |correct
> > ----------------|-----|----|----|-----|------|------|-----
> > bdos(p, 0) | x | | | x | 1024 | 36 | 43 / 40
> > bdos(p, 0) | | x | | x | 1024 | 36 | 43 / 40
> > bdos(p, 0) | | | x | x | -1 | 36 | 43 / 40
> > bdos(p, 1) | x | | | x | 1024 | 36 | 43 / 40
> > bdos(p, 1) | | x | | x | 1024 | 36 | 43 / 40
> > bdos(p, 1) | | | x | x | -1 | 36 | 43 / 40
> >
> > These cases also represent the "bdos off by 4" issue in clang. clang
> > will compute these results using:
> >
> > max(sizeof(struct posix_acl), offsetof(struct posix_acl, a_entries) +
> > count * sizeof(struct posix_acl_entries)) = 36
> >
> > The kernel on the other hand expects this behavior:
> >
> > sizeof(struct posix_acl) + count * sizeof(struct posix_acl_entries) = 40
> >
> >
> > I think the correct calculation would actually be this:
> >
> > offsetof(struct posix_acl, a_entries)
> > + (acl->a_count + 1) * sizeof(struct posix_acl_entry) - 1 = 43
> >
> > The C11 standard says that when the . or -> operator is used on a struct
> > with an FAM it behaves like the FAM was replaced with the largest array
> > (with the same element type) that would not make the object any larger
> > (see page 113 and 114 of [2]).
> > So there are actually multiple sizes of the object that are consistent
> > with a count of 1.
> >
> > malloc-max = maximum size of the object
> > malloc-min = minimum size of the object
> > FAME = flexible array member element
> > (FAME) = hypothetical 2nd FAME
> >
> > <-----------------malloc-max-------------->
> > <-----------------malloc-min------->
> > <------sizeof(posix_acl)------->
> > <-FAME-><(FAME)>
> >
> > The clang documentation of type 0 (vs type 2) bdos says this:
> >
> > If ``type & 2 == 0``, the least ``n`` is returned such that accesses to
> > ``(const char*)ptr + n`` and beyond are known to be out of bounds.
> >
> > We only _know_ that that access to the last byte of a 2nd hypothetical FAME
> > would be out of bounds. All the bytes before that are padding that is
> > allowed by the standard.
> >
> >
> > However, also this calculation doesn't get the kernel out
> > of trouble here. While this would fix the issue for this particular
> > struct it does not solve it for all structs:
> >
> > What if the elements of the FAM were chars instead of
> > struct posix_acl_entries here? In that case the kernel is back to
> > overestimating the size of the struct / underreporting the count to the
> > compiler. So while I think this answer is more correct it doesn't
> > actually solve the issue.
> >
> > Example:
> > Let's say the kernel allocates one of these posix_acl_char structs for a
> > single char in the array:
> >
> > malloc(sizeof(posix_acl_char) + 1 * sizeof(char)) = 33
> >
> > The C standard actually says that this object will behave like this when
> > the FAM is accessed:
> >
> > struct posix_acl {
> > refcount_t a_refcount;
> > struct rcu_head a_rcu;
> > unsigned int a_count;
> > char a_entries[5];
> > };
> >
> > a_count should be set to 5, not 1!
This is making my head spin. In a practical sense, a struct instance
has a fixed size (which I would say is the calculation using max(),
above). Whether the . or -> operators are used shouldn't matter (to
the program's view of the object size).
This "1 becomes 5" should just not be allowed. I can accept object padding
contributing to object size, but we should not redefine the array size
out from under what it actually is. I don't want array accesses into
padding either. :P
I think 36 is correct here, not 40 nor 43.
> > So we would really need an option to tell the compiler to use the same
> > size calculation as the kernel expects here, or maybe be able to specify
> > an offset in the __counted_by attribute. Alternatively clang could use
> > an option to disable the use of __counted_by for cases where the whole
> > struct is passed. This would make it behave like gcc.
> >
> >
> >
> > Issue B (clang + gcc)
> > =======
> >
> > A less serious issue happens with these cases:
> >
> > function |comp.|run.|none|count| gcc |clang |correct
> > ----------------|-----|----|----|-----|------|------|-----
> > bos(p->a_ent, 1)| x | | | | 996 | 996 | 992
> > bos(p->a_ent, 1)| x | | | x | 996 | 996 | 992
> > bdos(p->a_ent,1)| x | | | | 996 | 996 | 992
> > bdos(p->a_ent,1)| | x | | | 996 | 996 | 992
> >
> > In this case the size returned by bos/bdos is too large, so this won't
> > lead to false positives. Both clang and gcc simply compute the difference
> > between the pointer from the start of the FAM to the end of the whole
> > struct. I believe this is wrong. According to the C standard the object
> > should behave like the FAM was replaced with the largest array that does
> > not make the object any larger. The size of that array is 124 elements.
> > So the posix_acl becomes:
> >
> > struct posix_acl {
> > refcount_t a_refcount;
> > struct rcu_head a_rcu;
> > unsigned int a_count;
> > struct posix_acl_entry a_entries[124];
> > };
> >
> > Since this is a type 1 bos/bdos it should return the size of just the
> > array, which is 124 * 8 = 992, and not 124.5 * 8 = 996.
For type 1, I agree: this needs to be the precise size of the array.
As of now, are GCC and Clang both using:
max(sizeof(*p),
offsetof(typeof(*p), FAM) + count * sizeof(*p->FAM))
?
Linux can adjust struct_size() to match, and I'll check that nothing
shrunk inappropriately...
--
Kees Cook
Powered by blists - more mailing lists