linux-hardening - Re: [REGRESSION][BISECTED] erroneous buffer overflow detected in bch2_xattr

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGG=3QUatjhoDHdkDtZ+ftz7JvMhvaQ9XkFyyZSt_95V_nSN8A@mail.gmail.com>
Date: Wed, 16 Oct 2024 16:41:11 -0700
From: Bill Wendling <morbo@...gle.com>
To: Jan Hendrik Farr <kernel@...rr.cc>
Cc: Kees Cook <kees@...nel.org>, Thorsten Blum <thorsten.blum@...lux.com>, kent.overstreet@...ux.dev, 
	regressions@...ts.linux.dev, linux-bcachefs@...r.kernel.org, 
	linux-hardening@...r.kernel.org, linux-kernel@...r.kernel.org, 
	ardb@...nel.org
Subject: Re: [REGRESSION][BISECTED] erroneous buffer overflow detected in bch2_xattr_validate

On Sun, Oct 6, 2024 at 8:56 PM Jan Hendrik Farr <kernel@...rr.cc> wrote:
> > I want to separate several easily confused issues. Instead of just
> > saying __bdos, let's clearly refer to what calculation within bdos is
> > being used. There are 3 choices currently:
> > - alloc_size attribute
> > - counted_by attribute
> > - fallback to __bos (which is similar to sizeof(), except that FAMs are 0 sized)
> >
> > Additionally there are (for all intents and purposes) 2 size
> > determinations to be made by __bos and __bdos, via argument 2:
> > - containing object size (type 0) ("maximum size")
> > - specific object size (type 1) ("minimum size")
>
> "maximum" vs "minimum" size would by type 0 vs type 2, but I think you
> do mean type 0 and type 1 as those are the types currently used by
> __struct_size and __member_size. Those are both "maximum" sizes.
>
> >
> > For example, consider:
> >
> > struct posix_acl *acl = malloc(1024);
> > acl->a_count = 1;
> >
> > what should these return:
> >
> >       __bos(acl, 0)
> >       __bos(acl, 1)
> >       __bdos(acl, 0)
> >       __bdos(acl, 1)
> >       __bos(acl->a_entries, 0)
> >       __bos(acl->a_entries, 1)
> >       __bdos(acl->a_entries, 0)
> >       __bdos(acl->a_entries, 1)
> >
>
Thank you for this detailed write-up! I'm sorry for my late response.

> I gathered some data from clang and gcc on all for all these cases and
> additionally varied whether the allocation size is a compile time known
> constant, runtime known, or not known. I also varied whether
> __counted_by was used.
>
> Source code: [1]
>
>
> Abbreviations:
>
> FAM      = flexible array member
> -1       = SIZE_MAX
> p->a_ent = p->a_entries
> comp.    = allocation size is compile time known
> run.     = allocation size is compile time known
> none     = allocation size is unknown
> count    = __counted_by attribute in use
> correct  = What I think the correct answers should be. In some places I
> have two answers. In that case the second number is what the kernel
> currently expects.
>
>
> And here's the data:
>
> function        |comp.|run.|none|count| gcc  |clang |correct
> ----------------|-----|----|----|-----|------|------|-----
> bos(p, 0)       |  x  |    |    |     | 1024 | 1024 | 1024
> bos(p, 0)       |     | x  |    |     |  -1  |  -1  | -1
> bos(p, 0)       |     |    | x  |     |  -1  |  -1  | -1
> bos(p, 0)       |  x  |    |    |  x  | 1024 | 1024 | 1024
> bos(p, 0)       |     | x  |    |  x  |  -1  |  -1  | -1
> bos(p, 0)       |     |    | x  |  x  |  -1  |  -1  | -1
> ----------------|-----|----|----|-----|------|------|-----
> bos(p, 1)       |  x  |    |    |     | 1024 | 1024 | 1024
> bos(p, 1)       |     | x  |    |     |  -1  |  -1  | -1
> bos(p, 1)       |     |    | x  |     |  -1  |  -1  | -1
> bos(p, 1)       |  x  |    |    |  x  | 1024 | 1024 | 1024
> bos(p, 1)       |     | x  |    |  x  |  -1  |  -1  | -1
> bos(p, 1)       |     |    | x  |  x  |  -1  |  -1  | -1
> ----------------|-----|----|----|-----|------|------|-----
> bdos(p, 0)      |  x  |    |    |     | 1024 | 1024 | 1024
> bdos(p, 0)      |     | x  |    |     | 1024 | 1024 | 1024
> bdos(p, 0)      |     |    | x  |     |  -1  |  -1  | -1
> bdos(p, 0)      |  x  |    |    |  x  | 1024 |  36  | 43 / 40
> bdos(p, 0)      |     | x  |    |  x  | 1024 |  36  | 43 / 40
> bdos(p, 0)      |     |    | x  |  x  |  -1  |  36  | 43 / 40
> ----------------|-----|----|----|-----|------|------|-----
> bdos(p, 1)      |  x  |    |    |     | 1024 | 1024 | 1024
> bdos(p, 1)      |     | x  |    |     | 1024 | 1024 | 1024
> bdos(p, 1)      |     |    | x  |     |  -1  |  -1  | -1
> bdos(p, 1)      |  x  |    |    |  x  | 1024 |  36  | 43 / 40
> bdos(p, 1)      |     | x  |    |  x  | 1024 |  36  | 43 / 40
> bdos(p, 1)      |     |    | x  |  x  |  -1  |  36  | 43 / 40
> ----------------|-----|----|----|-----|------|------|-----
> bos(p->a_ent, 0)|  x  |    |    |     |  996 | 996  | 996
> bos(p->a_ent, 0)|     | x  |    |     |  -1  |  -1  | -1
> bos(p->a_ent, 0)|     |    | x  |     |  -1  |  -1  | -1
> bos(p->a_ent, 0)|  x  |    |    |  x  |  996 | 996  | 996
> bos(p->a_ent, 0)|     | x  |    |  x  |  -1  |  -1  | -1
> bos(p->a_ent, 0)|     |    | x  |  x  |  -1  |  -1  | -1
> ----------------|-----|----|----|-----|------|------|-----
> bos(p->a_ent, 1)|  x  |    |    |     |  996 | 996  | 992
> bos(p->a_ent, 1)|     | x  |    |     |  -1  |  -1  | -1
> bos(p->a_ent, 1)|     |    | x  |     |  -1  |  -1  | -1
> bos(p->a_ent, 1)|  x  |    |    |  x  |  996 | 996  | 992
> bos(p->a_ent, 1)|     | x  |    |  x  |  -1  |  -1  | -1
> bos(p->a_ent, 1)|     |    | x  |  x  |  -1  |  -1  | -1
> ----------------|-----|----|----|-----|------|------|-----
> bdos(p->a_ent,0)|  x  |    |    |     |  996 | 996  | 996
> bdos(p->a_ent,0)|     | x  |    |     |  996 | 996  | 996
> bdos(p->a_ent,0)|     |    | x  |     |  -1  |  -1  | -1
> bdos(p->a_ent,0)|  x  |    |    |  x  |   8  |  8   |  8
> bdos(p->a_ent,0)|     | x  |    |  x  |   8  |  8   |  8
> bdos(p->a_ent,0)|     |    | x  |  x  |   8  |  8   |  8
> ----------------|-----|----|----|-----|------|------|-----
> bdos(p->a_ent,1)|  x  |    |    |     |  996 | 996  | 992
> bdos(p->a_ent,1)|     | x  |    |     |  996 | 996  | 992
> bdos(p->a_ent,1)|     |    | x  |     |  -1  |  -1  | -1
> bdos(p->a_ent,1)|  x  |    |    |  x  |   8  |  8   |  8
> bdos(p->a_ent,1)|     | x  |    |  x  |   8  |  8   |  8
> bdos(p->a_ent,1)|     |    | x  |  x  |   8  |  8   |  8
> ----------------|-----|----|----|-----|------|------|-----
>
> bos only uses the allocation size to give it's answers. It only works if
> it is a compile time known constant. bos also does not utilize the
> __counted_by attribute.
>
> bdos on the other hand allows the allocation size to be runtime known.
> It also makes use of the __counted_by attribute if present, which always
> takes precedence over the allocation size when the compiler supports it
> for the particular case. So in those cases you can "lie" to the compiler
> about the size of an object.
>
> clang supports the __counted_by attribute for both cases (p and
> p->a_entries). gcc only supports it for p->a_entries cases.
>
>
>
> Issue A (clang)
> =======
>
> function        |comp.|run.|none|count| gcc  |clang |correct
> ----------------|-----|----|----|-----|------|------|-----
> bdos(p, 0)      |  x  |    |    |  x  | 1024 |  36  | 43 / 40
> bdos(p, 0)      |     | x  |    |  x  | 1024 |  36  | 43 / 40
> bdos(p, 0)      |     |    | x  |  x  |  -1  |  36  | 43 / 40
> bdos(p, 1)      |  x  |    |    |  x  | 1024 |  36  | 43 / 40
> bdos(p, 1)      |     | x  |    |  x  | 1024 |  36  | 43 / 40
> bdos(p, 1)      |     |    | x  |  x  |  -1  |  36  | 43 / 40
>
> These cases also represent the "bdos off by 4" issue in clang. clang
> will compute these results using:
>
> max(sizeof(struct posix_acl), offsetof(struct posix_acl, a_entries) +
> count * sizeof(struct posix_acl_entries)) = 36
>
> The kernel on the other hand expects this behavior:
>
> sizeof(struct posix_acl) + count * sizeof(struct posix_acl_entries) = 40
>
>
> I think the correct calculation would actually be this:
>
> offsetof(struct posix_acl, a_entries)
> + (acl->a_count + 1) * sizeof(struct posix_acl_entry) - 1 = 43
>
> The C11 standard says that when the . or -> operator is used on a struct
> with an FAM it behaves like the FAM was replaced with the largest array
> (with the same element type) that would not make the object any larger
> (see page 113 and 114 of [2]).
> So there are actually multiple sizes of the object that are consistent
> with a count of 1.
>
> malloc-max = maximum size of the object
> malloc-min = minimum size of the object
> FAME = flexible array member element
> (FAME) = hypothetical 2nd FAME
>
> <-----------------malloc-max-------------->
> <-----------------malloc-min------->
> <------sizeof(posix_acl)------->
>                             <-FAME-><(FAME)>
>
> The clang documentation of type 0 (vs type 2) bdos says this:
>
> If ``type & 2 == 0``, the least ``n`` is returned such that accesses to
>    ``(const char*)ptr + n`` and beyond are known to be out of bounds.
>
> We only _know_ that that access to the last byte of a 2nd hypothetical FAME
> would be out of bounds. All the bytes before that are padding that is
> allowed by the standard.
>
>
> However, also this calculation doesn't get the kernel out
> of trouble here. While this would fix the issue for this particular
> struct it does not solve it for all structs:
>
> What if the elements of the FAM were chars instead of
> struct posix_acl_entries here? In that case the kernel is back to
> overestimating the size of the struct / underreporting the count to the
> compiler. So while I think this answer is more correct it doesn't
> actually solve the issue.
>
> Example:
> Let's say the kernel allocates one of these posix_acl_char structs for a
> single char in the array:
>
> malloc(sizeof(posix_acl_char) + 1 * sizeof(char)) = 33
>
> The C standard actually says that this object will behave like this when
> the FAM is accessed:
>
> struct posix_acl {
>     refcount_t a_refcount;
>     struct rcu_head a_rcu;
>     unsigned int a_count;
>     char a_entries[5];
> };
>
> a_count should be set to 5, not 1!
>
While the standard says that it should act as 5 instead of 1 and that
it's not an error to access padding, the point of the __counted_by
attribute is to check that the user isn't doing anything "bad" by
going outside of whatever bounds have been put in place. So I wouldn't
want __bdos(p->a_entries, 0) to return 5 when the initial allocation
is for 1. It's confusing given the documentation for the attribute.

> So we would really need an option to tell the compiler to use the same
> size calculation as the kernel expects here, or maybe be able to specify
> an offset in the __counted_by attribute. Alternatively clang could use
> an option to disable the use of __counted_by for cases where the whole
> struct is passed. This would make it behave like gcc.
>
I would be in favor of disabling __bdos on a whole struct pointer if
it will match the functionality between the compilers. I don't think
Qing has that on her plate at the moment, but when / if she revisits
that we can discuss exactly how to perform the calculations then.

> Issue B (clang + gcc)
> =======
>
> A less serious issue happens with these cases:
>
> function        |comp.|run.|none|count| gcc  |clang |correct
> ----------------|-----|----|----|-----|------|------|-----
> bos(p->a_ent, 1)|  x  |    |    |     |  996 | 996  | 992
> bos(p->a_ent, 1)|  x  |    |    |  x  |  996 | 996  | 992
> bdos(p->a_ent,1)|  x  |    |    |     |  996 | 996  | 992
> bdos(p->a_ent,1)|     | x  |    |     |  996 | 996  | 992
>
> In this case the size returned by bos/bdos is too large, so this won't
> lead to false positives. Both clang and gcc simply compute the difference
> between the pointer from the start of the FAM to the end of the whole
> struct. I believe this is wrong. According to the C standard the object
> should behave like the FAM was replaced with the largest array that does
> not make the object any larger. The size of that array is 124 elements.
> So the posix_acl becomes:
>
I reported a similar issue to GCC a while back. The response is that
it's not incorrect, because the size is still valid (padding, etc.).
Their view is that, even when asked for the size of the subobject,
they want to return some value, even if it's larger than the
subobject, but not outside the bounds of the full object. I strongly
disagree that it's okay to do that, but I'm probably in the minority.
Clang's support for 1 as the __bdos's second argument isn't great. I'm
trying to fix it, but got sidetracked by higher priority issues.

So in conclusion, if turning off the calculation for a pointer to the
whole struct works, then I'm okay with it.

-bw

> struct posix_acl {
>     refcount_t a_refcount;
>     struct rcu_head a_rcu;
>     unsigned int a_count;
>     struct posix_acl_entry a_entries[124];
> };
>
> Since this is a type 1 bos/bdos it should return the size of just the
> array, which is 124 * 8 = 992, and not 124.5 * 8 = 996.
>
> [1] https://godbolt.org/z/a5eM3z8PY
> [2] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
>
> Best Regards
> Jan
>