linux-kernel - Re: [REGRESSION][BISECTED] erroneous buffer overflow detected in bch2_xattr

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZwNb-_UPL9BPSg9N@archlinux>
Date: Mon, 7 Oct 2024 05:56:43 +0200
From: Jan Hendrik Farr <kernel@...rr.cc>
To: Kees Cook <kees@...nel.org>
Cc: Thorsten Blum <thorsten.blum@...lux.com>, kent.overstreet@...ux.dev,
	regressions@...ts.linux.dev, linux-bcachefs@...r.kernel.org,
	linux-hardening@...r.kernel.org, linux-kernel@...r.kernel.org,
	ardb@...nel.org, morbo@...gle.com
Subject: Re: [REGRESSION][BISECTED] erroneous buffer overflow detected in
 bch2_xattr_validate

> I want to separate several easily confused issues. Instead of just
> saying __bdos, let's clearly refer to what calculation within bdos is
> being used. There are 3 choices currently:
> - alloc_size attribute
> - counted_by attribute
> - fallback to __bos (which is similar to sizeof(), except that FAMs are 0 sized)
> 
> Additionally there are (for all intents and purposes) 2 size
> determinations to be made by __bos and __bdos, via argument 2:
> - containing object size (type 0) ("maximum size")
> - specific object size (type 1) ("minimum size")

"maximum" vs "minimum" size would by type 0 vs type 2, but I think you
do mean type 0 and type 1 as those are the types currently used by
__struct_size and __member_size. Those are both "maximum" sizes.

> 
> For example, consider:
> 
> struct posix_acl *acl = malloc(1024);
> acl->a_count = 1;
> 
> what should these return:
> 
> 	__bos(acl, 0)
> 	__bos(acl, 1)
> 	__bdos(acl, 0)
> 	__bdos(acl, 1)
> 	__bos(acl->a_entries, 0)
> 	__bos(acl->a_entries, 1)
> 	__bdos(acl->a_entries, 0)
> 	__bdos(acl->a_entries, 1)
> 

I gathered some data from clang and gcc on all for all these cases and
additionally varied whether the allocation size is a compile time known
constant, runtime known, or not known. I also varied whether
__counted_by was used.

Source code: [1]


Abbreviations:

FAM      = flexible array member
-1       = SIZE_MAX
p->a_ent = p->a_entries
comp.    = allocation size is compile time known
run.     = allocation size is compile time known
none     = allocation size is unknown
count    = __counted_by attribute in use
correct  = What I think the correct answers should be. In some places I
have two answers. In that case the second number is what the kernel
currently expects.


And here's the data:

function        |comp.|run.|none|count| gcc  |clang |correct
----------------|-----|----|----|-----|------|------|-----
bos(p, 0)       |  x  |    |    |     | 1024 | 1024 | 1024
bos(p, 0)       |     | x  |    |     |  -1  |  -1  | -1
bos(p, 0)       |     |    | x  |     |  -1  |  -1  | -1
bos(p, 0)       |  x  |    |    |  x  | 1024 | 1024 | 1024
bos(p, 0)       |     | x  |    |  x  |  -1  |  -1  | -1
bos(p, 0)       |     |    | x  |  x  |  -1  |  -1  | -1
----------------|-----|----|----|-----|------|------|-----
bos(p, 1)       |  x  |    |    |     | 1024 | 1024 | 1024
bos(p, 1)       |     | x  |    |     |  -1  |  -1  | -1
bos(p, 1)       |     |    | x  |     |  -1  |  -1  | -1
bos(p, 1)       |  x  |    |    |  x  | 1024 | 1024 | 1024
bos(p, 1)       |     | x  |    |  x  |  -1  |  -1  | -1
bos(p, 1)       |     |    | x  |  x  |  -1  |  -1  | -1
----------------|-----|----|----|-----|------|------|-----
bdos(p, 0)      |  x  |    |    |     | 1024 | 1024 | 1024
bdos(p, 0)      |     | x  |    |     | 1024 | 1024 | 1024
bdos(p, 0)      |     |    | x  |     |  -1  |  -1  | -1
bdos(p, 0)      |  x  |    |    |  x  | 1024 |  36  | 43 / 40
bdos(p, 0)      |     | x  |    |  x  | 1024 |  36  | 43 / 40
bdos(p, 0)      |     |    | x  |  x  |  -1  |  36  | 43 / 40
----------------|-----|----|----|-----|------|------|-----
bdos(p, 1)      |  x  |    |    |     | 1024 | 1024 | 1024
bdos(p, 1)      |     | x  |    |     | 1024 | 1024 | 1024
bdos(p, 1)      |     |    | x  |     |  -1  |  -1  | -1
bdos(p, 1)      |  x  |    |    |  x  | 1024 |  36  | 43 / 40
bdos(p, 1)      |     | x  |    |  x  | 1024 |  36  | 43 / 40
bdos(p, 1)      |     |    | x  |  x  |  -1  |  36  | 43 / 40
----------------|-----|----|----|-----|------|------|-----
bos(p->a_ent, 0)|  x  |    |    |     |  996 | 996  | 996
bos(p->a_ent, 0)|     | x  |    |     |  -1  |  -1  | -1
bos(p->a_ent, 0)|     |    | x  |     |  -1  |  -1  | -1
bos(p->a_ent, 0)|  x  |    |    |  x  |  996 | 996  | 996
bos(p->a_ent, 0)|     | x  |    |  x  |  -1  |  -1  | -1
bos(p->a_ent, 0)|     |    | x  |  x  |  -1  |  -1  | -1
----------------|-----|----|----|-----|------|------|-----
bos(p->a_ent, 1)|  x  |    |    |     |  996 | 996  | 992
bos(p->a_ent, 1)|     | x  |    |     |  -1  |  -1  | -1
bos(p->a_ent, 1)|     |    | x  |     |  -1  |  -1  | -1
bos(p->a_ent, 1)|  x  |    |    |  x  |  996 | 996  | 992
bos(p->a_ent, 1)|     | x  |    |  x  |  -1  |  -1  | -1
bos(p->a_ent, 1)|     |    | x  |  x  |  -1  |  -1  | -1
----------------|-----|----|----|-----|------|------|-----
bdos(p->a_ent,0)|  x  |    |    |     |  996 | 996  | 996
bdos(p->a_ent,0)|     | x  |    |     |  996 | 996  | 996
bdos(p->a_ent,0)|     |    | x  |     |  -1  |  -1  | -1
bdos(p->a_ent,0)|  x  |    |    |  x  |   8  |  8   |  8
bdos(p->a_ent,0)|     | x  |    |  x  |   8  |  8   |  8
bdos(p->a_ent,0)|     |    | x  |  x  |   8  |  8   |  8
----------------|-----|----|----|-----|------|------|-----
bdos(p->a_ent,1)|  x  |    |    |     |  996 | 996  | 992
bdos(p->a_ent,1)|     | x  |    |     |  996 | 996  | 992
bdos(p->a_ent,1)|     |    | x  |     |  -1  |  -1  | -1
bdos(p->a_ent,1)|  x  |    |    |  x  |   8  |  8   |  8
bdos(p->a_ent,1)|     | x  |    |  x  |   8  |  8   |  8
bdos(p->a_ent,1)|     |    | x  |  x  |   8  |  8   |  8
----------------|-----|----|----|-----|------|------|-----

bos only uses the allocation size to give it's answers. It only works if
it is a compile time known constant. bos also does not utilize the
__counted_by attribute.

bdos on the other hand allows the allocation size to be runtime known.
It also makes use of the __counted_by attribute if present, which always
takes precedence over the allocation size when the compiler supports it
for the particular case. So in those cases you can "lie" to the compiler
about the size of an object.

clang supports the __counted_by attribute for both cases (p and
p->a_entries). gcc only supports it for p->a_entries cases.



Issue A (clang)
=======

function        |comp.|run.|none|count| gcc  |clang |correct
----------------|-----|----|----|-----|------|------|-----
bdos(p, 0)      |  x  |    |    |  x  | 1024 |  36  | 43 / 40
bdos(p, 0)      |     | x  |    |  x  | 1024 |  36  | 43 / 40
bdos(p, 0)      |     |    | x  |  x  |  -1  |  36  | 43 / 40
bdos(p, 1)      |  x  |    |    |  x  | 1024 |  36  | 43 / 40
bdos(p, 1)      |     | x  |    |  x  | 1024 |  36  | 43 / 40
bdos(p, 1)      |     |    | x  |  x  |  -1  |  36  | 43 / 40

These cases also represent the "bdos off by 4" issue in clang. clang
will compute these results using:

max(sizeof(struct posix_acl), offsetof(struct posix_acl, a_entries) +
count * sizeof(struct posix_acl_entries)) = 36

The kernel on the other hand expects this behavior:

sizeof(struct posix_acl) + count * sizeof(struct posix_acl_entries) = 40


I think the correct calculation would actually be this:

offsetof(struct posix_acl, a_entries)
+ (acl->a_count + 1) * sizeof(struct posix_acl_entry) - 1 = 43

The C11 standard says that when the . or -> operator is used on a struct
with an FAM it behaves like the FAM was replaced with the largest array
(with the same element type) that would not make the object any larger
(see page 113 and 114 of [2]).
So there are actually multiple sizes of the object that are consistent
with a count of 1.

malloc-max = maximum size of the object
malloc-min = minimum size of the object
FAME = flexible array member element
(FAME) = hypothetical 2nd FAME

<-----------------malloc-max-------------->
<-----------------malloc-min------->
<------sizeof(posix_acl)------->
                            <-FAME-><(FAME)>

The clang documentation of type 0 (vs type 2) bdos says this:

If ``type & 2 == 0``, the least ``n`` is returned such that accesses to 
   ``(const char*)ptr + n`` and beyond are known to be out of bounds.

We only _know_ that that access to the last byte of a 2nd hypothetical FAME
would be out of bounds. All the bytes before that are padding that is
allowed by the standard.


However, also this calculation doesn't get the kernel out
of trouble here. While this would fix the issue for this particular
struct it does not solve it for all structs:

What if the elements of the FAM were chars instead of
struct posix_acl_entries here? In that case the kernel is back to
overestimating the size of the struct / underreporting the count to the
compiler. So while I think this answer is more correct it doesn't
actually solve the issue.

Example:
Let's say the kernel allocates one of these posix_acl_char structs for a
single char in the array:

malloc(sizeof(posix_acl_char) + 1 * sizeof(char)) = 33

The C standard actually says that this object will behave like this when
the FAM is accessed:

struct posix_acl {
    refcount_t a_refcount;
    struct rcu_head a_rcu;
    unsigned int a_count;
    char a_entries[5];
};

a_count should be set to 5, not 1!


So we would really need an option to tell the compiler to use the same
size calculation as the kernel expects here, or maybe be able to specify
an offset in the __counted_by attribute. Alternatively clang could use
an option to disable the use of __counted_by for cases where the whole
struct is passed. This would make it behave like gcc.



Issue B (clang + gcc)
=======

A less serious issue happens with these cases:

function        |comp.|run.|none|count| gcc  |clang |correct
----------------|-----|----|----|-----|------|------|-----
bos(p->a_ent, 1)|  x  |    |    |     |  996 | 996  | 992
bos(p->a_ent, 1)|  x  |    |    |  x  |  996 | 996  | 992
bdos(p->a_ent,1)|  x  |    |    |     |  996 | 996  | 992
bdos(p->a_ent,1)|     | x  |    |     |  996 | 996  | 992

In this case the size returned by bos/bdos is too large, so this won't
lead to false positives. Both clang and gcc simply compute the difference
between the pointer from the start of the FAM to the end of the whole
struct. I believe this is wrong. According to the C standard the object
should behave like the FAM was replaced with the largest array that does
not make the object any larger. The size of that array is 124 elements.
So the posix_acl becomes:

struct posix_acl {
    refcount_t a_refcount;
    struct rcu_head a_rcu;
    unsigned int a_count;
    struct posix_acl_entry a_entries[124];
};

Since this is a type 1 bos/bdos it should return the size of just the
array, which is 124 * 8 = 992, and not 124.5 * 8 = 996.

[1] https://godbolt.org/z/a5eM3z8PY
[2] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf

Best Regards
Jan