linux-kernel - Re: [PATCH v4 1/6] binfmt_elf: Support segments with 0 filesz and misaligned starts

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87o7hl80lm.fsf@email.froward.int.ebiederm.org>
Date:   Fri, 29 Sep 2023 10:23:33 -0500
From:   "Eric W. Biederman" <ebiederm@...ssion.com>
To:     Pedro Falcato <pedro.falcato@...il.com>
Cc:     Kees Cook <keescook@...omium.org>,
        Sebastian Ott <sebott@...hat.com>,
        Thomas Weißschuh <linux@...ssschuh.net>,
        Al Viro <viro@...iv.linux.org.uk>,
        Christian Brauner <brauner@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-mm@...ck.org, linux-hardening@...r.kernel.org
Subject: Re: [PATCH v4 1/6] binfmt_elf: Support segments with 0 filesz and
 misaligned starts

Pedro Falcato <pedro.falcato@...il.com> writes:

> On Fri, Sep 29, 2023 at 4:24 AM Kees Cook <keescook@...omium.org> wrote:
>>
>> From: "Eric W. Biederman" <ebiederm@...ssion.com>
>>
>> Implement a helper elf_load() that wraps elf_map() and performs all
>> of the necessary work to ensure that when "memsz > filesz" the bytes
>> described by "memsz > filesz" are zeroed.
>>
>> An outstanding issue is if the first segment has filesz 0, and has a
>> randomized location. But that is the same as today.
>>
>> In this change I replaced an open coded padzero() that did not clear
>> all of the way to the end of the page, with padzero() that does.
>>
>> I also stopped checking the return of padzero() as there is at least
>> one known case where testing for failure is the wrong thing to do.
>> It looks like binfmt_elf_fdpic may have the proper set of tests
>> for when error handling can be safely completed.
>>
>> I found a couple of commits in the old history
>> https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git,
>> that look very interesting in understanding this code.
>>
>> commit 39b56d902bf3 ("[PATCH] binfmt_elf: clearing bss may fail")
>> commit c6e2227e4a3e ("[SPARC64]: Missing user access return value checks in fs/binfmt_elf.c and fs/compat.c")
>> commit 5bf3be033f50 ("v2.4.10.1 -> v2.4.10.2")
>>
>> Looking at commit 39b56d902bf3 ("[PATCH] binfmt_elf: clearing bss may fail"):
>> >  commit 39b56d902bf35241e7cba6cc30b828ed937175ad
>> >  Author: Pavel Machek <pavel@....cz>
>> >  Date:   Wed Feb 9 22:40:30 2005 -0800
>> >
>> >     [PATCH] binfmt_elf: clearing bss may fail
>> >
>> >     So we discover that Borland's Kylix application builder emits weird elf
>> >     files which describe a non-writeable bss segment.
>> >
>> >     So remove the clear_user() check at the place where we zero out the bss.  I
>> >     don't _think_ there are any security implications here (plus we've never
>> >     checked that clear_user() return value, so whoops if it is a problem).
>> >
>> >     Signed-off-by: Pavel Machek <pavel@...e.cz>
>> >     Signed-off-by: Andrew Morton <akpm@...l.org>
>> >     Signed-off-by: Linus Torvalds <torvalds@...l.org>
>>
>> It seems pretty clear that binfmt_elf_fdpic with skipping clear_user() for
>> non-writable segments and otherwise calling clear_user(), aka padzero(),
>> and checking it's return code is the right thing to do.
>>
>> I just skipped the error checking as that avoids breaking things.
>>
>> And notably, it looks like Borland's Kylix died in 2005 so it might be
>> safe to just consider read-only segments with memsz > filesz an error.
>>
>> Reported-by: Sebastian Ott <sebott@...hat.com>
>> Reported-by: Thomas Weißschuh <linux@...ssschuh.net>
>> Closes: https://lkml.kernel.org/r/20230914-bss-alloc-v1-1-78de67d2c6dd@weissschuh.net
>> Signed-off-by: "Eric W. Biederman" <ebiederm@...ssion.com>
>> Link: https://lore.kernel.org/r/87sf71f123.fsf@email.froward.int.ebiederm.org
>> Signed-off-by: Kees Cook <keescook@...omium.org>
>> ---
>>  fs/binfmt_elf.c | 111 +++++++++++++++++++++---------------------------
>>  1 file changed, 48 insertions(+), 63 deletions(-)
>>
>> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
>> index 7b3d2d491407..2a615f476e44 100644
>> --- a/fs/binfmt_elf.c
>> +++ b/fs/binfmt_elf.c
>> @@ -110,25 +110,6 @@ static struct linux_binfmt elf_format = {
>>
>>  #define BAD_ADDR(x) (unlikely((unsigned long)(x) >= TASK_SIZE))
>>
>> -static int set_brk(unsigned long start, unsigned long end, int prot)
>> -{
>> -       start = ELF_PAGEALIGN(start);
>> -       end = ELF_PAGEALIGN(end);
>> -       if (end > start) {
>> -               /*
>> -                * Map the last of the bss segment.
>> -                * If the header is requesting these pages to be
>> -                * executable, honour that (ppc32 needs this).
>> -                */
>> -               int error = vm_brk_flags(start, end - start,
>> -                               prot & PROT_EXEC ? VM_EXEC : 0);
>> -               if (error)
>> -                       return error;
>> -       }
>> -       current->mm->start_brk = current->mm->brk = end;
>> -       return 0;
>> -}
>> -
>>  /* We need to explicitly zero any fractional pages
>>     after the data section (i.e. bss).  This would
>>     contain the junk from the file that should not
>> @@ -406,6 +387,51 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
>>         return(map_addr);
>>  }
>>
>> +static unsigned long elf_load(struct file *filep, unsigned long addr,
>> +               const struct elf_phdr *eppnt, int prot, int type,
>> +               unsigned long total_size)
>> +{
>> +       unsigned long zero_start, zero_end;
>> +       unsigned long map_addr;
>> +
>> +       if (eppnt->p_filesz) {
>> +               map_addr = elf_map(filep, addr, eppnt, prot, type, total_size);
>> +               if (BAD_ADDR(map_addr))
>> +                       return map_addr;
>> +               if (eppnt->p_memsz > eppnt->p_filesz) {
>> +                       zero_start = map_addr + ELF_PAGEOFFSET(eppnt->p_vaddr) +
>> +                               eppnt->p_filesz;
>> +                       zero_end = map_addr + ELF_PAGEOFFSET(eppnt->p_vaddr) +
>> +                               eppnt->p_memsz;
>> +
>> +                       /* Zero the end of the last mapped page */
>> +                       padzero(zero_start);
>> +               }
>> +       } else {
>> +               map_addr = zero_start = ELF_PAGESTART(addr);
>> +               zero_end = zero_start + ELF_PAGEOFFSET(eppnt->p_vaddr) +
>> +                       eppnt->p_memsz;
>
> What happens if a previous segment has mapped ELF_PAGESTART(addr)?
> Don't we risk mapping over that?

It is bug of whomever produced the ELF executable.

The architectural page size is known is part of the per architecture
sysv ABI.  Typical it is the same or larger than the hardware page
size.

ELF executables are always mmaped in page sized chunks.  Which makes a
starting offset part-way through a page weird, and a bit awkward, but it
is something the code already attempts to handle.

> Whereas AFAIK old logic would just padzero the bss bytes.

No.  The old logic would either map that region with elf_map, and then
call padzero to zero out the bss bytes, or the old logic would fail if
the file offset was not contained within the file.

The updated logic if "filesz == 0" simply ignores the file offset
and always mmaps /dev/zero instead.  This means that giving a bogus
file offset does not unnecessarily cause an executable to fail.


If the desired behavior is to have file contents and bss on the same
page of data, the generator of the elf program header needs to
have "memsz > filesz".  That is already well supported for everything
except the elf interpreters.

Eric