lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=whKbJkuVmzb0hD3N6q7veprUrSpiBHRxVY=AffWZPtxmg@mail.gmail.com>
Date:   Wed, 9 Jun 2021 13:08:34 -0700
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     James Wang <jnwang@...ux.alibaba.com>,
        Steven Rostedt <rostedt@...dmis.org>
Cc:     Liangyan <liangyan.peng@...ux.alibaba.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>,
        Xunlei Pang <xlpang@...ux.alibaba.com>,
        yinbinbin@...babacloud.com, wetp <wetp.zy@...ux.alibaba.com>,
        stable <stable@...r.kernel.org>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [PATCH] tracing: Correct the length check which causes memory corruption

Steven?

On Mon, Jun 7, 2021 at 6:46 AM James Wang <jnwang@...ux.alibaba.com> wrote:
>
> >
> > James Wang has reproduced it stably on the latest 4.19 LTS.
> > After some debugging, we finally proved that it's due to ftrace
> > buffer out-of-bound access using a debug tool as follows:
[..]

Looks about right:

> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index a21ef9cd2aae..9299057feb56 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -2736,7 +2736,7 @@ trace_event_buffer_lock_reserve(struct trace_buffer **current_rb,
> >           (entry = this_cpu_read(trace_buffered_event))) {
> >               /* Try to use the per cpu buffer first */
> >               val = this_cpu_inc_return(trace_buffered_event_cnt);
> > -             if ((len < (PAGE_SIZE - sizeof(*entry))) && val == 1) {
> > +             if ((len < (PAGE_SIZE - sizeof(*entry) - sizeof(entry->array[0]))) && val == 1) {
> >                       trace_event_setup(entry, type, trace_ctx);
> >                       entry->array[0] = len;
> >                       return entry;

I have to say that I don't love that code. Not before, and not with the fix.

That "sizeof(*entry)" is clearly wrong, because it doesn't take the
unsized array into account.

But adding the sizeof() for a single array entry doesn't make that
already unreadable and buggy code much more readable.

It would probably be better to use "struct_size(entry, buffer, 1)"
instead, and I think it would be good to just split things up a bit to
be more legibe:

        unsigned long max_len = PAGE_SIZE - struct_size(entry, array, 1);

        if (val == 1 && len < max_len && val == 1) {
                trace_event_setup(entry, type, trace_ctx);
                ..

instead.

However, I have a few questions:

 - why "len < max_offset" rather than "<="?

 - why don't we check the length before we even try to reserve that
percpu buffer with the expensive atomic this_cpu_inc_return()?

 - is the size of that array guaranteed to always be 1? If so, why is
it unsized? Why is it an array at all?

 - clearly the array{} size must not be guaranteed to be 1, but why a
size of 1 then always sufficient here? Clearly a size of 1 is the
minimum required since we do that

        entry->array[0] = len;

   and thus use one entry, but what is it that makes it ok that it
really is just one entry?

Steven, please excuse the above stupid questions of mine, but that
code looks really odd.

               Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ