lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 6 Jul 2022 09:27:11 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Kuniyuki Iwashima <kuniyu@...zon.com>
Cc:     "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>,
        Luis Chamberlain <mcgrof@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Iurii Zaikin <yzaikin@...gle.com>,
        Kuniyuki Iwashima <kuni1840@...il.com>,
        <netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        Satoru Moriya <satoru.moriya@....com>
Subject: Re: [PATCH v1 net 11/16] net: Fix a data-race around sysctl_mem.

On Wed, 6 Jul 2022 09:17:07 -0400
Steven Rostedt <rostedt@...dmis.org> wrote:

> On Tue, 5 Jul 2022 22:21:25 -0700
> Kuniyuki Iwashima <kuniyu@...zon.com> wrote:
> 
> > --- a/include/trace/events/sock.h
> > +++ b/include/trace/events/sock.h
> > @@ -122,9 +122,9 @@ TRACE_EVENT(sock_exceed_buf_limit,
> >  
> >  	TP_printk("proto:%s sysctl_mem=%ld,%ld,%ld allocated=%ld sysctl_rmem=%d rmem_alloc=%d sysctl_wmem=%d wmem_alloc=%d wmem_queued=%d kind=%s",
> >  		__entry->name,
> > -		__entry->sysctl_mem[0],
> > -		__entry->sysctl_mem[1],
> > -		__entry->sysctl_mem[2],
> > +		READ_ONCE(__entry->sysctl_mem[0]),
> > +		READ_ONCE(__entry->sysctl_mem[1]),
> > +		READ_ONCE(__entry->sysctl_mem[2]),  
> 
> This is not reading anything to do with sysctl. It's reading the content of
> what was recorded in the ring buffer.
> 
> That is, the READ_ONCE() here is not necessary, and if anything will break
> user space parsing, as this is exported to user space to tell it how to
> read the binary format in the ring buffer.

I take that back. Looking at the actual trace event, it is pointing to
sysctl memory, which is a major bug.

TRACE_EVENT(sock_exceed_buf_limit,

        TP_PROTO(struct sock *sk, struct proto *prot, long allocated, int kind),

        TP_ARGS(sk, prot, allocated, kind),

        TP_STRUCT__entry(
                __array(char, name, 32)
                __field(long *, sysctl_mem)

sysctl_mem is a pointer.

                __field(long, allocated)
                __field(int, sysctl_rmem)
                __field(int, rmem_alloc)
                __field(int, sysctl_wmem)
                __field(int, wmem_alloc)
                __field(int, wmem_queued)
                __field(int, kind)
        ),

        TP_fast_assign(
                strncpy(__entry->name, prot->name, 32);

                __entry->sysctl_mem = prot->sysctl_mem;


They save the pointer **IN THE RING BUFFER**!!!

                __entry->allocated = allocated;
                __entry->sysctl_rmem = sk_get_rmem0(sk, prot);
                __entry->rmem_alloc = atomic_read(&sk->sk_rmem_alloc);
                __entry->sysctl_wmem = sk_get_wmem0(sk, prot);
                __entry->wmem_alloc = refcount_read(&sk->sk_wmem_alloc);
                __entry->wmem_queued = READ_ONCE(sk->sk_wmem_queued);
                __entry->kind = kind;
        ),

        TP_printk("proto:%s sysctl_mem=%ld,%ld,%ld allocated=%ld sysctl_rmem=%d rmem_alloc=%d sysctl_wmem=%d wmem_alloc=%d wmem_queued=%d kind=%s",
                __entry->name,
                __entry->sysctl_mem[0],
                __entry->sysctl_mem[1],
                __entry->sysctl_mem[2],

They are now reading a stale pointer, which can be read at any time. That
is, you get the information of what is in sysctl_mem at the time the ring
buffer is read (which is useless from user space), and not at the time of
the event.

Thanks for pointing this out. This needs to be fixed.

-- Steve


                __entry->allocated,
                __entry->sysctl_rmem,
                __entry->rmem_alloc,
                __entry->sysctl_wmem,
                __entry->wmem_alloc,
                __entry->wmem_queued,
                show_skmem_kind_names(__entry->kind)
        )

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ