[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20220706162753.47894-1-kuniyu@amazon.com>
Date:   Wed, 6 Jul 2022 09:27:53 -0700
From:   Kuniyuki Iwashima <kuniyu@...zon.com>
To:     <rostedt@...dmis.org>
CC:     <davem@...emloft.net>, <edumazet@...gle.com>,
        <keescook@...omium.org>, <kuba@...nel.org>, <kuni1840@...il.com>,
        <kuniyu@...zon.com>, <linux-kernel@...r.kernel.org>,
        <mcgrof@...nel.org>, <netdev@...r.kernel.org>, <pabeni@...hat.com>,
        <satoru.moriya@....com>, <yzaikin@...gle.com>
Subject: Re: [PATCH v1 net 11/16] net: Fix a data-race around sysctl_mem.
From:   Steven Rostedt <rostedt@...dmis.org>
Date:   Wed, 6 Jul 2022 09:27:11 -0400
> On Wed, 6 Jul 2022 09:17:07 -0400
> Steven Rostedt <rostedt@...dmis.org> wrote:
> 
> > On Tue, 5 Jul 2022 22:21:25 -0700
> > Kuniyuki Iwashima <kuniyu@...zon.com> wrote:
> > 
> > > --- a/include/trace/events/sock.h
> > > +++ b/include/trace/events/sock.h
> > > @@ -122,9 +122,9 @@ TRACE_EVENT(sock_exceed_buf_limit,
> > >  
> > >  	TP_printk("proto:%s sysctl_mem=%ld,%ld,%ld allocated=%ld sysctl_rmem=%d rmem_alloc=%d sysctl_wmem=%d wmem_alloc=%d wmem_queued=%d kind=%s",
> > >  		__entry->name,
> > > -		__entry->sysctl_mem[0],
> > > -		__entry->sysctl_mem[1],
> > > -		__entry->sysctl_mem[2],
> > > +		READ_ONCE(__entry->sysctl_mem[0]),
> > > +		READ_ONCE(__entry->sysctl_mem[1]),
> > > +		READ_ONCE(__entry->sysctl_mem[2]),  
> > 
> > This is not reading anything to do with sysctl. It's reading the content of
> > what was recorded in the ring buffer.
> > 
> > That is, the READ_ONCE() here is not necessary, and if anything will break
> > user space parsing, as this is exported to user space to tell it how to
> > read the binary format in the ring buffer.
> 
> I take that back. Looking at the actual trace event, it is pointing to
> sysctl memory, which is a major bug.
> 
> TRACE_EVENT(sock_exceed_buf_limit,
> 
>         TP_PROTO(struct sock *sk, struct proto *prot, long allocated, int kind),
> 
>         TP_ARGS(sk, prot, allocated, kind),
> 
>         TP_STRUCT__entry(
>                 __array(char, name, 32)
>                 __field(long *, sysctl_mem)
> 
> sysctl_mem is a pointer.
> 
>                 __field(long, allocated)
>                 __field(int, sysctl_rmem)
>                 __field(int, rmem_alloc)
>                 __field(int, sysctl_wmem)
>                 __field(int, wmem_alloc)
>                 __field(int, wmem_queued)
>                 __field(int, kind)
>         ),
> 
>         TP_fast_assign(
>                 strncpy(__entry->name, prot->name, 32);
> 
>                 __entry->sysctl_mem = prot->sysctl_mem;
> 
> 
> They save the pointer **IN THE RING BUFFER**!!!
> 
>                 __entry->allocated = allocated;
>                 __entry->sysctl_rmem = sk_get_rmem0(sk, prot);
>                 __entry->rmem_alloc = atomic_read(&sk->sk_rmem_alloc);
>                 __entry->sysctl_wmem = sk_get_wmem0(sk, prot);
>                 __entry->wmem_alloc = refcount_read(&sk->sk_wmem_alloc);
>                 __entry->wmem_queued = READ_ONCE(sk->sk_wmem_queued);
>                 __entry->kind = kind;
>         ),
> 
>         TP_printk("proto:%s sysctl_mem=%ld,%ld,%ld allocated=%ld sysctl_rmem=%d rmem_alloc=%d sysctl_wmem=%d wmem_alloc=%d wmem_queued=%d kind=%s",
>                 __entry->name,
>                 __entry->sysctl_mem[0],
>                 __entry->sysctl_mem[1],
>                 __entry->sysctl_mem[2],
> 
> They are now reading a stale pointer, which can be read at any time. That
> is, you get the information of what is in sysctl_mem at the time the ring
> buffer is read (which is useless from user space), and not at the time of
> the event.
> 
> Thanks for pointing this out. This needs to be fixed.
For the record, Steve fixed this properly here, so I'll drop the tracing
part in v2.
https://lore.kernel.org/netdev/20220706105040.54fc03b0@gandalf.local.home/
Powered by blists - more mailing lists
 
