[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20220706162753.47894-1-kuniyu@amazon.com>
Date: Wed, 6 Jul 2022 09:27:53 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <rostedt@...dmis.org>
CC: <davem@...emloft.net>, <edumazet@...gle.com>,
<keescook@...omium.org>, <kuba@...nel.org>, <kuni1840@...il.com>,
<kuniyu@...zon.com>, <linux-kernel@...r.kernel.org>,
<mcgrof@...nel.org>, <netdev@...r.kernel.org>, <pabeni@...hat.com>,
<satoru.moriya@....com>, <yzaikin@...gle.com>
Subject: Re: [PATCH v1 net 11/16] net: Fix a data-race around sysctl_mem.
From: Steven Rostedt <rostedt@...dmis.org>
Date: Wed, 6 Jul 2022 09:27:11 -0400
> On Wed, 6 Jul 2022 09:17:07 -0400
> Steven Rostedt <rostedt@...dmis.org> wrote:
>
> > On Tue, 5 Jul 2022 22:21:25 -0700
> > Kuniyuki Iwashima <kuniyu@...zon.com> wrote:
> >
> > > --- a/include/trace/events/sock.h
> > > +++ b/include/trace/events/sock.h
> > > @@ -122,9 +122,9 @@ TRACE_EVENT(sock_exceed_buf_limit,
> > >
> > > TP_printk("proto:%s sysctl_mem=%ld,%ld,%ld allocated=%ld sysctl_rmem=%d rmem_alloc=%d sysctl_wmem=%d wmem_alloc=%d wmem_queued=%d kind=%s",
> > > __entry->name,
> > > - __entry->sysctl_mem[0],
> > > - __entry->sysctl_mem[1],
> > > - __entry->sysctl_mem[2],
> > > + READ_ONCE(__entry->sysctl_mem[0]),
> > > + READ_ONCE(__entry->sysctl_mem[1]),
> > > + READ_ONCE(__entry->sysctl_mem[2]),
> >
> > This is not reading anything to do with sysctl. It's reading the content of
> > what was recorded in the ring buffer.
> >
> > That is, the READ_ONCE() here is not necessary, and if anything will break
> > user space parsing, as this is exported to user space to tell it how to
> > read the binary format in the ring buffer.
>
> I take that back. Looking at the actual trace event, it is pointing to
> sysctl memory, which is a major bug.
>
> TRACE_EVENT(sock_exceed_buf_limit,
>
> TP_PROTO(struct sock *sk, struct proto *prot, long allocated, int kind),
>
> TP_ARGS(sk, prot, allocated, kind),
>
> TP_STRUCT__entry(
> __array(char, name, 32)
> __field(long *, sysctl_mem)
>
> sysctl_mem is a pointer.
>
> __field(long, allocated)
> __field(int, sysctl_rmem)
> __field(int, rmem_alloc)
> __field(int, sysctl_wmem)
> __field(int, wmem_alloc)
> __field(int, wmem_queued)
> __field(int, kind)
> ),
>
> TP_fast_assign(
> strncpy(__entry->name, prot->name, 32);
>
> __entry->sysctl_mem = prot->sysctl_mem;
>
>
> They save the pointer **IN THE RING BUFFER**!!!
>
> __entry->allocated = allocated;
> __entry->sysctl_rmem = sk_get_rmem0(sk, prot);
> __entry->rmem_alloc = atomic_read(&sk->sk_rmem_alloc);
> __entry->sysctl_wmem = sk_get_wmem0(sk, prot);
> __entry->wmem_alloc = refcount_read(&sk->sk_wmem_alloc);
> __entry->wmem_queued = READ_ONCE(sk->sk_wmem_queued);
> __entry->kind = kind;
> ),
>
> TP_printk("proto:%s sysctl_mem=%ld,%ld,%ld allocated=%ld sysctl_rmem=%d rmem_alloc=%d sysctl_wmem=%d wmem_alloc=%d wmem_queued=%d kind=%s",
> __entry->name,
> __entry->sysctl_mem[0],
> __entry->sysctl_mem[1],
> __entry->sysctl_mem[2],
>
> They are now reading a stale pointer, which can be read at any time. That
> is, you get the information of what is in sysctl_mem at the time the ring
> buffer is read (which is useless from user space), and not at the time of
> the event.
>
> Thanks for pointing this out. This needs to be fixed.
For the record, Steve fixed this properly here, so I'll drop the tracing
part in v2.
https://lore.kernel.org/netdev/20220706105040.54fc03b0@gandalf.local.home/
Powered by blists - more mailing lists