Message-ID: <20190522151530.0aca8bce@cakuba.netronome.com>
Date:   Wed, 22 May 2019 15:15:30 -0700
From:   Jakub Kicinski <jakub.kicinski@...ronome.com>
To:     John Fastabend <john.fastabend@...il.com>
Cc:     ast@...nel.org, Eric Dumazet <edumazet@...gle.com>,
        netdev@...r.kernel.org,
        David Beckett <david.beckett@...ronome.com>,
        David Miller <davem@...emloft.net>
Subject: Re: [bpf PATCH v4 1/4] bpf: tls, implement unhash to avoid
 transition out of ESTABLISHED

On Wed, 22 May 2019 14:57:33 -0700, John Fastabend wrote:
> Jakub Kicinski wrote:
> > On Thu, 09 May 2019 21:57:49 -0700, John Fastabend wrote:  
> 
> [...]
> 
> > 
> > Looks like David Beckett managed to trigger another nasty on the
> > release path :/
> > 
> >     BUG: kernel NULL pointer dereference, address: 0000000000000012
> >     PGD 0 P4D 0
> >     Oops: 0000 [#1] SMP PTI
> > >     CPU: 7 PID: 0 Comm: swapper/7 Not tainted 5.2.0-rc1-00139-g14629453a6d3 #21
> > >     RIP: 0010:tcp_peek_len+0x10/0x60
> >     RSP: 0018:ffffc02e41c54b98 EFLAGS: 00010246
> >     RAX: 0000000000000000 RBX: ffff9cf924c4e030 RCX: 0000000000000051
> >     RDX: 0000000000000000 RSI: 000000000000000c RDI: ffff9cf97128f480
> >     RBP: ffff9cf9365e0300 R08: ffff9cf94fe7d2c0 R09: 0000000000000000
> >     R10: 000000000000036b R11: ffff9cf939735e00 R12: ffff9cf91ad9ae40
> >     R13: ffff9cf924c4e000 R14: ffff9cf9a8fcbaae R15: 0000000000000020
> > >     FS: 0000000000000000(0000) GS:ffff9cf9af7c0000(0000) knlGS:0000000000000000
> > >     CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >     CR2: 0000000000000012 CR3: 000000013920a003 CR4: 00000000003606e0
> > >     DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >     DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > >     Call Trace:
> >      <IRQ>
> >      strp_data_ready+0x48/0x90
> >      tls_data_ready+0x22/0xd0 [tls]
> >      tcp_rcv_established+0x569/0x620
> >      tcp_v4_do_rcv+0x127/0x1e0
> >      tcp_v4_rcv+0xad7/0xbf0
> >      ip_protocol_deliver_rcu+0x2c/0x1c0
> >      ip_local_deliver_finish+0x41/0x50
> >      ip_local_deliver+0x6b/0xe0
> >      ? ip_protocol_deliver_rcu+0x1c0/0x1c0
> >      ip_rcv+0x52/0xd0
> >      ? ip_rcv_finish_core.isra.20+0x380/0x380
> >      __netif_receive_skb_one_core+0x7e/0x90
> >      netif_receive_skb_internal+0x42/0xf0
> >      napi_gro_receive+0xed/0x150
> >      nfp_net_poll+0x7a2/0xd30 [nfp]
> >      ? kmem_cache_free_bulk+0x286/0x310
> >      net_rx_action+0x149/0x3b0
> >      __do_softirq+0xe3/0x30a
> >      ? handle_irq_event_percpu+0x6a/0x80
> >      irq_exit+0xe8/0xf0
> >      do_IRQ+0x85/0xd0
> >      common_interrupt+0xf/0xf
> >      </IRQ>
> >     RIP: 0010:cpuidle_enter_state+0xbc/0x450
> > 
> > If I read this right strparser calls sock->ops->peek_len(sock), but the
> > > sock->sk is already NULL.  I'm guessing this is because inet_release()
> > does:
> > 
> > 		sock->sk = NULL;
> > 		sk->sk_prot->close(sk, timeout);
> > 
> > And I don't really see a way for ktls to know that sock->sk is about to
> > be cleared, and therefore no way to stop strparser.  Or for strparser
> > to always do the check, given tcp_peek_len() will do another dereference
> > of sock->sk :S
> > 
> > That's mostly a guess, it takes me half an hour of ktls connections
> > running to repro.
> > 
> > Any advice would be appreciated..  Can we move the sock->sk assignment
> > after close?..
> > 
> > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> > index 5183a2daba64..aff93e7cdb31 100644
> > --- a/net/ipv4/af_inet.c
> > +++ b/net/ipv4/af_inet.c
> > @@ -428,8 +428,8 @@ int inet_release(struct socket *sock)
> >                 if (sock_flag(sk, SOCK_LINGER) &&
> >                     !(current->flags & PF_EXITING))
> >                         timeout = sk->sk_lingertime;
> > -               sock->sk = NULL;
> >                 sk->sk_prot->close(sk, timeout);
> > +               sock->sk = NULL;
> >         }
> >         return 0;
> >  }
> > 
> > I don't see IPv6 clearing this pointer, perhaps we don't have to?

Correction here: IPv6 just calls the IPv4 code, which is why IPv6 was
also fixed by my change.

> > > We tested it and it seems to work, but this is pre-git code, so
> > it's hard to tell what the reason to clear was :)  
> 
> How about making strp_peek_len tolerant of a null sock->sk?
> 
> diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
> index e137698e8aef..79518f93d2d8 100644
> --- a/net/strparser/strparser.c
> +++ b/net/strparser/strparser.c
> @@ -84,9 +84,10 @@ static void strp_parser_err(struct strparser *strp, int err,
>  static inline int strp_peek_len(struct strparser *strp)
>  {
>         if (strp->sk) {
> -               struct socket *sock = strp->sk->sk_socket;
> +               struct socket *sock = READ_ONCE(strp->sk->sk_socket);
>  
> -               return sock->ops->peek_len(sock);
> +               if (likely(sock))
> +                       return sock->ops->peek_len(sock);
>         }

Mmm..  I'm not sure - sk->sk_socket doesn't get cleared AFAICT, 
the NULL deref is on sk_state of sock->sk so sock is non-NULL here,
then:

int tcp_peek_len(struct socket *sock)
{
	return tcp_inq(sock->sk);
}
EXPORT_SYMBOL(tcp_peek_len);

Will pass NULL to tcp_inq, which then does:

static inline int tcp_inq(struct sock *sk)
{
	struct tcp_sock *tp = tcp_sk(sk);
	int answ;

	if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) {
		answ = 0;

And sk->sk_state is what crashes the machine.
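
As a sanity check on that reading, here is a minimal userspace sketch of
why a NULL sock->sk would fault at address 0x12.  It is not kernel code:
the mock_* types, the inq field and both helpers are hypothetical, and the
field order is my assumption about the leading members of struct
sock_common in this tree.  If that assumption holds, sk_state sits 0x12
bytes into the structure, which matches CR2 in the oops above.

```c
/* Userspace sketch only: mock types, not the kernel's definitions. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the leading fields of struct sock_common. */
struct mock_sock_common {
	uint32_t skc_daddr;
	uint32_t skc_rcv_saddr;
	uint32_t skc_hash;
	uint16_t skc_dport;
	uint16_t skc_num;
	uint16_t skc_family;
	volatile unsigned char skc_state;	/* read as sk->sk_state */
};

/* The common part comes first, so sk_state keeps the same offset. */
struct mock_sock {
	struct mock_sock_common common;
	int inq;	/* hypothetical stand-in for the queued byte count */
};

struct mock_socket {
	struct mock_sock *sk;
};

/* With sk == NULL, reading sk->sk_state touches this address. */
static_assert(offsetof(struct mock_sock_common, skc_state) == 0x12,
	      "sk_state expected at offset 0x12");

static size_t null_sk_fault_offset(void)
{
	return offsetof(struct mock_sock_common, skc_state);
}

/*
 * Hypothetical guarded peek: report 0 bytes once sock->sk is cleared.
 * This does not model the race with inet_release(); in the kernel the
 * pointer could still be cleared between the check and the use.
 */
static int mock_peek_len_guarded(const struct mock_socket *sock)
{
	const struct mock_sock *sk = sock->sk;

	if (!sk)
		return 0;
	return sk->inq;
}
```

The guarded helper only shows where a check would have to sit (inside
peek_len itself rather than in strp_peek_len()); whether such a check can
be made safe against the concurrent release is a separate question.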
