lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 21 Jun 2019 10:58:54 -0700
From:   Joe Stringer <joe@...d.net.nz>
To:     Eric Dumazet <eric.dumazet@...il.com>,
        Florian Westphal <fw@...len.de>
Cc:     netdev <netdev@...r.kernel.org>,
        john fastabend <john.fastabend@...il.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Lorenz Bauer <lmb@...udflare.com>,
        Jakub Sitnicki <jakub@...udflare.com>,
        Paolo Abeni <pabeni@...hat.com>
Subject: Removing skb_orphan() from ip_rcv_core()

Hi folks, picking this up again..

As discussed during LSFMM, I've been looking at adding something like
an `skb_sk_assign()` helper to BPF so that logic similar to TPROXY can
be implemented with integration into other BPF logic, however
currently any attempts to do so are blocked by the skb_orphan() call
in ip_rcv_core() (which will effectively ignore any socket assign
decision made by the TC BPF program).

Recently I was attempting to remove the skb_orphan() call, and I've
been trying different things but there seems to be some context I'm
missing. Here's the core of the patch:

diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index ed97724c5e33..16aea980318a 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -500,8 +500,6 @@ static struct sk_buff *ip_rcv_core(struct sk_buff
*skb, struct net *net)
       memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));
       IPCB(skb)->iif = skb->skb_iif;

-       /* Must drop socket now because of tproxy. */
-       skb_orphan(skb);

       return skb;

The statement that the socket must be dropped because of tproxy
doesn't make sense to me, because the PRE_ROUTING hook is hit after
this, which will call into the tproxy logic and eventually
nf_tproxy_assign_sock() which already does the skb_orphan() itself.

However, if I drop these lines then I end up causing sockets to
release references too many times. Seems like if we don't orphan the
skb here, then later logic assumes that we have one more reference
than we actually have, and decrements the count when it shouldn't
(perhaps the skb_steal_sock() call in __inet_lookup_skb() which seems
to assume we always have a reference to the socket?)

Splat:

refcount_t hit zero at sk_stop_timer+0x2c/0x30 in cilium-agent[16359],
uid/euid: 0/0
WARNING: CPU: 0 PID: 16359 at kernel/panic.c:686 refcount_error_report+0x9c/0xa1
...
? inet_put_port+0xa6/0xd0
inet_csk_clear_xmit_timers+0x2e/0x50
tcp_done+0x8b/0xf0
tcp_reset+0x49/0xc0
tcp_validate_incoming+0x2f7/0x410
tcp_rcv_state_process+0x250/0xdb6
? tcp_v4_connect+0x46f/0x4e0
tcp_v4_do_rcv+0xbd/0x1f0
__release_sock+0x84/0xd0
release_sock+0x30/0xa0
inet_stream_connect+0x47/0x60

(Full version: https://gist.github.com/joestringer/d5313e4bf4231e2c46405bd7a3053936
)

This seems potentially related to some of the socket referencing
discussion in the peer thread "[RFC bpf-next 0/7] Programming socket
lookup with BPF".

During LSFMM, it seemed like no-one knew quite why the skb_orphan() is
necessary in that path in the current version of the code, and that we
may be able to remove it. Florian, I know you weren't in the room for
that discussion, so raising it again now with a stack trace, Do you
have some sense what's going on here and whether there's a path
towards removing it from this path or allowing the skb->sk to be
retained during ip_rcv() in some conditions?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ