Message-ID: <CAF=yD-+tvv=S3Grw8=UrD2m4w+_n7VGAbQgX9iruOppQnKCbEw@mail.gmail.com>
Date: Wed, 13 Jul 2016 13:48:54 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Network Development <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Craig Gallek <kraig@...gle.com>,
Eric Dumazet <eric.dumazet@...il.com>,
Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH net] sock_diag: invert socket destroy broadcast check
On Fri, Jun 24, 2016 at 6:22 PM, Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
> On Fri, Jun 24, 2016 at 4:41 PM, Eric W. Biederman
> <ebiederm@...ssion.com> wrote:
>> Willem de Bruijn <willemdebruijn.kernel@...il.com> writes:
>>
>>> From: Willem de Bruijn <willemb@...gle.com>
>>>
>>> Socket destruction is only broadcast for a socket sk if a diag
>>> listener is registered and sk is not a kernel socket.
>>>
>>> Invert the test to not even check for listeners for kernel sockets.
>>>
>>> The sock_diag_has_destroy_listeners invocation dereferences
>>> sock_net(sk), which for kernel sockets can be invalid as they do not
>>> take a reference on the network namespace.
>>
>> No. That isn't so. A kernel socket for a network namespace must be
>> destroyed in the network namespace teardown.
I spent some more time looking at this.
inet_ctl_sock_destroy does not destroy the socket while skbuffs still
hold a reference on it (through its sk_wmem_alloc). Skbs are orphaned
when they leave the namespace through dev_forward_skb, but not when
they are sent out a physical NIC (correctly so, as that would break TSQ).
The bug happened with macvlan on top of bonding on top of a physical
NIC. The macvlan lives in a temporary namespace. After the macvlan and
network namespace are destroyed, the physical device still has a TCP RST
skb from net.ipv4->tcp_sk queued for tx completion.
I have not been able to reproduce this exact scenario, likely because tx
completion happens within microseconds and cannot easily be delayed
enough for testing. Using a tap device with skb_orphan commented out, I
can trigger the issue. Commenting out skb_orphan is clearly a gross
hack; the point I wanted to verify is that the underlying device is not
stopped (and its queues cleaned of skbs) when the macvlan device is
destroyed.
Network namespace teardown is complex. Am I missing a step that
prevents the above, or does this indeed sound feasible in principle
(if very unlikely in practice)?