netdev - Re: Soft lockup in inet_put


Message-ID: <9DF94C8E-1463-4C10-81E3-E6F4534097CB@fb.com>
Date:   Tue, 20 Dec 2016 03:40:56 +0000
From:   Josef Bacik <jbacik@...com>
To:     Eric Dumazet <eric.dumazet@...il.com>
CC:     Tom Herbert <tom@...bertland.com>,
        David Miller <davem@...emloft.net>,
        Hannes Frederic Sowa <hannes@...essinduktion.org>,
        Craig Gallek <kraigatgoog@...il.com>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: Soft lockup in inet_put_port on 4.6


> On Dec 19, 2016, at 9:42 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> 
>> On Mon, 2016-12-19 at 18:07 -0800, Tom Herbert wrote:
>> 
>> When sockets created with SO_REUSEPORT move to TW state they are placed
>> back on tb->owners. fastreuse port is no longer set so we have
>> to walk a potentially long list of sockets in tb->owners to open a new
>> listener socket. I imagine this happens when we try to open a new
>> listener SO_REUSEPORT after the system has been running a while and so
>> we hit the long tb->owners list.
> 
> Hmm...  __inet_twsk_hashdance() does not change tb->fastreuse
> 
> So where is tb->fastreuse cleared?
> 
> If all your sockets have SO_REUSEPORT set, this should not happen.
> 

The app starts out without SO_REUSEPORT, and then we restart it with that option enabled.  What I suspect is that we still have all the twsks from the original service, and the fastreuse stuff is cleared.  My naive patch resets it once we add a reuseport sk to the tb, and that makes the problem go away.  I'm reworking all of this logic and adding some extra info to the tb to make the reset actually safe.  I'll send those patches out tomorrow. Thanks,

Josef
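
For readers following the thread, the behaviour discussed above can be sketched as a small userspace toy model. This is not kernel code and not the patch mentioned in the message: every name here (toy_sock, toy_bucket, toy_bucket_add, toy_bind_walk_cost) is hypothetical, the single fastreuse flag is a simplification of the real bind bucket state, and the walk cost simply counts the owners visited. It only illustrates why a cleared flag forces a walk of tb->owners, and why resetting the flag when a reuseport socket is added hides the cost. As the message notes, such a reset is naive and is not safe as written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket (or timewait socket) owned by a bucket. */
struct toy_sock {
    bool reuseport;            /* created with SO_REUSEPORT */
    struct toy_sock *next;     /* next owner on the bucket's list */
};

/* Hypothetical stand-in for the bind bucket ("tb" in the thread). */
struct toy_bucket {
    bool fastreuse;            /* true: every owner is assumed reuse-compatible */
    struct toy_sock *owners;   /* singly linked list of owners */
};

/* An empty bucket trivially allows the fast path in this toy model. */
static void toy_bucket_init(struct toy_bucket *tb)
{
    assert(tb != NULL);
    tb->fastreuse = true;
    tb->owners = NULL;
}

/*
 * Add an owner. A socket without reuseport clears the fast-path flag.
 * With naive_reset set, adding a reuseport socket sets the flag again,
 * mirroring the "naive patch" idea; older incompatible owners may still
 * be on the list, which is why this is not safe in general.
 */
static void toy_bucket_add(struct toy_bucket *tb, struct toy_sock *sk,
                           bool naive_reset)
{
    assert(tb != NULL && sk != NULL);
    sk->next = tb->owners;
    tb->owners = sk;
    if (!sk->reuseport)
        tb->fastreuse = false;
    else if (naive_reset)
        tb->fastreuse = true;
}

/*
 * Number of owners a new reuseport bind has to visit: zero on the fast
 * path, otherwise the full length of the owners list.
 */
static size_t toy_bind_walk_cost(const struct toy_bucket *tb)
{
    size_t visited = 0;

    assert(tb != NULL);
    if (tb->fastreuse)
        return 0;
    for (const struct toy_sock *sk = tb->owners; sk != NULL; sk = sk->next)
        visited++;
    return visited;
}
```

In this model, a bucket that first receives owners without reuseport (the old service's twsks) keeps paying the full walk on every later bind, however many reuseport sockets are added, unless the flag is reset.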