netdev - Re: Soft lockup in inet_put

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1481828016.24490.5@smtp.office365.com>
Date:   Thu, 15 Dec 2016 13:53:36 -0500
From:   Josef Bacik <jbacik@...com>
To:     Tom Herbert <tom@...bertland.com>
CC:     Craig Gallek <kraigatgoog@...il.com>,
        Hannes Frederic Sowa <hannes@...essinduktion.org>,
        Eric Dumazet <eric.dumazet@...il.com>,
        "Linux Kernel Network Developers" <netdev@...r.kernel.org>
Subject: Re: Soft lockup in inet_put_port on 4.6

On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert <tom@...bertland.com> 
wrote:
> On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek <kraigatgoog@...il.com> 
> wrote:
>>  On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert <tom@...bertland.com> 
>> wrote:
>>>  I think there may be some suspicious code in inet_csk_get_port. At
>>>  tb_found there is:
>>> 
>>>                  if (((tb->fastreuse > 0 && reuse) ||
>>>                       (tb->fastreuseport > 0 &&
>>>                        !rcu_access_pointer(sk->sk_reuseport_cb) &&
>>>                        sk->sk_reuseport && uid_eq(tb->fastuid, 
>>> uid))) &&
>>>                      smallest_size == -1)
>>>                          goto success;
>>>                  if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk, 
>>> tb, true)) {
>>>                          if ((reuse ||
>>>                               (tb->fastreuseport > 0 &&
>>>                                sk->sk_reuseport &&
>>>                                
>>> !rcu_access_pointer(sk->sk_reuseport_cb) &&
>>>                                uid_eq(tb->fastuid, uid))) &&
>>>                              smallest_size != -1 && --attempts >= 
>>> 0) {
>>>                                  spin_unlock_bh(&head->lock);
>>>                                  goto again;
>>>                          }
>>>                          goto fail_unlock;
>>>                  }
>>> 
>>>  AFAICT there is redundancy in these two conditionals.  The same 
>>> clause
>>>  is being checked in both: (tb->fastreuseport > 0 &&
>>>  !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport &&
>>>  uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true 
>>> the
>>>  first conditional should be hit, goto done,  and the second will 
>>> never
>>>  evaluate that part to true-- unless the sk is changed (do we need
>>>  READ_ONCE for sk->sk_reuseport_cb?).
>>  That's an interesting point... It looks like this function also
>>  changed in 4.6 from using a single local_bh_disable() at the 
>> beginning
>>  with several spin_lock(&head->lock) to exclusively
>>  spin_lock_bh(&head->lock) at each locking point.  Perhaps the full 
>> bh
>>  disable variant was preventing the timers in your stack trace from
>>  running interleaved with this function before?
> 
> Could be, although dropping the lock shouldn't be able to affect the
> search state. TBH, I'm a little lost in reading function, the
> SO_REUSEPORT handling is pretty complicated. For instance,
> rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that
> function and also in every call to inet_csk_bind_conflict. I wonder if
> we can simply this under the assumption that SO_REUSEPORT is only
> allowed if the port number (snum) is explicitly specified.

Ok first I have data for you Hannes, here's the time distributions 
before during and after the lockup (with all the debugging in place the 
box eventually recovers).  I've attached it as a text file since it is 
long.

Second is I was thinking about why we would spend so much time doing 
the ->owners list, and obviously it's because of the massive amount of 
timewait sockets on the owners list.  I wrote the following dumb patch 
and tested it and the problem has disappeared completely.  Now I don't 
know if this is right at all, but I thought it was weird we weren't 
copying the soreuseport option from the original socket onto the twsk.  
Is there are reason we aren't doing this currently?  Does this help 
explain what is happening?  Thanks,

Josef

View attachment "timing-dist.txt" of type "text/plain" (30972 bytes)

View attachment "tw-reuseport.patch" of type "text/x-patch" (1230 bytes)