netdev - Re: Soft lockup in inet_put

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aca378b5-aae4-49cf-0542-d1167a5768b8@stressinduktion.org>
Date:   Fri, 16 Dec 2016 01:07:41 +0100
From:   Hannes Frederic Sowa <hannes@...essinduktion.org>
To:     Josef Bacik <jbacik@...com>, Tom Herbert <tom@...bertland.com>
Cc:     Craig Gallek <kraigatgoog@...il.com>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: Soft lockup in inet_put_port on 4.6

Hi Josef,

On 15.12.2016 19:53, Josef Bacik wrote:
> On Tue, Dec 13, 2016 at 6:32 PM, Tom Herbert <tom@...bertland.com> wrote:
>> On Tue, Dec 13, 2016 at 3:03 PM, Craig Gallek <kraigatgoog@...il.com>
>> wrote:
>>>  On Tue, Dec 13, 2016 at 3:51 PM, Tom Herbert <tom@...bertland.com>
>>> wrote:
>>>>  I think there may be some suspicious code in inet_csk_get_port. At
>>>>  tb_found there is:
>>>>
>>>>                  if (((tb->fastreuse > 0 && reuse) ||
>>>>                       (tb->fastreuseport > 0 &&
>>>>                        !rcu_access_pointer(sk->sk_reuseport_cb) &&
>>>>                        sk->sk_reuseport && uid_eq(tb->fastuid,
>>>> uid))) &&
>>>>                      smallest_size == -1)
>>>>                          goto success;
>>>>                  if (inet_csk(sk)->icsk_af_ops->bind_conflict(sk,
>>>> tb, true)) {
>>>>                          if ((reuse ||
>>>>                               (tb->fastreuseport > 0 &&
>>>>                                sk->sk_reuseport &&
>>>>                               
>>>> !rcu_access_pointer(sk->sk_reuseport_cb) &&
>>>>                                uid_eq(tb->fastuid, uid))) &&
>>>>                              smallest_size != -1 && --attempts >= 0) {
>>>>                                  spin_unlock_bh(&head->lock);
>>>>                                  goto again;
>>>>                          }
>>>>                          goto fail_unlock;
>>>>                  }
>>>>
>>>>  AFAICT there is redundancy in these two conditionals.  The same clause
>>>>  is being checked in both: (tb->fastreuseport > 0 &&
>>>>  !rcu_access_pointer(sk->sk_reuseport_cb) && sk->sk_reuseport &&
>>>>  uid_eq(tb->fastuid, uid))) && smallest_size == -1. If this is true the
>>>>  first conditional should be hit, goto done,  and the second will never
>>>>  evaluate that part to true-- unless the sk is changed (do we need
>>>>  READ_ONCE for sk->sk_reuseport_cb?).
>>>  That's an interesting point... It looks like this function also
>>>  changed in 4.6 from using a single local_bh_disable() at the beginning
>>>  with several spin_lock(&head->lock) to exclusively
>>>  spin_lock_bh(&head->lock) at each locking point.  Perhaps the full bh
>>>  disable variant was preventing the timers in your stack trace from
>>>  running interleaved with this function before?
>>
>> Could be, although dropping the lock shouldn't be able to affect the
>> search state. TBH, I'm a little lost in reading function, the
>> SO_REUSEPORT handling is pretty complicated. For instance,
>> rcu_access_pointer(sk->sk_reuseport_cb) is checked three times in that
>> function and also in every call to inet_csk_bind_conflict. I wonder if
>> we can simply this under the assumption that SO_REUSEPORT is only
>> allowed if the port number (snum) is explicitly specified.
> 
> Ok first I have data for you Hannes, here's the time distributions
> before during and after the lockup (with all the debugging in place the
> box eventually recovers).  I've attached it as a text file since it is
> long.

Thanks a lot!

> Second is I was thinking about why we would spend so much time doing the
> ->owners list, and obviously it's because of the massive amount of
> timewait sockets on the owners list.  I wrote the following dumb patch
> and tested it and the problem has disappeared completely.  Now I don't
> know if this is right at all, but I thought it was weird we weren't
> copying the soreuseport option from the original socket onto the twsk. 
> Is there are reason we aren't doing this currently?  Does this help
> explain what is happening?  Thanks,

The patch is interesting and a good clue, but I am immediately a bit
concerned that we don't copy/tag the socket with the uid also to keep
the security properties for SO_REUSEPORT. I have to think a bit more
about this.

We have seen hangs during connect. I am afraid this patch wouldn't help
there while also guaranteeing uniqueness.

Thanks a lot!