lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 1 Nov 2022 16:29:07 -0700
From:   Martin KaFai Lau <martin.lau@...ux.dev>
To:     Kuniyuki Iwashima <kuniyu@...zon.com>,
        Joanne Koong <joannelkoong@...il.com>
Cc:     Kuniyuki Iwashima <kuni1840@...il.com>, netdev@...r.kernel.org,
        "David S. Miller" <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Paolo Abeni <pabeni@...hat.com>,
        Jakub Kicinski <kuba@...nel.org>,
        Mat Martineau <mathew.j.martineau@...ux.intel.com>,
        Martin KaFai Lau <martin.lau@...nel.org>
Subject: Re: [RFC] bhash2 and WARN_ON() for inconsistent sk saddr.

On 10/28/22 5:12 PM, Kuniyuki Iwashima wrote:
> Hi,
> 
> I want to discuss bhash2 and WARN_ON() being fired every day this month
> on my syzkaller instance without repro.
> 
>    WARNING: CPU: 0 PID: 209 at net/ipv4/inet_connection_sock.c:548 inet_csk_get_port (net/ipv4/inet_connection_sock.c:548 (discriminator 1))
>    ...
>    inet_csk_listen_start (net/ipv4/inet_connection_sock.c:1205)
>    inet_listen (net/ipv4/af_inet.c:228)
>    __sys_listen (net/socket.c:1810)
>    __x64_sys_listen (net/socket.c:1819 net/socket.c:1817 net/socket.c:1817)
>    do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
>    entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
> 
> For the very first implementation of bhash2, there was a similar report
> hitting the same WARN_ON().  The fix was to update the bhash2 bucket when
> the kernel changes sk->sk_rcv_saddr from INADDR_ANY.  Then, syzbot has a
> repro, so we can indeed confirm that it no longer triggers the warning on
> the latest kernel.
> 
>    https://lore.kernel.org/netdev/0000000000003f33bc05dfaf44fe@google.com/
> 
> However, Mat reported at that time that there were at least two variants,
> the latter being the same as mine.
> 
>    https://lore.kernel.org/netdev/4bae9df4-42c1-85c3-d350-119a151d29@linux.intel.com/
>    https://lore.kernel.org/netdev/23d8e9f4-016-7de1-9737-12c3146872ca@linux.intel.com/
> 
> This week I started looking into this issue and finally figured out
> why we could not catch all cases with a single repro.
> 
> Please see the source addresses of s2/s3 below after connect() fails.
> The s2 case is another variant of the first syzbot report, which has
> been already _fixed_.  And the s3 case is a repro for the issue that
> Mat and I saw.
> 
>    # sysctl -w net.ipv4.tcp_syn_retries=1
>    net.ipv4.tcp_syn_retries = 1
>    # python3
>    >>> from socket import *
>    >>>
>    >>> s1 = socket()
>    >>> s1.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
>    >>> s1.bind(('0.0.0.0', 10000))
>    >>> s1.connect(('127.0.0.1', 10000))
>    >>>
>    >>> s2 = socket()
>    >>> s2.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
>    >>> s2.bind(('0.0.0.0', 10000))
>    >>> s2
>    <socket.socket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 10000)>
>    >>>
>    >>> s2.connect(('127.0.0.1', 10000))
>    Traceback (most recent call last):
>      File "<stdin>", line 1, in <module>
>    OSError: [Errno 99] Cannot assign requested address
>    >>>
>    >>> s2
>    <socket.socket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 10000)>
>                                                                                                     ^^^ ???
>    >>> s3 = socket()
>    >>> s3.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
>    >>> s3.bind(('0.0.0.0', 10000))
>    >>> s3
>    <socket.socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 10000)>
>    >>>
>    >>> s3.connect(('0.0.0.1', 1))
>    Traceback (most recent call last):
>      File "<stdin>", line 1, in <module>
>    TimeoutError: [Errno 110] Connection timed out
>    >>>
>    >>> s3
>    <socket.socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 10000)>
> 
> We can fire the WARN_ON() by s3.listen().
> 
>    >>> s3.listen()
>    [ 1096.845905] ------------[ cut here ]------------
>    [ 1096.846290] WARNING: CPU: 0 PID: 209 at net/ipv4/inet_connection_sock.c:548 inet_csk_get_port+0x6bb/0x9e0
> 
> In the s3 case, connect() resets sk->sk_rcv_saddr to INADDR_ANY without
> updating the bhash2 bucket; OTOH sk has the correct non-NULL bhash bucket.
> So, when we call listen() for s3, inet_csk_get_port() does not call
> inet_bind_hash() and the WARN_ON() for bhash2 fires.
> 
>    if (!inet_csk(sk)->icsk_bind_hash)
>    	inet_bind_hash(sk, tb, tb2, port);
>    WARN_ON(inet_csk(sk)->icsk_bind_hash != tb);
>    WARN_ON(inet_csk(sk)->icsk_bind2_hash != tb2);
> 
> Here I think the s2 case also _should_ trigger WARN_ON().  The issue
> seems to be fixed, but it's just because we forgot to _fix_ the source
> address in error paths after inet6?_hash_connect() in tcp_v[46]_connect().
> (Of course, DCCP as well).
> 
> In the s3 case, connect() falls into a different path.  We reach the
> sock_error label in __inet_stream_connect() and call sk_prot->disconnect(),
> which resets sk->sk_rcv_saddr.
> 
> Then, there could be two subsequent issues.
> 
>    * listen() leaks a newly allocated bhash2 bucket
> 
>    * In inet_put_port(), inet_bhashfn_portaddr() computes a wrong hash for
>      inet_csk(sk)->icsk_bind2_hash, resulting in list corruption.
> 
> We can fix these easily but it still leaves inconsistent sockets in bhash2,
> so we need to fix the root cause.
> 
> As a side note, this issue only happens when we bind() only port before
> connect().  If we do not bind() port, tcp_set_state(sk, TCP_CLOSE) calls
> inet_put_port() and unhashes the sk from bhash2.
> 
> 
> At first, I applied the patch below so that both sk2 and sk3 trigger
> WARN_ON().  Then, I tried two approaches:
> 
>    * Fix up the bhash2 entry when calling sk_rcv_saddr
> 
>    * Change the bhash2 entry only when connect() succeeds
> 
> The former does not work when we run out of memory.  When we change saddr
> from INADDR_ANY, inet_bhash2_update_saddr() could free (INADDR_ANY, port)
> bhash2 bucket.  Then, we possibly could not allocate it again when
> restoring saddr in the failure path.
> 
> The latter does not work when a sk is in non-blocking mode.  In this case,
> a user might not call the second connect() to fix up the bhash2 bucket.
> 
>    >>> s4 = socket()
>    >>> s4.bind(('0.0.0.0', 10000))
>    >>> s4.setblocking(False)
>    >>> s4
>    <socket.socket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 10000)>
> 
>    >>> s4.connect(('0.0.0.1', 1))
>    Traceback (most recent call last):
>      File "<stdin>", line 1, in <module>
>    BlockingIOError: [Errno 115] Operation now in progress
>    >>> s4
>    <socket.socket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('10.0.2.15', 10000)>
> 
> Also, the former approach does not work for the non-blocking case.  Let's
> say the second connect() fails.  What if we fail to allocate an INADDR_ANY
> bhash2 bucket?  We have to change saddr to INADDR_ANY but cannot.... but
> the connect() actually failed....??
> 
>    >>> s4.connect(('0.0.0.1', 1))
>    Traceback (most recent call last):
>      File "<stdin>", line 1, in <module>
>    ConnectionRefusedError: [Errno 111] Connection refused
>    >>> s4
>    <socket.socket fd=3, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('0.0.0.0', 10000)>
> 
> 
> Now, I'm thinking bhash2 bucket needs a refcnt not to be freed while
> refcnt is greater than 1.  And we need to change the conflict logic
> so that the kernel ignores empty bhash2 bucket.  Such changes could
> be big for the net tree, but the next LTS will likely be v6.1 which
> has bhash2.
> 
> What do you think is the best way to fix the issue?

Thanks for the repro script and the detailed analysis on the issue.  iiuc, this 
is limited to the sk that was bind() to ADDR_ANY:port-1234 (or 
!SOCK_BINDADDR_LOCK).  Does it make sense to avoid adding the sk with 
ADDR_ANY:port-1234 to bhash2 at all?  From inet_use_bhash2_on_bind(), it does 
not seem ADDR_ANY will use bhash2.  or I have missed some cases?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ