lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fa269649-21eb-be76-e552-36a3aa4f3da4@gmail.com>
Date:   Thu, 2 Sep 2021 18:53:53 -0400
From:   Desmond Cheong Zhi Xi <desmondcheongzx@...il.com>
To:     Eric Dumazet <eric.dumazet@...il.com>, marcel@...tmann.org,
        johan.hedberg@...il.com, luiz.dentz@...il.com, davem@...emloft.net,
        kuba@...nel.org, sudipm.mukherjee@...il.com
Cc:     linux-bluetooth@...r.kernel.org, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org, skhan@...uxfoundation.org,
        gregkh@...uxfoundation.org,
        linux-kernel-mentees@...ts.linuxfoundation.org,
        syzbot+2f6d7c28bb4bf7e82060@...kaller.appspotmail.com
Subject: Re: [PATCH v6 1/6] Bluetooth: schedule SCO timeouts with delayed_work

On 2/9/21 5:41 pm, Eric Dumazet wrote:
> 
> 
> On 9/2/21 12:32 PM, Desmond Cheong Zhi Xi wrote:
>>
>> Hi Eric,
>>
>> This actually seems to be a pre-existing error in sco_sock_connect that we now hit in sco_sock_timeout.
>>
>> Any thoughts on the following patch to address the problem?
>>
>> Link: https://lore.kernel.org/lkml/20210831065601.101185-1-desmondcheongzx@gmail.com/
> 
> 
> syzbot is still working on finding a repro, this is obviously not trivial,
> because this is a race window.
> 
> I think this can happen even with a single SCO connection.
> 
> This might be triggered more easily forcing a delay in sco_sock_timeout()
> 
> diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
> index 98a88158651281c9f75c4e0371044251e976e7ef..71ebe0243fab106c676c308724fe3a3f92a62cbd 100644
> --- a/net/bluetooth/sco.c
> +++ b/net/bluetooth/sco.c
> @@ -84,8 +84,14 @@ static void sco_sock_timeout(struct work_struct *work)
>   
>          sco_conn_lock(conn);
>          sk = conn->sk;
> -       if (sk)
> +       if (sk) {
> +               // lets pretend cpu has been busy (in interrupts) for 100ms
> +               int i;
> +               for (i=0;i<100000;i++)
> +                       udelay(1);
> +
>                  sock_hold(sk);
> +       }>          sco_conn_unlock(conn);
>   
>          if (!sk)
> 
> 
> Stack trace tells us that sco_sock_timeout() is running after last reference
> on socket has been released.
> 
> __refcount_add include/linux/refcount.h:199 [inline]
>   __refcount_inc include/linux/refcount.h:250 [inline]
>   refcount_inc include/linux/refcount.h:267 [inline]
>   sock_hold include/net/sock.h:702 [inline]
>   sco_sock_timeout+0x216/0x290 net/bluetooth/sco.c:88
>   process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
>   worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
>   kthread+0x3e5/0x4d0 kernel/kthread.c:319
>   ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
> 
> This is why I suggested to delay sock_put() to make sure this can not happen.
> 
> diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
> index 98a88158651281c9f75c4e0371044251e976e7ef..bd0222e3f05a6bcb40cffe8405c9dfff98d7afde 100644
> --- a/net/bluetooth/sco.c
> +++ b/net/bluetooth/sco.c
> @@ -195,10 +195,11 @@ static void sco_conn_del(struct hci_conn *hcon, int err)
>                  sco_sock_clear_timer(sk);
>                  sco_chan_del(sk, err);
>                  release_sock(sk);
> -               sock_put(sk);
>   
>                  /* Ensure no more work items will run before freeing conn. */
>                  cancel_delayed_work_sync(&conn->timeout_work);
> +
> +               sock_put(sk);
>          }
>   
>          hcon->sco_data = NULL;
> 

I see where you're going with this, but once sco_chan_del returns, any
instance of sco_sock_timeout that hasn't yet called sock_hold will
simply return, because conn->sk is NULL. Adding a delay to the
sco_conn_lock critical section in sco_sock_timeout would not affect this
because sco_chan_del clears conn->sk while holding onto the lock.

The main reason that cancel_delayed_work_sync is run there is to make
sure that we don't have a UAF on the SCO connection itself after we free
conn.

For a single SCO connection with well-formed channel, I think there
can't be a race. Here's the reasoning:

- For the timeout to be scheduled, a socket must have a channel with a
connection.

- When a channel between a socket and connection is established, the
socket transitions from BT_OPEN to BT_CONNECTED, BT_CONNECT, or
BT_CONNECT2.

- For a socket to be released, it has to be zapped. For sockets that
have a state of BT_CONNECTED, BT_CONNECT, or BT_CONNECT2, they are
zapped only when the channel is deleted.

- If the channel is deleted (which is protected by sco_conn_lock), then
conn->sk is NULL, and sco_sock_timeout simply exits. If we had entered
the critical section in sco_sock_timeout before the channel was deleted,
then we increased the reference count on the socket, so it won't be
freed until sco_sock_timeout is done.

Hence, sco_sock_timeout doesn't race with the release of a socket that
has a well-formed channel with a connection.

But if multiple connections are allocated and overwritten in
sco_sock_connect, then none of the above assumptions hold because the
SCO connection can't be cleaned up (i.e. conn->sk cannot be set to NULL)
when the associated socket is released. This scenario happens in the
syzbot reproducer for the crash here:
https://syzkaller.appspot.com/bug?id=bcc246d137428d00ed14b476c2068579515fe2bc

That aside, upon taking a closer look, I think there is indeed a race
lurking in sco_conn_del, but it's not the one that syzbot is hitting.
Our sock_hold simply comes too late, and by the time it's called we
might have already have freed the socket.

So probably something like this needs to happen:

diff --git a/net/bluetooth/sco.c b/net/bluetooth/sco.c
index fa25b07120c9..ea18e5b56343 100644
--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -187,10 +187,11 @@ static void sco_conn_del(struct hci_conn *hcon, int err)
  	/* Kill socket */
  	sco_conn_lock(conn);
  	sk = conn->sk;
+	if (sk)
+		sock_hold(sk);
  	sco_conn_unlock(conn);
  
  	if (sk) {
-		sock_hold(sk);
  		lock_sock(sk);
  		sco_sock_clear_timer(sk);
  		sco_chan_del(sk, err);

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ