lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABBYNZLRdu_f9eNEapPp5mNqgcUE0jby5VPpaMaArY_FjyjB8Q@mail.gmail.com>
Date: Mon, 16 Dec 2024 09:32:53 -0500
From: Luiz Augusto von Dentz <luiz.dentz@...il.com>
To: Cheng Jiang <quic_chejiang@...cinc.com>
Cc: Marcel Holtmann <marcel@...tmann.org>, Johan Hedberg <johan.hedberg@...il.com>, 
	"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	linux-bluetooth@...r.kernel.org, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org, quic_jiaymao@...cinc.com, 
	quic_shuaz@...cinc.com, quic_zijuhu@...cinc.com, quic_mohamull@...cinc.com
Subject: Re: [PATCH v1] Bluetooth: hci_sync: Fix disconnect complete event
 timeout issue

Hi Cheng,

On Mon, Dec 16, 2024 at 3:08 AM Cheng Jiang <quic_chejiang@...cinc.com> wrote:
>
> Sometimes, the remote device doesn't acknowledge the LL_TERMINATE_IND
> in time, requiring the controller to wait for the supervision timeout,
> which may exceed 2 seconds. In the current implementation, the
> HCI_EV_DISCONN_COMPLETE event is ignored if it arrives late, since
> the hci_abort_conn_sync has cleaned up the connection after 2 seconds.
> This causes the mgmt to get stuck, resulting in bluetoothd waiting
> indefinitely for the mgmt response to the disconnect. To recover,
> restarting bluetoothd is necessary.
>
> bluetoothctl log like this:
> [Designer Mouse]# disconnect D9:B5:6C:F2:51:91
> Attempting to disconnect from D9:B5:6C:F2:51:91
> [Designer Mouse]#
> [Designer Mouse]# power off
> [Designer Mouse]#
> Failed to set power off: org.freedesktop.DBus.Error.NoReply.
>
> Signed-off-by: Cheng Jiang <quic_chejiang@...cinc.com>
> ---
>  include/net/bluetooth/hci_core.h |  2 ++
>  net/bluetooth/hci_conn.c         |  9 +++++++++
>  net/bluetooth/hci_event.c        |  9 +++++++++
>  net/bluetooth/hci_sync.c         | 18 ++++++++++++++++++
>  4 files changed, 38 insertions(+)
>
> diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> index 734cd50cd..2ab079dcf 100644
> --- a/include/net/bluetooth/hci_core.h
> +++ b/include/net/bluetooth/hci_core.h
> @@ -753,6 +753,8 @@ struct hci_conn {
>
>         struct bt_codec codec;
>
> +       struct completion disc_ev_comp;
> +
>         void (*connect_cfm_cb)  (struct hci_conn *conn, u8 status);
>         void (*security_cfm_cb) (struct hci_conn *conn, u8 status);
>         void (*disconn_cfm_cb)  (struct hci_conn *conn, u8 reason);
> diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> index d097e308a..e0244e191 100644
> --- a/net/bluetooth/hci_conn.c
> +++ b/net/bluetooth/hci_conn.c
> @@ -1028,6 +1028,15 @@ static struct hci_conn *__hci_conn_add(struct hci_dev *hdev, int type, bdaddr_t
>
>         hci_conn_init_sysfs(conn);
>
> +       /* This disc_ev_comp is inited when we send a disconnect request to
> +        * the remote device but fail to receive the disconnect complete
> +        * event within the expected time (2 seconds). This occurs because
> +        * the remote device doesn't ack the terminate indication, forcing
> +        * the controller to wait for the supervision timeout.
> +        */
> +       init_completion(&conn->disc_ev_comp);
> +       complete(&conn->disc_ev_comp);
> +
>         return conn;
>  }
>
> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> index 2cc7a9306..60ecb2b18 100644
> --- a/net/bluetooth/hci_event.c
> +++ b/net/bluetooth/hci_event.c
> @@ -3366,6 +3366,15 @@ static void hci_disconn_complete_evt(struct hci_dev *hdev, void *data,
>         if (!conn)
>                 goto unlock;
>
> +       /* Wake up disc_ev_comp here is ok. Since we hold the hdev lock
> +        * hci_abort_conn_sync will wait hdev lock release to continue.
> +        */
> +       if (!completion_done(&conn->disc_ev_comp)) {
> +               complete(&conn->disc_ev_comp);
> +               /* Add some delay for hci_abort_conn_sync to handle the complete */
> +               usleep_range(100, 1000);
> +       }
> +
>         if (ev->status) {
>                 mgmt_disconnect_failed(hdev, &conn->dst, conn->type,
>                                        conn->dst_type, ev->status);
> diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> index 0badec712..783d04b57 100644
> --- a/net/bluetooth/hci_sync.c
> +++ b/net/bluetooth/hci_sync.c
> @@ -5590,6 +5590,24 @@ int hci_abort_conn_sync(struct hci_dev *hdev, struct hci_conn *conn, u8 reason)
>                 break;
>         }
>
> +       /* Check whether the connection is successfully disconnected.
> +        * Sometimes the remote device doesn't acknowledge the
> +        * LL_TERMINATE_IND in time, requiring the controller to wait
> +        * for the supervision timeout, which may exceed 2 seconds. In
> +        * this case, we need to wait for the HCI_EV_DISCONN_COMPLETE
> +        * event before cleaning up the connection.
> +        */
> +       if (err == -ETIMEDOUT) {
> +               u32 idle_delay = msecs_to_jiffies(10 * conn->le_supv_timeout);
> +
> +               reinit_completion(&conn->disc_ev_comp);
> +               if (!wait_for_completion_timeout(&conn->disc_ev_comp, idle_delay)) {
> +                       bt_dev_warn(hdev, "Failed to get complete");
> +                       mgmt_disconnect_failed(hdev, &conn->dst, conn->type,
> +                                              conn->dst_type, conn->abort_reason);
> +               }
> +       }

Why don't we just set the supervision timeout as timeout then? If we
will have to wait for it anyway just change hci_disconnect_sync to use
10 * conn->le_supv_timeout as timeout instead.

That said, we really need to fix bluetoothd if it is not able to be
cleaned up if SET_POWERED command fails, but it looks like it is
handling errors correctly so it sounds like something else is at play.

>         hci_dev_lock(hdev);
>
>         /* Check if the connection has been cleaned up concurrently */
>
> base-commit: e25c8d66f6786300b680866c0e0139981273feba
> --
> 2.34.1
>


-- 
Luiz Augusto von Dentz

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ