lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bb9505d6-e8ae-47dc-a1e0-6d1974dc12ac@quicinc.com>
Date: Tue, 17 Dec 2024 10:49:00 +0800
From: "Cheng Jiang (IOE)" <quic_chejiang@...cinc.com>
To: Luiz Augusto von Dentz <luiz.dentz@...il.com>
CC: Marcel Holtmann <marcel@...tmann.org>,
        Johan Hedberg
	<johan.hedberg@...il.com>,
        "David S. Miller" <davem@...emloft.net>,
        "Eric
 Dumazet" <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
	<pabeni@...hat.com>,
        Simon Horman <horms@...nel.org>, <linux-bluetooth@...r.kernel.org>,
        <netdev@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <quic_jiaymao@...cinc.com>, <quic_shuaz@...cinc.com>,
        <quic_zijuhu@...cinc.com>, <quic_mohamull@...cinc.com>
Subject: Re: [PATCH v1] Bluetooth: hci_sync: Fix disconnect complete event
 timeout issue

Hi Luiz,

On 12/16/2024 10:42 PM, Luiz Augusto von Dentz wrote:
> Hi Cheng,
> 
> On Mon, Dec 16, 2024 at 9:32 AM Luiz Augusto von Dentz
> <luiz.dentz@...il.com> wrote:
>>
>> Hi Cheng,
>>
>> On Mon, Dec 16, 2024 at 3:08 AM Cheng Jiang <quic_chejiang@...cinc.com> wrote:
>>>
>>> Sometimes, the remote device doesn't acknowledge the LL_TERMINATE_IND
>>> in time, requiring the controller to wait for the supervision timeout,
>>> which may exceed 2 seconds. In the current implementation, the
>>> HCI_EV_DISCONN_COMPLETE event is ignored if it arrives late, since
>>> the hci_abort_conn_sync has cleaned up the connection after 2 seconds.
>>> This causes the mgmt to get stuck, resulting in bluetoothd waiting
>>> indefinitely for the mgmt response to the disconnect. To recover,
>>> restarting bluetoothd is necessary.
>>>
>>> bluetoothctl log like this:
>>> [Designer Mouse]# disconnect D9:B5:6C:F2:51:91
>>> Attempting to disconnect from D9:B5:6C:F2:51:91
>>> [Designer Mouse]#
>>> [Designer Mouse]# power off
>>> [Designer Mouse]#
>>> Failed to set power off: org.freedesktop.DBus.Error.NoReply.
>>>
>>> Signed-off-by: Cheng Jiang <quic_chejiang@...cinc.com>
>>> ---
>>>  include/net/bluetooth/hci_core.h |  2 ++
>>>  net/bluetooth/hci_conn.c         |  9 +++++++++
>>>  net/bluetooth/hci_event.c        |  9 +++++++++
>>>  net/bluetooth/hci_sync.c         | 18 ++++++++++++++++++
>>>  4 files changed, 38 insertions(+)
>>>
>>> diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
>>> index 734cd50cd..2ab079dcf 100644
>>> --- a/include/net/bluetooth/hci_core.h
>>> +++ b/include/net/bluetooth/hci_core.h
>>> @@ -753,6 +753,8 @@ struct hci_conn {
>>>
>>>         struct bt_codec codec;
>>>
>>> +       struct completion disc_ev_comp;
>>> +
>>>         void (*connect_cfm_cb)  (struct hci_conn *conn, u8 status);
>>>         void (*security_cfm_cb) (struct hci_conn *conn, u8 status);
>>>         void (*disconn_cfm_cb)  (struct hci_conn *conn, u8 reason);
>>> diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
>>> index d097e308a..e0244e191 100644
>>> --- a/net/bluetooth/hci_conn.c
>>> +++ b/net/bluetooth/hci_conn.c
>>> @@ -1028,6 +1028,15 @@ static struct hci_conn *__hci_conn_add(struct hci_dev *hdev, int type, bdaddr_t
>>>
>>>         hci_conn_init_sysfs(conn);
>>>
>>> +       /* This disc_ev_comp is inited when we send a disconnect request to
>>> +        * the remote device but fail to receive the disconnect complete
>>> +        * event within the expected time (2 seconds). This occurs because
>>> +        * the remote device doesn't ack the terminate indication, forcing
>>> +        * the controller to wait for the supervision timeout.
>>> +        */
>>> +       init_completion(&conn->disc_ev_comp);
>>> +       complete(&conn->disc_ev_comp);
>>> +
>>>         return conn;
>>>  }
>>>
>>> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
>>> index 2cc7a9306..60ecb2b18 100644
>>> --- a/net/bluetooth/hci_event.c
>>> +++ b/net/bluetooth/hci_event.c
>>> @@ -3366,6 +3366,15 @@ static void hci_disconn_complete_evt(struct hci_dev *hdev, void *data,
>>>         if (!conn)
>>>                 goto unlock;
>>>
>>> +       /* Wake up disc_ev_comp here is ok. Since we hold the hdev lock
>>> +        * hci_abort_conn_sync will wait hdev lock release to continue.
>>> +        */
>>> +       if (!completion_done(&conn->disc_ev_comp)) {
>>> +               complete(&conn->disc_ev_comp);
>>> +               /* Add some delay for hci_abort_conn_sync to handle the complete */
>>> +               usleep_range(100, 1000);
>>> +       }
>>> +
>>>         if (ev->status) {
>>>                 mgmt_disconnect_failed(hdev, &conn->dst, conn->type,
>>>                                        conn->dst_type, ev->status);
>>> diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
>>> index 0badec712..783d04b57 100644
>>> --- a/net/bluetooth/hci_sync.c
>>> +++ b/net/bluetooth/hci_sync.c
>>> @@ -5590,6 +5590,24 @@ int hci_abort_conn_sync(struct hci_dev *hdev, struct hci_conn *conn, u8 reason)
>>>                 break;
>>>         }
>>>
>>> +       /* Check whether the connection is successfully disconnected.
>>> +        * Sometimes the remote device doesn't acknowledge the
>>> +        * LL_TERMINATE_IND in time, requiring the controller to wait
>>> +        * for the supervision timeout, which may exceed 2 seconds. In
>>> +        * this case, we need to wait for the HCI_EV_DISCONN_COMPLETE
>>> +        * event before cleaning up the connection.
>>> +        */
>>> +       if (err == -ETIMEDOUT) {
>>> +               u32 idle_delay = msecs_to_jiffies(10 * conn->le_supv_timeout);
>>> +
>>> +               reinit_completion(&conn->disc_ev_comp);
>>> +               if (!wait_for_completion_timeout(&conn->disc_ev_comp, idle_delay)) {
>>> +                       bt_dev_warn(hdev, "Failed to get complete");
>>> +                       mgmt_disconnect_failed(hdev, &conn->dst, conn->type,
>>> +                                              conn->dst_type, conn->abort_reason);
>>> +               }
>>> +       }
>>
>> Why don't we just set the supervision timeout as timeout then? If we
>> will have to wait for it anyway just change hci_disconnect_sync to use
>> 10 * conn->le_supv_timeout as timeout instead.
>>
hci_disconnect_sync uses __hci_cmd_sync_status_sk to wait for the 
HCI_EV_DISCONN_COMPLETE event, which will send the command in hci_cmd_work. 
In hci_cmd_work, it will start a timer with HCI_CMD_TIMEOUT(2s) to wait 
for the event. So even in hci_disconnect_sync we can set the timeout to
supervision timeout, hci_disconnect_sync still timeout after 2s. 

>> That said, we really need to fix bluetoothd if it is not able to be
>> cleaned up if SET_POWERED command fails, but it looks like it is
>> handling errors correctly so it sounds like something else is at play.
> 
The issue arises after a 2-second timeout of hci_disconnect_sync. During 
hci_abort_conn_sync, the connection is cleaned up by hci_conn_failed. 
after supervision timeout, the disconnect complete event arrives, but 
it returns at line 3370 since the connection has already been removed. 
As a result, bluetoothd does not send the mgmt event for MGMT_OP_DISCONNECT 
to the application layer (bluetoothctl), causing bluetoothctl to get stuck 
and unable to perform other mgmt commands.


3355 static void hci_disconn_complete_evt(struct hci_dev *hdev, void *data,
3356              struct sk_buff *skb)
3357 {
3358   struct hci_ev_disconn_complete *ev = data;
3359   u8 reason;
3360   struct hci_conn_params *params;
3361   struct hci_conn *conn;
3362   bool mgmt_connected;
3363
3364   bt_dev_dbg(hdev, "status 0x%2.2x", ev->status);
3365
3366   hci_dev_lock(hdev);
3367
3368   conn = hci_conn_hash_lookup_handle(hdev, __le16_to_cpu(ev->handle));
3369   if (!conn)
3370     goto unlock;
3371
3372   if (ev->status) {
3373     mgmt_disconnect_failed(hdev, &conn->dst, conn->type,
3374                conn->dst_type, ev->status);
3375     goto unlock;
3376   }
3377
3378   conn->state = BT_CLOSED;
3379
3380   mgmt_connected = test_and_clear_bit(HCI_CONN_MGMT_CONNECTED, &conn->flags);
3381

> I double checked this and apparently this could no longer fail:
> 
> +               /* Disregard possible errors since hci_conn_del shall have been
> +                * called even in case of errors had occurred since it would
> +                * then cause hci_conn_failed to be called which calls
> +                * hci_conn_del internally.
> +                */
> +               hci_abort_conn_sync(hdev, conn, reason);
> 
> So it will clean up the hci_conn no matter what is the timeout, so
> either you don't have this change or it is not working for some
> reason.
> 
The issue is mgmt command is not repsonsed by bluetoothd, then the bluetootlctl is 
blocked. It does not happen during the power off stage. It happened when disconnect 
a BLE device, but the disconnect complete event sent from controller to host 2s later.
Then it causes the mgmt in bluetoothctl is blocked as decribed as above. 

>>>         hci_dev_lock(hdev);
>>>
>>>         /* Check if the connection has been cleaned up concurrently */
>>>
>>> base-commit: e25c8d66f6786300b680866c0e0139981273feba
>>> --
>>> 2.34.1
>>>
>>
>>
>> --
>> Luiz Augusto von Dentz
> 
> 
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ