lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-Id: <DC038B82-5539-48D5-84BE-4575F2E794AD@holtmann.org>
Date:   Sat, 20 Mar 2021 18:33:26 +0100
From:   Marcel Holtmann <marcel@...tmann.org>
To:     Manish Mandlik <mmandlik@...gle.com>
Cc:     Luiz Augusto von Dentz <luiz.dentz@...il.com>,
        Alain Michaud <alainm@...omium.org>,
        CrosBT Upstreaming <chromeos-bluetooth-upstreaming@...omium.org>,
        Bluetooth Kernel Mailing List 
        <linux-bluetooth@...r.kernel.org>,
        Abhishek Pandit-Subedi <abhishekpandit@...omium.org>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Johan Hedberg <johan.hedberg@...il.com>,
        LKML <linux-kernel@...r.kernel.org>, netdev@...r.kernel.org
Subject: Re: [PATCH v1] Bluetooth: Add ncmd=0 recovery handling

Hi Manish,

> During command status or command complete event, the controller may set
> ncmd=0 indicating that it is not accepting any more commands. In such a
> case, host holds off sending any more commands to the controller. If the
> controller doesn't recover from such condition, host will wait forever.
> 
> This patch adds a timer when controller gets into such condition and
> resets the controller if controller doesn't recover within the timeout
> period.
> 
> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@...omium.org>
> Signed-off-by: Manish Mandlik <mmandlik@...gle.com>
> ---
> Hello Maintainers,
> 
> We noticed that during suspend, sometimes the controller firmware gets
> into a state where it is not accepting any more commands (it returns
> ncmd=0 in Command Status):
> 
> < HCI Command: Disconnect (0x01|0x0006) plen 3  #398 [hci0] 83.760502
>         Handle: 1
>         Reason: Remote Device Terminated due to Power Off (0x15)
>> HCI Event: Command Status (0x0f) plen 4       #399 [hci0] 83.761694
>       Disconnect (0x01|0x0006) ncmd 0
>         Status: Success (0x00)
> 
> In such a case, the host holds off sending any more packets to the
> controller until it is ready to accept more commands. If the controller
> doesn't recover from such a condition, Command Timeout does not get
> triggered as Command Timeout is queued only once the packet is sent to
> the controller; hence, the host will wait forever. 
> 
> This patch adds a timer to recover from this condition. Since the
> suspend timeout is 2 seconds, I'm using 4 seconds timeout to recover
> from ncmd=0. This should give ample amount of time for recovery and
> should not create any race conditions with the suspend. Once we resume
> from the suspend normally, the timer would expire and reset the
> controller. I have verified this patch locally and able to connect to
> peer device after resume from suspend. Please let me know your thoughts
> on this.
> 
> Thanks,
> Manish.
> 
> include/net/bluetooth/hci.h      |  1 +
> include/net/bluetooth/hci_core.h |  1 +
> net/bluetooth/hci_core.c         | 15 +++++++++++++++
> net/bluetooth/hci_event.c        | 10 ++++++++++
> 4 files changed, 27 insertions(+)
> 
> diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
> index ea4ae551c426..c4b0650fb9ae 100644
> --- a/include/net/bluetooth/hci.h
> +++ b/include/net/bluetooth/hci.h
> @@ -339,6 +339,7 @@ enum {
> #define HCI_PAIRING_TIMEOUT	msecs_to_jiffies(60000)	/* 60 seconds */
> #define HCI_INIT_TIMEOUT	msecs_to_jiffies(10000)	/* 10 seconds */
> #define HCI_CMD_TIMEOUT		msecs_to_jiffies(2000)	/* 2 seconds */
> +#define HCI_NCMD_TIMEOUT	msecs_to_jiffies(4000)	/* 4 seconds */
> #define HCI_ACL_TX_TIMEOUT	msecs_to_jiffies(45000)	/* 45 seconds */
> #define HCI_AUTO_OFF_TIMEOUT	msecs_to_jiffies(2000)	/* 2 seconds */
> #define HCI_POWER_OFF_TIMEOUT	msecs_to_jiffies(5000)	/* 5 seconds */
> diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> index ebdd4afe30d2..f14692b39fd5 100644
> --- a/include/net/bluetooth/hci_core.h
> +++ b/include/net/bluetooth/hci_core.h
> @@ -470,6 +470,7 @@ struct hci_dev {
> 	struct delayed_work	service_cache;
> 
> 	struct delayed_work	cmd_timer;
> +	struct delayed_work	ncmd_timer;
> 
> 	struct work_struct	rx_work;
> 	struct work_struct	cmd_work;
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index b0d9c36acc03..5ee1609456bd 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -2769,6 +2769,20 @@ static void hci_cmd_timeout(struct work_struct *work)
> 	queue_work(hdev->workqueue, &hdev->cmd_work);
> }
> 
> +/* HCI ncmd timer function */
> +static void hci_ncmd_timeout(struct work_struct *work)
> +{
> +	struct hci_dev *hdev = container_of(work, struct hci_dev,
> +					    ncmd_timer.work);
> +
> +	bt_dev_err(hdev, "ncmd timeout");
> +
> +	if (hci_dev_do_close(hdev))
> +		return;
> +
> +	hci_dev_do_open(hdev);
> +}
> +

I am pretty certain this can dead-lock if ncmd=0 happens inside hci_dev_do_open,do_close itself.

The second thing is that do_close+do_open is heavy hammer you are swinging here. It will also result in mgmt powered down/up. Is this something you really want since bluetoothd will notice this and has to re-init everything.

Regards

Marcel

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ