[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACSM0CMXdpaKZ9LeC0QS=dzfYXr4Ac0pd9wzX+aP3m4M3K8COw@mail.gmail.com>
Date: Thu, 18 Dec 2025 20:09:40 +0300
From: Cihangir Aktürk <cakturk@...il.com>
To: Luiz Augusto von Dentz <luiz.dentz@...il.com>
Cc: linux-bluetooth@...r.kernel.org, Marcel Holtmann <marcel@...tmann.org>,
Johan Hedberg <johan.hedberg@...il.com>, linux-kernel@...r.kernel.org,
syzbot+87badbb9094e008e0685@...kaller.appspotmail.com
Subject: Re: [PATCH] Bluetooth: hci: fix refcounts in LE remote features command
On Thu, Dec 18, 2025 at 12:36 AM Luiz Augusto von Dentz
<luiz.dentz@...il.com> wrote:
>
> Hi Cihangir,
Hi Luiz,
> On Tue, Dec 16, 2025 at 2:13 PM Cihangir Akturk <cakturk@...il.com> wrote:
> >
> > KASAN reported a slab-use-after-free in le_read_features_complete()
> > running from hci_cmd_sync_work. le_read_features_complete() can run
> > after hci_rx_work/hci_conn_del() has removed the link, so the destroy
> > callback may touch a freed hci_conn and trigger a KASAN use-after-free.
> > Duplicate submissions also need to drop the references to avoid leaking
> > the hold and blocking teardown.
> >
> > Fix this by taking hci_conn_get() before queueing, passing the conn
> > directly as work data, and balancing hci_conn_hold()/drop() and
> > hci_conn_get()/put() in the completion and all error/-EEXIST paths so
> > the connection stays referenced while the work runs and is released
> > afterwards.
> >
> > Reported-by: syzbot+87badbb9094e008e0685@...kaller.appspotmail.com
> > Signed-off-by: Cihangir Akturk <cakturk@...il.com>
> > ---
> > net/bluetooth/hci_sync.c | 37 ++++++++++++++++++++++++++-----------
> > 1 file changed, 26 insertions(+), 11 deletions(-)
> >
> > I am not entirely sure that removing the err == -ECANCELED early return
> > is strictly correct. My assumption is that, with the changes in this
> > patch, there does not appear to be another cancellation path that
> > reliably balances hci_conn_drop() and hci_conn_put() for this command.
> > Based on that assumption, keeping the early return could leave the
> > references taken before queuing unbalanced on cancellation, so I opted
> > to remove it to keep the lifetime handling consistent.
> >
> > diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
> > index a9f5b1a68356..5a3d288e7015 100644
> > --- a/net/bluetooth/hci_sync.c
> > +++ b/net/bluetooth/hci_sync.c
> > @@ -776,14 +776,23 @@ _hci_cmd_sync_lookup_entry(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> > * - Lookup if an entry already exist and only if it doesn't creates a new entry
> > * and queue it.
> > */
> > -int hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> > +static int __hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> > void *data, hci_cmd_sync_work_destroy_t destroy)
> > {
> > if (hci_cmd_sync_lookup_entry(hdev, func, data, destroy))
> > - return 0;
> > + return -EEXIST;
> >
> > return hci_cmd_sync_queue(hdev, func, data, destroy);
> > }
> > +
> > +int hci_cmd_sync_queue_once(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
> > + void *data, hci_cmd_sync_work_destroy_t destroy)
> > +{
> > + int ret;
> > +
> > + ret = __hci_cmd_sync_queue_once(hdev, func, data, destroy);
> > + return ret == -EEXIST ? 0 : ret;
> > +}
> > EXPORT_SYMBOL(hci_cmd_sync_queue_once);
> >
> > /* Run HCI command:
> > @@ -7338,10 +7347,8 @@ static void le_read_features_complete(struct hci_dev *hdev, void *data, int err)
> >
> > bt_dev_dbg(hdev, "err %d", err);
> >
> > - if (err == -ECANCELED)
> > - return;
> > -
> > hci_conn_drop(conn);
> > + hci_conn_put(conn);
> > }
> >
> > static int hci_le_read_all_remote_features_sync(struct hci_dev *hdev,
> > @@ -7408,12 +7415,20 @@ int hci_le_read_remote_features(struct hci_conn *conn)
> > * role is possible. Otherwise just transition into the
> > * connected state without requesting the remote features.
> > */
> > - if (conn->out || (hdev->le_features[0] & HCI_LE_PERIPHERAL_FEATURES))
> > - err = hci_cmd_sync_queue_once(hdev,
> > - hci_le_read_remote_features_sync,
> > - hci_conn_hold(conn),
> > - le_read_features_complete);
> > - else
> > + if (conn->out || (hdev->le_features[0] & HCI_LE_PERIPHERAL_FEATURES)) {
> > + hci_conn_get(conn);
> > + hci_conn_hold(conn);
> > + err = __hci_cmd_sync_queue_once(hdev,
> > + hci_le_read_remote_features_sync,
> > + conn,
> > + le_read_features_complete);
> > + if (err) {
> > + hci_conn_drop(conn);
> > + hci_conn_put(conn);
> > + if (err == -EEXIST)
> > + err = 0;
> > + }
> > + } else
>
> Sort of overkill, why do we have to use 2 references? Also we do have
> code for dequeuing callbacks using conn as user_data so either that is
> not working or there is something else at play here. Maybe we need to
> change the order so that dequeue happens before hci_conn_cleanup:
>
>From what I understand based on the KASAN trace, the issue happens
when a disconnect event is handled in hci_event_work while, at the
same time, hci_cmd_sync_work processes the LE Read Remote Features
command. So, le_read_features_complete() ends up calling
hci_conn_drop() on a connection that appears to have already been
freed.
Holding a reference with hci_conn_hold() alone does not seem
sufficient to prevent the disconnect path from removing and freeing
the hci_conn. That was the reason I tried taking an additional
reference with hci_conn_get(), to keep the connection object around
until the work finishes.
> diff --git a/net/bluetooth/hci_conn.c b/net/bluetooth/hci_conn.c
> index dc085856f5e9..b64c0e53d9cd 100644
> --- a/net/bluetooth/hci_conn.c
> +++ b/net/bluetooth/hci_conn.c
> @@ -1232,15 +1232,15 @@ void hci_conn_del(struct hci_conn *conn)
> skb_queue_purge(&conn->data_q);
> skb_queue_purge(&conn->tx_q.queue);
>
> + /* Dequeue callbacks using connection pointer as data */
> + hci_cmd_sync_dequeue(hdev, NULL, conn, NULL);
> +
> /* Remove the connection from the list and cleanup its remaining
> * state. This is a separate function since for some cases like
> * BT_CONNECT_SCAN we *only* want the cleanup part without the
> * rest of hci_conn_del.
> */
> hci_conn_cleanup(conn);
> -
> - /* Dequeue callbacks using connection pointer as data */
> - hci_cmd_sync_dequeue(hdev, NULL, conn, NULL);
> }
>
> struct hci_dev *hci_get_route(bdaddr_t *dst, bdaddr_t *src, uint8_t src_type)
>
>
> > err = -EOPNOTSUPP;
> >
> > return err;
> > --
> > 2.52.0
> >
>
I tried this change and tested the suggested ordering, but in my
testing the issue still appears to reproduce, so it does not seem to
fully address the problem.
>
> --
> Luiz Augusto von Dentz
Powered by blists - more mailing lists