[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CABBYNZJ=SgwgNz7K9j0nK-aV-wgMF6_yZ9psF1ydWG7HAhwFYw@mail.gmail.com>
Date: Tue, 13 Feb 2024 08:53:06 -0500
From: Luiz Augusto von Dentz <luiz.dentz@...il.com>
To: Jonas Dreßler <verdre@...d.nl>
Cc: syzbot <syzbot+3f0a39be7a2035700868@...kaller.appspotmail.com>,
davem@...emloft.net, edumazet@...gle.com, johan.hedberg@...il.com,
kuba@...nel.org, linux-bluetooth@...r.kernel.org,
linux-kernel@...r.kernel.org, luiz.von.dentz@...el.com, marcel@...tmann.org,
netdev@...r.kernel.org, pabeni@...hat.com, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [bluetooth?] KASAN: slab-use-after-free Write in __hci_acl_create_connection_sync
Hi Jonas,
On Mon, Feb 12, 2024 at 7:29 PM Jonas Dreßler <verdre@...d.nl> wrote:
>
> Hi Luiz,
>
> On 09.02.24 14:36, Luiz Augusto von Dentz wrote:
> > Hi Jonas,
> >
> > On Fri, Feb 9, 2024 at 7:37 AM Jonas Dreßler <verdre@...d.nl> wrote:
> >>
> >> Hi everyone!
> >>
> >> On 08.02.24 16:32, syzbot wrote:
> >>> syzbot has bisected this issue to:
> >>>
> >>> commit 456561ba8e495e9320c1f304bf1cd3d1043cbe7b
> >>> Author: Jonas Dreßler <verdre@...d.nl>
> >>> Date: Tue Feb 6 11:08:13 2024 +0000
> >>>
> >>> Bluetooth: hci_conn: Only do ACL connections sequentially
> >>>
> >>> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=154f8550180000
> >>> start commit: b1d3a0e70c38 Add linux-next specific files for 20240208
> >>> git tree: linux-next
> >>> final oops: https://syzkaller.appspot.com/x/report.txt?x=174f8550180000
> >>> console output: https://syzkaller.appspot.com/x/log.txt?x=134f8550180000
> >>> kernel config: https://syzkaller.appspot.com/x/.config?x=bb693ba195662a06
> >>> dashboard link: https://syzkaller.appspot.com/bug?extid=3f0a39be7a2035700868
> >>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=11d95147e80000
> >>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=107c2d8fe80000
> >>>
> >>> Reported-by: syzbot+3f0a39be7a2035700868@...kaller.appspotmail.com
> >>> Fixes: 456561ba8e49 ("Bluetooth: hci_conn: Only do ACL connections sequentially")
> >>>
> >>> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> >>
> >> Hmm, looking at the backtraces, I think the issue that the introduction of the
> >> sequential connect has introduced another async case: In hci_connect_acl(), when
> >> we call hci_acl_create_connection_sync(), the conn state no longer immediately
> >> gets set to BT_CONNECT, but remains in BT_OPEN or BT_CLOSED until the hci_sync
> >> queue actually executes __hci_acl_create_connection_sync().
> >
> > Need to double check but I think we do set BT_CONNECT in case of LE
> > when it is queued so which shall prevent it to be queued multiple
> > times.
>
> Nope, we set it only from within the hci_sync callback, see
> hci_connect_le_sync(). IMO that makes sense, because in
> hci_abort_conn_sync(), we send a HCI_OP_CREATE_CONN_CANCEL in case
> of conn->state == BT_CONNECT, and this OP we wouldn't want to send
> while the command is still waiting in the queue.
Yep, well I did a change which should work regardless of BT_CONNECT
only being set at the callback:
https://patchwork.kernel.org/project/bluetooth/patch/20240209141612.3554051-1-luiz.dentz@gmail.com/
> >
> >> This means that now hci_connect_acl() is happy to do multiple
> >> hci_acl_create_connection_sync calls, and the hci_sync machinery will happily
> >> execute them right after each other. Then the newly introduced hci_abort_conn_sync()
> >> in __hci_acl_create_connection_sync() calls hci_conn_del() and frees the conn
> >> object, so the second time we enter __hci_acl_create_connection_sync(),
> >> things blow up.
> >>
> >> It looks to me like in theory the hci_connect_le_sync() logic is prone to a
> >> similar issue, but in practice that's prohibited because in hci_connect_le_sync()
> >> we lookup whether the conn object still exists and bail out if it doesn't.
> >>
> >> Even for LE though I think we can queue multiple hci_connect_le_sync() calls
> >> and those will happily send HCI_OP_LE_CREATE_CONN no matter what the connection
> >> state actually is?
> >>
> >> So assuming this analysis is correct, what do we do to fix this? It seems to me
> >> that
> >>
> >> 1) we want a BT_CONNECT_QUEUED state for connections, so that the state
> >> machine covers this additional stage that we have for ACL and LE connections now.
> >
> > BT_CONNECT already indicates that connection procedure is in progress.
>
> I still think an additional state is necessary. Alternatively (and maybe
> a lot nicer than the extra state) would be to add some functions to hci_sync
> to check for and remove queued/ongoing commands, I'm thinking of a new
>
> bool hci_cmd_sync_has(struct hci_dev *hdev, hci_cmd_sync_work_func_t func, void *data);
> void hci_cmd_sync_cancel(struct hci_dev *hdev, hci_cmd_sync_work_func_t func, void *data);
>
> I think if we had those, in addition to not needing the additional state,
> we could also simplify the hci_abort_conn() code quite a bit and possibly
> get rid of the passing of connection handles to hci_connect_le_sync().
>
> I'll give that a try and propose a small patch tomorrow.
Yeah, I did something like in the past for hci_cmd_sync_queue_once, I
will resend it since it had the following function as well:
bool hci_cmd_sync_lookup(struct hci_dev *hdev, hci_cmd_sync_work_func_t func,
void *data, hci_cmd_sync_work_destroy_t destroy,
bool cancel);
> Cheers,
> Jonas
>
> >
> >> 2) the conn object can still disappear while the __hci_acl_create_connection_sync()
> >> is queued, so we need something like the "if conn doesn't exist anymore, bail out"
> >> check from hci_connect_le_sync() in __hci_acl_create_connection_sync(), too.
> >
> > Btw, I'd probably clean up the connect function and create something
> > like hci_connect/hci_connect_sync which takes care of the details
> > internally like it was done to abort.
> >
> >> That said, the current check in hci_connect_le_sync() that's using the connection
> >> handle to lookup the conn does not seem great, aren't these handles re-used
> >> after connections are torn down?
> >
> > Well we could perhaps do a lookup by pointer to see if the connection
> > hasn't been removed in the meantime, that said to force a clash on the
> > handles it need to happen in between abort, which frees the handle,
> > and connect, anyway the real culprit here is that we should be able to
> > abort the cmd_sync callback like we do in LE:
> >
> > https://github.com/bluez/bluetooth-next/blob/master/net/bluetooth/hci_conn.c#L2943
> >
> > That way we stop the connect callback to run and don't have to worry
> > about handle re-use.
> >
> >> Cheers,
> >> Jonas
> >
> >
> >
--
Luiz Augusto von Dentz
Powered by blists - more mailing lists