lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 26 Apr 2019 16:55:19 +0800
From:   João Paulo Rechi Vita <jprvita@...il.com>
To:     Marcel Holtmann <marcel@...tmann.org>
Cc:     Johan Hedberg <johan.hedberg@...il.com>,
        Balakrishna Godavarthi <bgodavar@...eaurora.org>,
        ytkim@....qualcomm.com, "David S. Miller" <davem@...emloft.net>,
        linux-bluetooth@...r.kernel.org,
        Network Development <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Upstreaming Team <linux@...lessm.com>,
        João Paulo Rechi Vita <jprvita@...lessm.com>
Subject: Re: [PATCH] Bluetooth: Ignore CC events not matching the last HCI command

On Thu, Apr 25, 2019 at 7:08 PM Marcel Holtmann <marcel@...tmann.org> wrote:
>
> Hi Joao,
>
> > This commit makes the kernel not send the next queued HCI command until
> > a command complete arrives for the last HCI command sent to the
> > controller. This change avoids a problem with some buggy controllers
> > (seen on two SKUs of QCA9377) that send an extra command complete event
> > for the previous command after the kernel had already sent a new HCI
> > command to the controller.
> >
> > The problem was reproduced when starting an active scanning procedure,
> > where an extra command complete event arrives for the LE_SET_RANDOM_ADDR
> > command. When this happends the kernel ends up not processing the
> > command complete for the following commmand, LE_SET_SCAN_PARAM, and
> > ultimately behaving as if a passive scanning procedure was being
> > performed, when in fact controller is performing an active scanning
> > procedure. This makes it impossible to discover BLE devices as no device
> > found events are sent to userspace.
> >
> > This problem is reproducible on 100% of the attempts on the affected
> > controllers. The extra command complete event can be seen at timestamp
> > 27.420131 on the btmon logs bellow.
> >
> > Bluetooth monitor ver 5.50
> > = Note: Linux version 5.0.0+ (x86_64)                                  0.352340
> > = Note: Bluetooth subsystem version 2.22                               0.352343
> > = New Index: 80:C5:F2:8F:87:84 (Primary,USB,hci0)               [hci0] 0.352344
> > = Open Index: 80:C5:F2:8F:87:84                                 [hci0] 0.352345
> > = Index Info: 80:C5:F2:8F:87:84 (Qualcomm)                      [hci0] 0.352346
> > @ MGMT Open: bluetoothd (privileged) version 1.14             {0x0001} 0.352347
> > @ MGMT Open: btmon (privileged) version 1.14                  {0x0002} 0.352366
> > @ MGMT Open: btmgmt (privileged) version 1.14                {0x0003} 27.302164
> > @ MGMT Command: Start Discovery (0x0023) plen 1       {0x0003} [hci0] 27.302310
> >        Address type: 0x06
> >          LE Public
> >          LE Random
> > < HCI Command: LE Set Random Address (0x08|0x0005) plen 6   #1 [hci0] 27.302496
> >        Address: 15:60:F2:91:B2:24 (Non-Resolvable)
> >> HCI Event: Command Complete (0x0e) plen 4                 #2 [hci0] 27.419117
> >      LE Set Random Address (0x08|0x0005) ncmd 1
> >        Status: Success (0x00)
> > < HCI Command: LE Set Scan Parameters (0x08|0x000b) plen 7  #3 [hci0] 27.419244
> >        Type: Active (0x01)
> >        Interval: 11.250 msec (0x0012)
> >        Window: 11.250 msec (0x0012)
> >        Own address type: Random (0x01)
> >        Filter policy: Accept all advertisement (0x00)
> >> HCI Event: Command Complete (0x0e) plen 4                 #4 [hci0] 27.420131
> >      LE Set Random Address (0x08|0x0005) ncmd 1
> >        Status: Success (0x00)
> > < HCI Command: LE Set Scan Enable (0x08|0x000c) plen 2      #5 [hci0] 27.420259
> >        Scanning: Enabled (0x01)
> >        Filter duplicates: Enabled (0x01)
> >> HCI Event: Command Complete (0x0e) plen 4                 #6 [hci0] 27.420969
> >      LE Set Scan Parameters (0x08|0x000b) ncmd 1
> >        Status: Success (0x00)
> >> HCI Event: Command Complete (0x0e) plen 4                 #7 [hci0] 27.421983
> >      LE Set Scan Enable (0x08|0x000c) ncmd 1
> >        Status: Success (0x00)
> > @ MGMT Event: Command Complete (0x0001) plen 4        {0x0003} [hci0] 27.422059
> >      Start Discovery (0x0023) plen 1
> >        Status: Success (0x00)
> >        Address type: 0x06
> >          LE Public
> >          LE Random
> > @ MGMT Event: Discovering (0x0013) plen 2             {0x0003} [hci0] 27.422067
> >        Address type: 0x06
> >          LE Public
> >          LE Random
> >        Discovery: Enabled (0x01)
> > @ MGMT Event: Discovering (0x0013) plen 2             {0x0002} [hci0] 27.422067
> >        Address type: 0x06
> >          LE Public
> >          LE Random
> >        Discovery: Enabled (0x01)
> > @ MGMT Event: Discovering (0x0013) plen 2             {0x0001} [hci0] 27.422067
> >        Address type: 0x06
> >          LE Public
> >          LE Random
> >        Discovery: Enabled (0x01)
> >
> > Signed-off-by: João Paulo Rechi Vita <jprvita@...lessm.com>
> > ---
> > include/net/bluetooth/hci_core.h | 1 +
> > net/bluetooth/hci_core.c         | 5 +++++
> > net/bluetooth/hci_event.c        | 3 +++
> > 3 files changed, 9 insertions(+)
> >
> > diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> > index 094e61e07030..85bed4e916d3 100644
> > --- a/include/net/bluetooth/hci_core.h
> > +++ b/include/net/bluetooth/hci_core.h
> > @@ -364,6 +364,7 @@ struct hci_dev {
> >       struct sk_buff_head     cmd_q;
> >
> >       struct sk_buff          *sent_cmd;
> > +     __u8                    sent_cmd_pending_cc;
>
> is there anything wrong with a bool type? And frankly using a flag seems more safe than just a boolean.
>

Ok, I'll change it into a flag.

> >
> >       struct mutex            req_lock;
> >       wait_queue_head_t       req_wait_q;
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index d6b2540ba7f8..37893b0c6077 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -4380,9 +4380,13 @@ void hci_req_cmd_complete(struct hci_dev *hdev, u16 opcode, u8 status,
> >               if (test_bit(HCI_INIT, &hdev->flags) && opcode == HCI_OP_RESET)
> >                       hci_resend_last(hdev);
> >
> > +             bt_dev_err(hdev,
> > +                        "unexpected CC event for opcode 0x%4.4x", opcode);
>
> Have you tested this? This will cause many error from tools like hciconfig or hcitool.
>

I have tested this, but not with hciconfig or hcitool custom commands.
Do you have an example that I could use to verify this?

> >               return;
> >       }
> >
> > +     hdev->sent_cmd_pending_cc = 0;
> > +
> >       /* If the command succeeded and there's still more commands in
> >        * this request the request is not yet complete.
> >        */
> > @@ -4493,6 +4497,7 @@ static void hci_cmd_work(struct work_struct *work)
> >
> >               hdev->sent_cmd = skb_clone(skb, GFP_KERNEL);
> >               if (hdev->sent_cmd) {
> > +                     hdev->sent_cmd_pending_cc = 1;
> >                       atomic_dec(&hdev->cmd_cnt);
> >                       hci_send_frame(hdev, skb);
> >                       if (test_bit(HCI_RESET, &hdev->flags))
>
> Fun part is actually that even commands injected via hcitool etc. will be added to the sent_cmd queue unless they are vendor specific commands. However some manufactures choose to use command complete for their vendor commands as well (others don’t).
>

So injected commands and vendor-specific commands are not part of an
hci_request, but end up in this same hdev->cmd_q somehow. Is that
correct?

> > diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> > index 609fd6871c5a..7541a4bc9444 100644
> > --- a/net/bluetooth/hci_event.c
> > +++ b/net/bluetooth/hci_event.c
> > @@ -3404,6 +3404,9 @@ static void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *skb,
> >       hci_req_cmd_complete(hdev, *opcode, *status, req_complete,
> >                            req_complete_skb);
> >
> > +     if (hdev->sent_cmd_pending_cc)
> > +             return;
> > +
> >       if (atomic_read(&hdev->cmd_cnt) && !skb_queue_empty(&hdev->cmd_q))
> >               queue_work(hdev->workqueue, &hdev->cmd_work);
>
> I rather see this done super cleanly. We have a pending cmd request from the kernel, make sure it gets answered with the corresponding command complete or command status and not just some other event. If there is a command complete or command status for a command that we don’t have pending, then ignore.
>

So, just to make this clear: are you saying we should only try to
match commands with events if the command is part of a hci_request?
IIUC the way to verify if a command is part of a request is to check
if hdev->req_status == HCI_REQ_PEND.

> Frankly, you are just hacking around here.
>

Well, I tried to make this point on my first submission -- I am not
familiar with the HCI layer, but having identified the bug and have
the affected hardware at hand, I thought I might as well try to fix it
so users with this controller can use BLE devices. I am happy to keep
working on it some guidance, but I would not mind testing patches
either if you think that would be easier.

> Regards
>
> Marcel
>

Thanks for the reviews so far. I'm sending a new revision
incorporating my understanding of the feedback above,

Best regards,

--
João Paulo Rechi Vita
http://about.me/jprvita

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ