lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 28 Oct 2021 16:41:40 -0700
From:   Luiz Augusto von Dentz <luiz.dentz@...il.com>
To:     Marcel Holtmann <marcel@...tmann.org>
Cc:     Archie Pusaka <apusaka@...gle.com>,
        linux-bluetooth <linux-bluetooth@...r.kernel.org>,
        CrosBT Upstreaming <chromeos-bluetooth-upstreaming@...omium.org>,
        Archie Pusaka <apusaka@...omium.org>,
        Miao-chen Chou <mcchou@...omium.org>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        Johan Hedberg <johan.hedberg@...il.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "open list:NETWORKING [GENERAL]" <netdev@...r.kernel.org>
Subject: Re: [PATCH] Bluetooth: Limit duration of Remote Name Resolve

Hi Marcel,

On Thu, Oct 28, 2021 at 6:30 AM Marcel Holtmann <marcel@...tmann.org> wrote:
>
> Hi Archie,
>
> > When doing remote name request, we cannot scan. In the normal case it's
> > OK since we can expect it to finish within a short amount of time.
> > However, there is a possibility to scan lots of devices that
> > (1) requires Remote Name Resolve
> > (2) is unresponsive to Remote Name Resolve
> > When this happens, we are stuck to do Remote Name Resolve until all is
> > done before continue scanning.
> >
> > This patch adds a time limit to stop us spending too long on remote
> > name request. The limit is increased for every iteration where we fail
> > to complete the RNR in order to eventually solve all names.
> >
> > Signed-off-by: Archie Pusaka <apusaka@...omium.org>
> > Reviewed-by: Miao-chen Chou <mcchou@...omium.org>
> >
> > ---
> > Hi maintainers, we found one instance where a test device spends ~90
> > seconds to do Remote Name Resolving, hence this patch.
> > I think it's better if we reset the time limit to the default value
> > at some point, but I don't have a good proposal where to do that, so
> > in the end I didn't.
>
> do you have a btmon trace for this as well?
>
> The HCI Remote Name Request is essentially a paging procedure and then a few LMP messages. It is fundamentally a connection request inside BR/EDR and if you have a remote device that has page scan disabled, but inquiry scan enabled, then you get into this funky situation. Sadly, the BR/EDR parts don’t give you any hint on this weird combination. You can't configure BlueZ that way since it is really stupid setup and I remember that GAP doesn’t have this case either, but it can happen. So we might want to check if that is what happens. And of course it needs to be a Bluetooth 2.0 device or a device that doesn’t support Secure Simple Pairing. There is a chance of really bad radio interference, but that is then just bad luck and is only going to happen every once in a blue moon.

I wonder what does the remote sets as Page_Scan_Repetition_Mode in the
Inquiry Result, it seems quite weird that the specs allows such stage
but it doesn't have a value to represent in the inquiry result, anyway
I guess changing that now wouldn't really make any different given
such device is probably never gonna update.

> That said, you should receive a Page Timeout in the Remote Name Request Complete event for what you describe. Or you just use HCI Remote Name Request Cancel to abort the paging. If I remember correctly then the setting for Page Timeout is also applied to Remote Name resolving procedure. So we could tweak that value. Actually once we get the “sync” work merged, we could configure different Page Timeout for connection requests and name resolving if that would help. Not sure if this is worth it, since we could as simple just cancel the request.

If I recall this correctly we used to have something like that back in
the days the daemon had control over the discovery, the logic was that
each round of discovery including the name resolving had a fixed time
e.g. 10 sec, so if not all device found had their name resolved we
would stop and proceed to the next round that way we avoid this
problem of devices not resolving and nothing being discovered either.
Luckily today there might not be many devices around without EIR
including their names but still I think it would be better to limit
the amount time we spend resolving names, also it looks like it sets
NAME_NOT_KNOWN when RNR fails and it never proceeds to request the
name again so I wonder why would it be waiting ~90 seconds, we don't
seem to change the page timeout so it should be using the default
which is 5.12s so I think there is something else at play.

Or perhaps it is finally time to drop legacy name resolution? At this
point they should be very rare and those keeping some ancient device
around will just have to deal with the address directly so we can get
rid of paging devices while discovery which is quite a big burden I'd
say.

>
> > include/net/bluetooth/hci_core.h |  5 +++++
> > net/bluetooth/hci_event.c        | 12 ++++++++++++
> > 2 files changed, 17 insertions(+)
> >
> > diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> > index dd8840e70e25..df9ffedf1d29 100644
> > --- a/include/net/bluetooth/hci_core.h
> > +++ b/include/net/bluetooth/hci_core.h
> > @@ -87,6 +87,8 @@ struct discovery_state {
> >       u8                      (*uuids)[16];
> >       unsigned long           scan_start;
> >       unsigned long           scan_duration;
> > +     unsigned long           name_resolve_timeout;
> > +     unsigned long           name_resolve_duration;
> > };
> >
> > #define SUSPEND_NOTIFIER_TIMEOUT      msecs_to_jiffies(2000) /* 2 seconds */
> > @@ -805,6 +807,8 @@ static inline void sco_recv_scodata(struct hci_conn *hcon, struct sk_buff *skb)
> > #define INQUIRY_CACHE_AGE_MAX   (HZ*30)   /* 30 seconds */
> > #define INQUIRY_ENTRY_AGE_MAX   (HZ*60)   /* 60 seconds */
> >
> > +#define NAME_RESOLVE_INIT_DURATION   5120    /* msec */
> > +
> > static inline void discovery_init(struct hci_dev *hdev)
> > {
> >       hdev->discovery.state = DISCOVERY_STOPPED;
> > @@ -813,6 +817,7 @@ static inline void discovery_init(struct hci_dev *hdev)
> >       INIT_LIST_HEAD(&hdev->discovery.resolve);
> >       hdev->discovery.report_invalid_rssi = true;
> >       hdev->discovery.rssi = HCI_RSSI_INVALID;
> > +     hdev->discovery.name_resolve_duration = NAME_RESOLVE_INIT_DURATION;
> > }
> >
> > static inline void hci_discovery_filter_clear(struct hci_dev *hdev)
> > diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> > index 3cba2bbefcd6..104a1308f454 100644
> > --- a/net/bluetooth/hci_event.c
> > +++ b/net/bluetooth/hci_event.c
> > @@ -2086,6 +2086,15 @@ static bool hci_resolve_next_name(struct hci_dev *hdev)
> >       if (list_empty(&discov->resolve))
> >               return false;
> >
> > +     /* We should stop if we already spent too much time resolving names.
> > +      * However, double the name resolve duration for the next iterations.
> > +      */
> > +     if (time_after(jiffies, discov->name_resolve_timeout)) {
> > +             bt_dev_dbg(hdev, "Name resolve takes too long, stopping.");
> > +             discov->name_resolve_duration *= 2;
> > +             return false;
> > +     }
> > +
> >       e = hci_inquiry_cache_lookup_resolve(hdev, BDADDR_ANY, NAME_NEEDED);
> >       if (!e)
> >               return false;
> > @@ -2634,6 +2643,9 @@ static void hci_inquiry_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)
> >       if (e && hci_resolve_name(hdev, e) == 0) {
> >               e->name_state = NAME_PENDING;
> >               hci_discovery_set_state(hdev, DISCOVERY_RESOLVING);
> > +
> > +             discov->name_resolve_timeout = jiffies +
> > +                             msecs_to_jiffies(discov->name_resolve_duration);
>
> So if this is really caused by a device with page scan disabled and inquiry scan enabled, then this fix is just a paper over hole approach. If you have more devices requiring name resolving, you end up penalizing them and make the discovery procedure worse up to the extend that no names are resolved. So I wouldn’t be in favor of this.
>
> What LE scan window/interval is actually working against what configured BR/EDR page timeout here? The discovery procedure is something that a user triggers so we always had that one take higher priority since the user is expecting results. This means any tweaking needs to be considered carefully since it is an immediate user impact if the name is missing.
>
> Is this a LE background scan you are worried about or an LE active scan that runs in parallel during discovery?
>
> Regards
>
> Marcel
>


-- 
Luiz Augusto von Dentz

Powered by blists - more mailing lists