Message-ID: <16548568.4e8b0.17ab525363c.Coremail.linma@zju.edu.cn>
Date: Sat, 17 Jul 2021 23:45:52 +0800 (GMT+08:00)
From: LinMa <linma@....edu.cn>
To: "Tetsuo Handa" <penguin-kernel@...ove.sakura.ne.jp>
Cc: "Desmond Cheong Zhi Xi" <desmondcheongzx@...il.com>,
"Luiz Augusto von Dentz" <luiz.dentz@...il.com>,
"Johan Hedberg" <johan.hedberg@...il.com>,
"Marcel Holtmann" <marcel@...tmann.org>,
"linux-bluetooth@...r.kernel.org" <linux-bluetooth@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
"Jakub Kicinski" <kuba@...nel.org>,
"open list:NETWORKING [GENERAL]" <netdev@...r.kernel.org>
Subject: Re: Yet Another Patch for CVE-2021-3573
Ooooooops
I just noticed that Luiz has already come up with a good patch:
https://www.spinics.net/lists/linux-bluetooth/msg92649.html
Sorry for the noise and happy weekend.
Thanks
Lin Ma
> -----Original Messages-----
> From: LinMa <linma@....edu.cn>
> Sent Time: 2021-07-17 23:41:22 (Saturday)
> To: "Tetsuo Handa" <penguin-kernel@...ove.sakura.ne.jp>
> Cc: "Desmond Cheong Zhi Xi" <desmondcheongzx@...il.com>, "Luiz Augusto von Dentz" <luiz.dentz@...il.com>, "Johan Hedberg" <johan.hedberg@...il.com>, "Marcel Holtmann" <marcel@...tmann.org>, "linux-bluetooth@...r.kernel.org" <linux-bluetooth@...r.kernel.org>, "David S. Miller" <davem@...emloft.net>, "Jakub Kicinski" <kuba@...nel.org>, "open list:NETWORKING [GENERAL]" <netdev@...r.kernel.org>
> Subject: Yet Another Patch for CVE-2021-3573
>
> Hi everyone,
>
> After reading through a large amount of code in the kernel's net directory, I have the following thoughts and would appreciate your suggestions.
>
> >> >> Saw this and thought I'd offer my two cents.
> >> >> BUG: sleeping function called from invalid context
> >> >> This is the original problem that Tetsuo's patch was trying to fix.
> >>
> >> Yes.
> >>
> >> >> Under the hood of lock_sock, we call lock_sock_nested which might sleep
> >> >> because of the mutex_acquire.
> >>
> >> Both lock_sock() and lock_sock_nested() might sleep.
> >>
> >> >> But we shouldn't sleep while holding the rw spinlock.
> >>
> >> Right. In atomic context (e.g. inside interrupt handler, schedulable context
> >> with interrupts or preemption disabled, schedulable context inside RCU read
> >> critical section, schedulable context inside a spinlock section), we must not
> >> call functions (e.g. waiting for a mutex, waiting for a semaphore, waiting for
> >> a page fault) which are not atomic.
> >>
> >> >> So we either have to acquire a spinlock instead of a mutex as was done before,
> >>
> >> Regarding hci_sock_dev_event(HCI_DEV_UNREG) case, we can't use a spinlock.
> >>
> >> Like LinMa explained, lock_sock() has to be used in order to serialize functions
> >> (e.g. hci_sock_sendmsg()) which access hci_pi(sk)->hdev between lock_sock(sk) and
> >> release_sock(sk). And like I explained, we can't defer resetting hci_pi(sk)->hdev
> >> to NULL, for hci_sock_dev_event(HCI_DEV_UNREG) is responsible for resetting
> >> hci_pi(sk)->hdev to NULL because the caller of hci_sock_dev_event(HCI_DEV_UNREG)
> >> immediately destroys resources associated with this hdev.
> >>
>
> This is the critical part of the bug. Compare it with similar code elsewhere, for example net/iucv/af_iucv.c.
>
> The iucv_sock_bind() function binds the device:
>
> static int iucv_sock_bind(struct socket *sock, struct sockaddr *addr,
> 			  int addr_len)
> {
> 	...
> 	iucv->hs_dev = dev;
>
> This field is reset to NULL only when the socket is closed:
>
> static void iucv_sock_close(struct sock *sk)
> {
> 	...
> 	if (iucv->hs_dev) {
> 		dev_put(iucv->hs_dev);
> 		iucv->hs_dev = NULL;
> 		sk->sk_bound_dev_if = 0;
> 	}
>
> Even afiucv_netdev_event() does not touch iucv->hs_dev.
>
> So why does hci_sock_dev_event(HCI_DEV_UNREG) need to set the pointer to NULL and then drop the reference count?
> As Tetsuo said, it is because hci_unregister_dev(), the caller of hci_sock_dev_event(), reclaims the resources of
> the hdev object: it destroys the workqueue and also cleans up sysfs.
>
> If we implement our patch the way the iucv stack does, or follow some other refcount idea (https://lkml.org/lkml/2021/6/22/1347)
> without care, bad things will happen: once the device is unregistered, nothing useful is left in the hdev object, so any changes to it make no sense.
>
> But wait: even if writing to this object or dereferencing its resources is illegal, some actions, like reading its flags, should still be legal, right?
>
> Hence, we can still delay releasing the hdev object until hci_sock_release() (as other net code does).
> We just need to take care of the checks.
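The lifetime argument above can be illustrated with a small userspace model. This is a hypothetical sketch, not kernel code: struct dev stands in for struct hci_dev, struct sock_model for an HCI socket, and dev_alloc(), dev_unregister(), sock_release() and the other helpers are invented names. It assumes the socket took its own reference at bind time, and it only shows that deferring the put keeps the struct memory valid for flag reads after unregistration; it does not model concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: "dev" stands in for struct hci_dev and "sock_model"
 * for an HCI socket. None of these names are kernel APIs. */
struct dev {
	int refcnt;        /* like the hdev reference count */
	bool registered;   /* cleared by the unregister path */
	int *workqueue;    /* stands in for resources torn down on unregister */
};

struct sock_model {
	struct dev *dev;   /* like hci_pi(sk)->hdev */
};

static int frees;  /* counts how many dev objects have been freed */

static struct dev *dev_alloc(void)
{
	struct dev *d = calloc(1, sizeof(*d));
	assert(d);
	d->refcnt = 1;                 /* reference owned by the registration */
	d->registered = true;
	d->workqueue = malloc(sizeof(int));
	assert(d->workqueue);
	return d;
}

static void dev_get(struct dev *d) { d->refcnt++; }

static void dev_put(struct dev *d)
{
	if (--d->refcnt == 0) {
		free(d->workqueue);    /* NULL after unregister; free(NULL) is a no-op */
		free(d);
		frees++;
	}
}

/* Bind: the socket takes its own reference. */
static void sock_bind(struct sock_model *s, struct dev *d)
{
	dev_get(d);
	s->dev = d;
}

/* Unregister: tear down resources now, but leave s->dev alone;
 * the socket's reference keeps the struct itself alive. */
static void dev_unregister(struct dev *d)
{
	d->registered = false;
	free(d->workqueue);
	d->workqueue = NULL;
	dev_put(d);                    /* drop the registration reference */
}

/* Only reads a flag, which stays legal while a reference is held. */
static bool sock_dev_is_up(const struct sock_model *s)
{
	return s->dev && s->dev->registered;
}

/* Release: the deferred put, done at socket release time. */
static void sock_release(struct sock_model *s)
{
	if (s->dev) {
		dev_put(s->dev);
		s->dev = NULL;
	}
}
```

In this model the unregister path frees the per-device resources right away, while the struct itself is freed only by the final put in sock_release(). Keeping the memory alive is not enough by itself: any path that could touch the torn-down resources still needs a check (here, the registered flag) taken under appropriate locking.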
>
> A quick patch is shown below. My POC didn't trigger any warnings, but more checks are needed.
>
> diff --git a/net/bluetooth/hci_sock.c b/net/bluetooth/hci_sock.c
> index 251b9128f..db665f78a 100644
> --- a/net/bluetooth/hci_sock.c
> +++ b/net/bluetooth/hci_sock.c
> @@ -764,12 +764,9 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> sk_for_each(sk, &hci_sk_list.head) {
> bh_lock_sock_nested(sk);
> if (hci_pi(sk)->hdev == hdev) {
> - hci_pi(sk)->hdev = NULL;
> sk->sk_err = EPIPE;
> sk->sk_state = BT_OPEN;
> sk->sk_state_change(sk);
> -
> - hci_dev_put(hdev);
> }
> bh_unlock_sock(sk);
> }
> @@ -880,6 +877,7 @@ static int hci_sock_release(struct socket *sock)
>
> atomic_dec(&hdev->promisc);
> hci_dev_put(hdev);
> + hci_pi(sk)->hdev = NULL;
> }
>
> sock_orphan(sk);
> @@ -1727,10 +1725,10 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg,
> break;
> case HCI_CHANNEL_MONITOR:
> err = -EOPNOTSUPP;
> - goto done;
> + goto donefast;
> case HCI_CHANNEL_LOGGING:
> err = hci_logging_frame(sk, msg, len);
> - goto done;
> + goto donefast;
> default:
> mutex_lock(&mgmt_chan_list_lock);
> chan = __hci_mgmt_chan_find(hci_pi(sk)->channel);
> @@ -1740,15 +1738,16 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg,
> err = -EINVAL;
>
> mutex_unlock(&mgmt_chan_list_lock);
> - goto done;
> + goto donefast;
> }
>
> hdev = hci_pi(sk)->hdev;
> if (!hdev) {
> err = -EBADFD;
> - goto done;
> + goto donefast;
> }
>
> + hci_dev_lock(hdev);
> if (!test_bit(HCI_UP, &hdev->flags)) {
> err = -ENETDOWN;
> goto done;
> @@ -1832,6 +1831,8 @@ static int hci_sock_sendmsg(struct socket *sock, struct msghdr *msg,
> err = len;
>
> done:
> + hci_dev_unlock(hdev);
> +donefast:
> release_sock(sk);
> return err;
>
>
> In short, this patch defers hci_dev_put() to hci_sock_release() and keeps the existing bh_lock_sock_nested().
>
> With that change, the UAF in hci_sock_bound_ioctl() is fixed (the four different commands in hci_sock_bound_ioctl() will just
> traverse an empty linked list).
>
> For the other UAF point, hci_sock_sendmsg(), this patch uses hci_dev_lock() to make sure the flags and resources in hdev are not
> released until sendmsg has finished. (Unlike hci_sock_create(), hci_sock_sendmsg() can sleep, so taking the mutex is possible.)
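The done/donefast split in the diff is the usual two-label error unwind: paths that fail before hci_dev_lock() is taken must skip the unlock. A hypothetical userspace sketch of that control flow follows; struct dev_model, dev_lock(), dev_unlock() and the lock_depth counter are illustrative stand-ins, not kernel APIs, and the counter exists only so each return path can be checked for balance.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef EBADFD
#define EBADFD 77  /* Linux value; fallback only for libcs that lack it */
#endif

/* Tracks how many times the hci_dev_lock()-like lock is currently held. */
static int lock_depth;

struct dev_model {
	bool up;  /* like test_bit(HCI_UP, &hdev->flags) */
};

static void dev_lock(struct dev_model *d)   { (void)d; lock_depth++; }
static void dev_unlock(struct dev_model *d) { (void)d; lock_depth--; }

/* Mirrors the shape of the patched hci_sock_sendmsg():
 * - errors before the lock is taken jump to "donefast" (no unlock);
 * - errors after the lock is taken jump to "done", which unlocks and
 *   then falls through to "donefast". */
static int sendmsg_model(struct dev_model *d, bool unsupported_channel,
			 size_t len)
{
	int err;

	if (unsupported_channel) {
		err = -EOPNOTSUPP;
		goto donefast;
	}

	if (!d) {
		err = -EBADFD;
		goto donefast;
	}

	dev_lock(d);
	if (!d->up) {
		err = -ENETDOWN;
		goto done;
	}

	err = (int)len;  /* pretend the frame was queued */

done:
	dev_unlock(d);
donefast:
	/* release_sock(sk) would go here in the real function */
	return err;
}
```

Every return path leaves lock_depth at zero, which is the property the label change is meant to preserve.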
>
> Of course, more auditing is needed, but I just wanted to share this with you. Any suggestions and discussion would be much appreciated.
>
>
> >> >> or we need to move lock_sock out of the rw spinlock critical section as Tetsuo proposes.
> >>
> >> Exactly. Since this is a regression introduced when fixing CVE-2021-3573, Linux
> >> distributors are waiting for this patch so that they can apply the fix for CVE-2021-3573.
> >> This patch should be sent to linux.git and stables as soon as possible. But since this patch has received
> >> little attention, I'm already testing it in linux-next.git via my tree.
> >> I'll drop it when Bluetooth maintainers pick this patch up for linux-5.14-rcX. (Or should I
> >> send it directly to Linus?)
> >>
> >> >>
> >> >
> >> > My bad, was thinking more about the problem and noticed your poc was for hci_sock_sendmsg,
> >> > not hci_sock_dev_event.
> >>
> >> I didn't catch this part. Are you talking about a different poc?
> >> As far as I'm aware, exp.c in POC.zip was for hci_sock_bound_ioctl(HCIUNBLOCKADDR).
> >>
> >> hci_sock_bound_ioctl(HCIUNBLOCKADDR) (which is called between lock_sock() and release_sock())
> >> calls copy_from_user() which might cause page fault, and userfaultfd mechanism allows an attacker
> >> to slow down page fault handling enough for hci_sock_dev_event(HCI_DEV_UNREG) to return without
> >> waiting for hci_sock_bound_ioctl(HCIUNBLOCKADDR) to call release_sock(). This race window
> >> results in UAF (doesn't it, LinMa?).
> >>
> >> > In this case, it's not clear to me why the atomic context is being violated.
> >>
> >> In atomic context (in hci_sock_dev_event(HCI_DEV_UNREG) case, between
> >> read_lock(&hci_sk_list.lock) and read_unlock(&hci_sk_list.lock)), we must not call
> >> lock_sock(sk) which might wait for hci_sock_bound_ioctl(HCIUNBLOCKADDR) to call release_sock().
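The rule above, and the direction of moving lock_sock() out of the read_lock() section, can be sketched as a single-threaded model. Everything here is hypothetical: atomic_depth mimics atomic-context accounting, might_sleep_check() counts would-be "sleeping function called from invalid context" reports, and detach_all_restart() is one possible shape (pin a socket, drop the list lock, take the sleeping lock, then rescan), not Tetsuo's actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Single-threaded model, not kernel code. */
static int atomic_depth;    /* > 0 means "inside an atomic section" */
static int invalid_sleeps;  /* sleeping calls made while atomic */

static void might_sleep_check(void)
{
	if (atomic_depth > 0)
		invalid_sleeps++;
}

static void list_read_lock(void)   { atomic_depth++; }  /* like read_lock() */
static void list_read_unlock(void) { atomic_depth--; }  /* like read_unlock() */

struct sk_model {
	int refcnt;     /* like sock_hold()/sock_put() */
	int bound_dev;  /* like hci_pi(sk)->hdev; 0 means detached */
	bool owned;     /* like ownership taken by lock_sock() */
};

static void sk_lock(struct sk_model *sk)   { might_sleep_check(); sk->owned = true; }
static void sk_unlock(struct sk_model *sk) { sk->owned = false; }

/* Buggy shape: the sleeping lock is taken inside the read-locked section. */
static void detach_all_buggy(struct sk_model *socks, size_t n, int dev)
{
	list_read_lock();
	for (size_t i = 0; i < n; i++) {
		sk_lock(&socks[i]);
		if (socks[i].bound_dev == dev)
			socks[i].bound_dev = 0;
		sk_unlock(&socks[i]);
	}
	list_read_unlock();
}

/* Restart shape: find one bound socket under the read lock, pin it, drop
 * the read lock, then take the sleeping lock; repeat until none is left. */
static void detach_all_restart(struct sk_model *socks, size_t n, int dev)
{
	for (;;) {
		struct sk_model *victim = NULL;

		list_read_lock();
		for (size_t i = 0; i < n; i++) {
			if (socks[i].bound_dev == dev) {
				victim = &socks[i];
				victim->refcnt++;  /* pin before dropping the list lock */
				break;
			}
		}
		list_read_unlock();

		if (!victim)
			return;

		sk_lock(victim);               /* may sleep: no longer atomic */
		if (victim->bound_dev == dev)  /* re-check under the socket lock */
			victim->bound_dev = 0;
		sk_unlock(victim);
		victim->refcnt--;              /* like sock_put() */
	}
}
```

The pinned reference plays the role of sock_hold()/sock_put(): once the list lock is dropped the socket may be unlinked, so something must keep it alive until the sleeping lock is taken and the state is re-checked.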
> >>
> >> >
> >> > Sorry for the noise.
> >> >
> >> >>>
> >> >>> The patch provided by Desmond adds local_bh_disable() before bh_lock_sock(), so I also tried that:
> >> >>>
> >> >>> --- a/net/bluetooth/hci_sock.c
> >> >>> +++ b/net/bluetooth/hci_sock.c
> >> >>> @@ -762,6 +762,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> >> >>> /* Detach sockets from device */
> >> >>> read_lock(&hci_sk_list.lock);
> >> >>> sk_for_each(sk, &hci_sk_list.head) {
> >> >>> + local_bh_disable();
> >> >>> bh_lock_sock_nested(sk);
> >> >>> if (hci_pi(sk)->hdev == hdev) {
> >> >>> hci_pi(sk)->hdev = NULL;
> >> >>> @@ -772,6 +773,7 @@ void hci_sock_dev_event(struct hci_dev *hdev, int event)
> >> >>> hci_dev_put(hdev);
> >> >>> }
> >> >>> bh_unlock_sock(sk);
> >> >>> + local_bh_enable();
> >> >>> }
> >> >>> read_unlock(&hci_sk_list.lock);
> >> >>> }
> >> >>>
> >> >>> But this does not help; the UAF still occurs.
> >> >>>
> >> >>
> >> >> I might be very mistaken on this, but I believe the UAF still happens because
> >> >> you can't really mix bh_lock_sock* and lock_sock* to protect the same things.
> >>
> >> Right. https://www.kernel.org/doc/html/v5.13/kernel-hacking/locking.html
> >>
> >> >> The former holds the spinlock &sk->sk_lock.slock and synchronizes between
> >> >> user contexts and bottom halves,
> >>
> >> serializes access to resources which might be accessed from atomic (i.e. non-schedulable) contexts
> >>
> >> >> while the latter holds a mutex on &sk->sk_lock.dep_map to synchronize between
> >> >> multiple users.
> >>
> >> serializes access to resources which are accessed from only schedulable (i.e. non-atomic) contexts
> >>
> >> >>
> >> >> One option I can think of would be to switch instances of lock_sock to bh_lock_sock_nested
> >> >> for users that might race (such as hci_sock_sendmsg, hci_sock_bound_ioctl, and others as
> >> >> needed). But I'm not sure if that's quite what we want, plus we would need to ensure that
> >> >> sleeping functions aren't called between the bh_lock/unlock.
> >>
> >> We can't do it for hci_sock_dev_event(HCI_DEV_UNREG).
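Why mixing bh_lock_sock* and lock_sock* does not serialize can be seen from how the socket lock is built: lock_sock() holds the sk_lock.slock spinlock only long enough to mark the socket as owned (sk_lock.owned), while bh_lock_sock() takes just the spinlock and never waits on the owned flag. The following is a simplified single-threaded model of that; the *_model names are illustrative, and both the lockdep annotation (dep_map) and the waiting done by a real lock_sock() are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* "slock_held" stands in for sk->sk_lock.slock, "owned" for
 * sk->sk_lock.owned. Names are illustrative, not kernel API. */
struct sk_lock_model {
	bool slock_held;
	bool owned;
};

static void spin_lock_model(struct sk_lock_model *l)
{
	assert(!l->slock_held);  /* a real spinlock would spin here */
	l->slock_held = true;
}

static void spin_unlock_model(struct sk_lock_model *l)
{
	l->slock_held = false;
}

/* lock_sock(): the spinlock is held only briefly, to set "owned".
 * A real lock_sock() would sleep here while "owned" is already set. */
static void lock_sock_model(struct sk_lock_model *l)
{
	spin_lock_model(l);
	assert(!l->owned);
	l->owned = true;
	spin_unlock_model(l);
}

static void release_sock_model(struct sk_lock_model *l)
{
	spin_lock_model(l);
	l->owned = false;
	spin_unlock_model(l);
}

/* bh_lock_sock(): takes only the spinlock and never looks at "owned". */
static void bh_lock_sock_model(struct sk_lock_model *l)   { spin_lock_model(l); }
static void bh_unlock_sock_model(struct sk_lock_model *l) { spin_unlock_model(l); }

static bool sock_owned_by_user_model(const struct sk_lock_model *l)
{
	return l->owned;
}
```

In the model, bh_lock_sock_model() succeeds while the socket is still owned via lock_sock_model(), which is why wrapping bh_lock_sock_nested() in local_bh_disable()/local_bh_enable() does not make hci_sock_dev_event() wait for the lock_sock() holder. Bottom-half code that uses bh_lock_sock() normally checks sock_owned_by_user() and defers its work when that returns true.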
>
> Regards
>
> Lin Ma