lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <45cbb403.157a.1806695af40.Coremail.duoming@zju.edu.cn>
Date:   Tue, 26 Apr 2022 23:55:27 +0800 (GMT+08:00)
From:   duoming@....edu.cn
To:     "Paolo Abeni" <pabeni@...hat.com>
Cc:     krzysztof.kozlowski@...aro.org, linux-kernel@...r.kernel.org,
        davem@...emloft.net, gregkh@...uxfoundation.org,
        alexander.deucher@....com, broonie@...nel.org,
        akpm@...ux-foundation.org, netdev@...r.kernel.org, linma@....edu.cn
Subject: Re: Re: [PATCH net V2] nfc: nfcmrvl: main: reorder destructive
 operations in nfcmrvl_nci_unregister_dev to avoid bugs

hello,

On Mon, Tue, 26 Apr 2022 13:17:21 +0200 Paolo wrote:

> > There are destructive operations such as nfcmrvl_fw_dnld_abort and
> > gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
> > gpio and so on could be destructed while the upper layer functions such as
> > nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
> > to double-free, use-after-free and null-ptr-deref bugs.
> > 
> > There are three situations that could lead to double-free bugs.
> > 
> > The first situation is shown below:
> > 
> >    (Thread 1)                 |      (Thread 2)
> > nfcmrvl_fw_dnld_start         |
> >  ...                          |  nfcmrvl_nci_unregister_dev
> >  release_firmware()           |   nfcmrvl_fw_dnld_abort
> >   kfree(fw) //(1)             |    fw_dnld_over
> >                               |     release_firmware
> >   ...                         |      kfree(fw) //(2)
> >                               |     ...
> > 
> > The second situation is shown below:
> > 
> >    (Thread 1)                 |      (Thread 2)
> > nfcmrvl_fw_dnld_start         |
> >  ...                          |
> >  mod_timer                    |
> >  (wait a time)                |
> >  fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
> >    fw_dnld_over               |   nfcmrvl_fw_dnld_abort
> >     release_firmware          |    fw_dnld_over
> >      kfree(fw) //(1)          |     release_firmware
> >      ...                      |      kfree(fw) //(2)
> > 
> > The third situation is shown below:
> > 
> >        (Thread 1)               |       (Thread 2)
> > nfcmrvl_nci_recv_frame          |
> >  if(..->fw_download_in_progress)|
> >   nfcmrvl_fw_dnld_recv_frame    |
> >    queue_work                   |
> >                                 |
> > fw_dnld_rx_work                 | nfcmrvl_nci_unregister_dev
> >  fw_dnld_over                   |  nfcmrvl_fw_dnld_abort
> >   release_firmware              |   fw_dnld_over
> >    kfree(fw) //(1)              |    release_firmware
> >                                 |     kfree(fw) //(2)
> > 
> > The firmware struct is deallocated in position (1) and deallocated
> > in position (2) again.
> > 
> > The crash trace triggered by POC is like below:
> > 
> > [  122.640457] BUG: KASAN: double-free or invalid-free in fw_dnld_over+0x28/0xf0
> > [  122.640457] Call Trace:
> > [  122.640457]  <TASK>
> > [  122.640457]  kfree+0xb0/0x330
> > [  122.640457]  fw_dnld_over+0x28/0xf0
> > [  122.640457]  nfcmrvl_nci_unregister_dev+0x61/0x70
> > [  122.640457]  nci_uart_tty_close+0x87/0xd0
> > [  122.640457]  tty_ldisc_kill+0x3e/0x80
> > [  122.640457]  tty_ldisc_hangup+0x1b2/0x2c0
> > [  122.640457]  __tty_hangup.part.0+0x316/0x520
> > [  122.640457]  tty_release+0x200/0x670
> > [  122.640457]  __fput+0x110/0x410
> > [  122.640457]  task_work_run+0x86/0xd0
> > [  122.640457]  exit_to_user_mode_prepare+0x1aa/0x1b0
> > [  122.640457]  syscall_exit_to_user_mode+0x19/0x50
> > [  122.640457]  do_syscall_64+0x48/0x90
> > [  122.640457]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [  122.640457] RIP: 0033:0x7f68433f6beb
> > 
> > What's more, there are also use-after-free and null-ptr-deref bugs
> > in nfcmrvl_fw_dnld_start. If we deallocate firmware struct, gpio or
> > set null to the members of priv->fw_dnld in nfcmrvl_nci_unregister_dev,
> > then, we dereference firmware, gpio or the members of priv->fw_dnld in
> > nfcmrvl_fw_dnld_start, the UAF or NPD bugs will happen.
> > 
> > This patch reorders destructive operations after nci_unregister_device
> > to avoid the double-free, UAF and NPD bugs, as nci_unregister_device
> > is well synchronized and won't return if there is a running routine.
> > This was mentioned in commit 3e3b5dfcd16a ("NFC: reorder the logic in
> > nfc_{un,}register_device").

> It looks like the above is not enough to close all the possible races,
> specifically it looks like fw_dnld_timeout() and fw_dnld_rx_work() may
> still race one vs another. 

The fw_dnld_timeout() and fw_dnld_rx_work() could not race with each other.
Because fw_dnld_rx_work() is used to receive packets from firmware download
process, the release_firmware() in fw_dnld_rx_work() will not execute unless it
has received firmware packets, the code is shown below. 

fw_dnld_rx_work                                   
  while ((skb = skb_dequeue(&fw_dnld->rx_q))) {
    ...  
    fw_dnld_over
      release_firmware
  }

The release_firmware() is only called when firmware download process failed, and
fw_dnld_rx_work() will not receive any packets from firmware download process.
As a result, the race is non-existent. The process is shown below:

fw_dnld_rx_work          | nfcmrvl_fw_dnld_start
  ...                    |   ...
                         |   release_firmware
  (wait for skb)         |   (the following will not send any skb)
  while (skb = ...) {    |   
    ...                  | 
    fw_dnld_over         |
      release_firmware   |
  }                      |
  (receive no packet)    |

> I *think* that the approach you already suggested here:
> 
> https://lore.kernel.org/netdev/1d34425a0ea8a553a66dcf4f22ca55cc920dbb42.1649913521.git.duoming@zju.edu.cn/
> 
> should be safer - but you have to protect with the same spinlock even
> every fw_dnld->fw modification.

The spinlock should not be used to protect release_firmware(), because there is vfree() in release_firmware().
The vfree() is a function that may sleep. If we use spinlock, the sleep in atomic bug will happen. The process
is shown below:

spin_lock()
release_firmware()
  firmware_free_data()
    vfree()  //sleep in atomic
spin_unlock()

We reorders destructive operations after nci_unregister_device(), 
using nci_unregister_device() to synchonize with firmware download routine.
But I found this is not enough, when nci_unregister_device() is return, there 
are still firmware download routine is running.

[   65.835462] BUG: KASAN: use-after-free in nci_fw_download+0x26/0x60
[   65.840236] Read of size 8 at addr ffff88800c2f5008
...
[   65.845755] Call Trace:
[   65.856235]  nci_fw_download+0x26/0x60
[   65.856235]  nfc_fw_download+0x99/0xe0
[   65.856235]  nfc_genl_fw_download+0x10b/0x170
...

In order to solve this problem, I propose the following solutions:

diff --git a/drivers/nfc/nfcmrvl/main.c b/drivers/nfc/nfcmrvl/main.c
index 2fcf545012b..1a5284de434 100644
--- a/drivers/nfc/nfcmrvl/main.c
+++ b/drivers/nfc/nfcmrvl/main.c
@@ -183,6 +183,7 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
 {
        struct nci_dev *ndev = priv->ndev;
 
+       nci_unregister_device(ndev);
        if (priv->ndev->nfc_dev->fw_download_in_progress)
                nfcmrvl_fw_dnld_abort(priv);
 
@@ -191,7 +192,6 @@ void nfcmrvl_nci_unregister_dev(struct nfcmrvl_private *priv)
        if (gpio_is_valid(priv->config.reset_n_io))
                gpio_free(priv->config.reset_n_io);
 
-       nci_unregister_device(ndev);
        nci_free_device(ndev);
        kfree(priv);
 }

diff --git a/net/nfc/core.c b/net/nfc/core.c
index dc7a2404efd..da8199f67d4 100644
--- a/net/nfc/core.c
+++ b/net/nfc/core.c
@@ -25,6 +25,7 @@
 #define NFC_CHECK_PRES_FREQ_MS 2000
 
 int nfc_devlist_generation;
+bool nfc_download;
 DEFINE_MUTEX(nfc_devlist_mutex);
 
 /* NFC device ID bitmap */
@@ -38,7 +39,7 @@ int nfc_fw_download(struct nfc_dev *dev, const char *firmware_name)
 
        device_lock(&dev->dev);
 
-       if (!device_is_registered(&dev->dev)) {
+       if (!device_is_registered(&dev->dev) || !nfc_download) {
                rc = -ENODEV;
                goto error;
        }
@@ -1134,6 +1135,7 @@ int nfc_register_device(struct nfc_dev *dev)
                        dev->rfkill = NULL;
                }
        }
+       nfc_download = true;
        device_unlock(&dev->dev);
 
        rc = nfc_genl_device_added(dev);
@@ -1166,6 +1168,7 @@ void nfc_unregister_device(struct nfc_dev *dev)
                rfkill_unregister(dev->rfkill);
                rfkill_destroy(dev->rfkill);
        }
+       nfc_download = false;
        device_unlock(&dev->dev);
 
        if (dev->ops->check_presence) {
-- 

The new solution not only reorders destructive operations after nci_unregister_device,
but also adds bool variable protected by device_lock to synchronize between
cleanup routine and firmware download routine. The process is shown below.

       (Thread 1)               |       (Thread 2)
nfcmrvl_nci_unregister_dev      |
  nci_unregister_device         |
    nfc_unregister_device       | nfc_fw_download
      device_lock()             |
      ...                       |
      nfc_download = false;//(1)|   ...
      device_unlock()           |
  ...                           |   device_lock()
  (destructive operations)      |   if(.. || !nfc_download)
                                |     goto error; //(2)
                                |   rc = dev->ops->fw_download();
                                | error:
                                |   device_unlock()

If the device is detaching, nfc_download will be set to false in position (1) and 
the download function will goto error.

       (Thread 1)                  |       (Thread 2)
nfcmrvl_nci_unregister_dev         |
  nci_unregister_device            |
    nfc_unregister_device          | nfc_fw_download
                                   |   device_lock()
                                   |   rc = dev->ops->fw_download();
                                   |   dev->fw_download_in_progress = false;
                                   |   device_unlock()
      device_lock()                |   ...
      ...                          |   
      nfc_download = false;        |   ...
      device_unlock()              |
  ...                              |
  if (..->fw_download_in_progress) |
    nfcmrvl_fw_dnld_abort          |
      fw_dnld_over                 |
        release_firmware           |

If the download function is executing, nci_unregister_device() will wait until
firmware download routine is finished. What`s more, when the firmware download
routine is finished, the fw_download_in_progress() in nfc_dev struct will be set
to false. the release_firmware() in nfcmrvl_nci_unregister_dev() will not execute,
which could synchonize with fw_dnld_rx_work() and fw_dnld_timeout().

I think the new solution is good, it has been tested enough. I sent "[PATCH net v3]
nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs"
just now.

Best regards,
Duoming Zhou

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ