lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <42487b67.fb6d.1808e537a6f.Coremail.duoming@zju.edu.cn>
Date:   Wed, 4 May 2022 17:07:57 +0800 (GMT+08:00)
From:   duoming@....edu.cn
To:     "Krzysztof Kozlowski" <krzysztof.kozlowski@...aro.org>
Cc:     linux-kernel@...r.kernel.org, kuba@...nel.org,
        gregkh@...uxfoundation.org, davem@...emloft.net,
        edumazet@...gle.com, pabeni@...hat.com, alexander.deucher@....com,
        akpm@...ux-foundation.org, broonie@...nel.org,
        netdev@...r.kernel.org, linma@....edu.cn
Subject: Re: [PATCH net v5 2/2] nfc: nfcmrvl: main: reorder destructive
 operations in nfcmrvl_nci_unregister_dev to avoid bugs

Hello,

On Wed, 4 May 2022 08:32:15 +0200 Krzysztof wrote:

> >>>>> There are destructive operations such as nfcmrvl_fw_dnld_abort and
> >>>>> gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
> >>>>> gpio and so on could be destructed while the upper layer functions such as
> >>>>> nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
> >>>>> to double-free, use-after-free and null-ptr-deref bugs.
> >>>>>
> >>>>> There are three situations that could lead to double-free bugs.
> >>>>>
> >>>>> The first situation is shown below:
> >>>>>
> >>>>>    (Thread 1)                 |      (Thread 2)
> >>>>> nfcmrvl_fw_dnld_start         |
> >>>>>  ...                          |  nfcmrvl_nci_unregister_dev
> >>>>>  release_firmware()           |   nfcmrvl_fw_dnld_abort
> >>>>>   kfree(fw) //(1)             |    fw_dnld_over
> >>>>>                               |     release_firmware
> >>>>>   ...                         |      kfree(fw) //(2)
> >>>>>                               |     ...
> >>>>>
> >>>>> The second situation is shown below:
> >>>>>
> >>>>>    (Thread 1)                 |      (Thread 2)
> >>>>> nfcmrvl_fw_dnld_start         |
> >>>>>  ...                          |
> >>>>>  mod_timer                    |
> >>>>>  (wait a time)                |
> >>>>>  fw_dnld_timeout              |  nfcmrvl_nci_unregister_dev
> >>>>>    fw_dnld_over               |   nfcmrvl_fw_dnld_abort
> >>>>>     release_firmware          |    fw_dnld_over
> >>>>>      kfree(fw) //(1)          |     release_firmware
> >>>>>      ...                      |      kfree(fw) //(2)
> >>>>
> >>>> How exactly the case here is being prevented?
> >>>>
> >>>> If nfcmrvl_nci_unregister_dev() happens slightly earlier, before
> >>>> fw_dnld_timeout() on the left side (T1), the T1 will still hit it, won't it?
> >>>
> >>> I think it could be prevented. We use nci_unregister_device() to synchronize, if the
> >>> firmware download routine is running, the cleanup routine will wait it to finish. 
> >>> The flag "fw_download_in_progress" will be set to false, if the the firmware download
> >>> routine is finished. 
> >>
> >> fw_download_in_progress is not synchronized in
> >> nfcmrvl_nci_unregister_dev(), so even if fw_dnld_timeout() set it to
> >> true, the nfcmrvl_nci_unregister_dev() happening concurrently will not
> >> see updated fw_download_in_progress.
> > 
> > The fw_download_in_progress is set to false in nfc_fw_download(). The nfc_fw_download() is
> > synchronized with nfc_unregister_device().
> 
> No, it is not. There is no synchronization primitive in
> nfc_unregister_device(), at least explicitly.

There is device_lock() in nfc_fw_download() which could synchronize with nfc_unregister_device().
The code in nfc_unregister_device() is shown below:

nfc_unregister_device()
  ...
  device_lock(&dev->dev);
  ...
  dev->shutting_down = true;
  device_unlock(&dev->dev);

> > If nfc_fw_download() is running, nfc_unregister_device()
> > will wait nfc_fw_download() to finish. So the nfcmrvl_nci_unregister_dev() could see the updated
> > fw_download_in_progress. The process is shown below:
> > 
> >         (Thread 1)                                         |       (Thread 2)
> >  nfcmrvl_nci_unregister_dev                                | nfc_fw_download
> >    nci_unregister_device                                   |  ...
> >                                                            |  device_lock()
> >      ...                                                   |  dev->fw_download_in_progress = false; //(1)
> >                                                            |  device_unlock()
> >      nfc_unregister_device                                 | 
> >    if (priv->ndev->nfc_dev->fw_download_in_progress) //(2) | 
> >      nfcmrvl_fw_dnld_abort(priv); //not execute            |   
> > 
> > We set fw_download_in_progress to false in position (1) and the check in position (2) will fail,
> > the nfcmrvl_fw_dnld_abort() in nfcmrvl_nci_unregister_dev() will not execute. So the double-free
> > bugs could be prevented.
> 
> You just repeated the same not answering the question. The
> fw_download_in_progress at point (2) can be still true, on that CPU. I
> explain it third time so let me rephrase it - the
> fw_download_in_progress can be reordered by compiler or CPU to:
> 
> T1                                          | T2
> nfcmrvl_nci_unregister_dev()
>   nci_unregister_device()
>     var = fw_download_in_progress; (true)
>                                             | nfc_fw_download
>                                             | device_lock
>                                             | dev->fw_download = false;
>                                             | device_unlock
>     if (var)                                |
>       nfcmrvl_fw_dnld_abort(priv);          |
> 
> Every write barrier must be paired with read barrier. Every lock on one
> access to variable, must be paired with same lock on other access to
> variable .

There is paired write barrier and read barrier "device_lock(&dev->dev)" between 
nfc_unregister_device() and nfc_fw_download(). The shutting_down flag of nfc_dev
protected by device_lock() is set to true in position(2) and check in position(1).

         (Thread 1)                                       |       (Thread 2)
nfcmrvl_nci_unregister_dev                                | nfc_fw_download
  nci_unregister_device                                   |  ...
                                                          |  device_lock(&dev->dev)
    ...                                                   |  if (dev->shutting_down) //(1)
                                                          |    goto error;
                                                          |  ...
                                                          |  dev->fw_download_in_progress = false; //(3)
                                                          |  device_unlock(&dev->dev)
    nfc_unregister_device                                 |
      device_lock(&dev->dev);                             |
      dev->shutting_down = true; //(2)                    | 
      device_unlock(&dev->dev);                           | 
    if (priv->ndev->nfc_dev->fw_download_in_progress)//(4)|   
      nfcmrvl_fw_dnld_abort(priv);                        |

If nfc_fw_download() is running, nfc_unregister_device() will wait nfc_fw_download() to finish,
and the nfc_dev->fw_download_in_progress is set to false in position (3). The check in position(4)
is false, because we have set nfc_dev->fw_download_in_progress to false in position(3).

If we set shutting_down to true in position(2) before check in position(1) the nfc_fw_download()
will goto error.

> >>> Although the timer handler fw_dnld_timeout() could be running, nfcmrvl_nci_unregister_dev()
> >>> will check the flag "fw_download_in_progress" which is already set to false and nfcmrvl_fw_dnld_abort()
> >>> in nfcmrvl_nci_unregister_dev() will not execute.
> >>
> >> I am sorry, but you cannot move code around hoping it will by itself
> >> solve synchronization issues.
> > 
> > I think this solution sove synchronization issues. If you still have any questions welcome to ask me.
> 
> No, you still do not get the pint. You cannot move code around because
> this itself does not solve missing synchronization primitives and
> related issues.

I think this solution could solve synchronization issues. If you still have any questions welcome to ask me.

Best regards,
Duoming Zhou

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ