linux-kernel - Re: Re: [PATCH] drivers: nfc: nfcmrvl: fix double free bug in nfcmrvl_nci_unregister

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <29021fb7.2b99.1801376f858.Coremail.duoming@zju.edu.cn>
Date:   Sun, 10 Apr 2022 20:33:25 +0800 (GMT+08:00)
From:   duoming@....edu.cn
To:     "Krzysztof Kozlowski" <krzysztof.kozlowski@...aro.org>
Cc:     linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
        akpm@...ux-foundation.org, davem@...emloft.net,
        gregkh@...uxfoundation.org, alexander.deucher@....com,
        broonie@...nel.org
Subject: Re: Re: [PATCH] drivers: nfc: nfcmrvl: fix double free bug in
 nfcmrvl_nci_unregister_dev()

Hello,

On Sun, 10 Apr 2022 11:27:09 +0200 Krzysztof Kozlowski wrote:

> > There is a potential double bug in nfcmrvl usb driver between
> > unregister and resume operation.
> > 
> 
> Thank you for your patch. There is something to discuss/improve.
> 
> > The race that cause that double free bug can be shown as below:
> 
> Your patch solves the most visible race, but because of lack of locking,
> I believe race still might exist:
> 
>    (FREE)                   |      (USE)
>                             | nfcmrvl_resume
>                             |  nfcmrvl_submit_bulk_urb
>                             |   nfcmrvl_bulk_complete
>                             |    nfcmrvl_nci_recv_frame
>                             |     nfcmrvl_fw_dnld_recv_frame
>                             |      queue_work
>                             |       fw_dnld_rx_work
> nfcmrvl_disconnect          |
>  nfcmrvl_nci_unregister_dev |
>   nfcmrvl_fw_dnld_deinit    |
>    wait for the workqueue to finish |
>                             |        fw_dnld_over
>                             |         release_firmware
>                             |          kfree(fw);
>                             |          no synchronization //(1)
>   if (fw_download_in_progress)
>     - no synchronization, so CPU sees old value
>   nfcmrvl_fw_dnld_abort     |
>    fw_dnld_over             |         ...
>     if (priv->fw_dnld.fw)   |
>     release_firmware        |
>      kfree(fw); //(2)       |
>      ...                    |         fw = NULL;
> 
> The kfree() from (2) would still free old value. Even if fw=NULL happens
> earlier, it is not propagated back to the other CPU, unless there are
> some implicit barriers due to workqueue?
> 
> Is it safe then to rely on such implicit barriers from workqueue?

I think it is safe to rely on such barriers from destroy_workqueue(). 
The destroy_workqueue will wait all work items to finish, the code
behind it will not execute until all work items are finished.
The progress is shown below:

destroy_workqueue()-->drain_workqueue()-->flush_workqueue()-->wait_for_completion().

The function drain_workqueue() will wait until the workqueue becomes empty.
It sets wq->flags to __WQ_DRAINING, this could ensure the new coming work items
will not be queued into the draining workqueue. Because __queue_work will check
the __WQ_DRAINING flag. 

Then drain_workqueue() calls flush_workqueue() to ensure that any scheduled work
has run to completion. Because wait_for_completion() is called in flush_workqueue(),
it will block current thread and wait work items to completion. 

Best regards,
Duoming Zhou.