[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <42487b67.fb6d.1808e537a6f.Coremail.duoming@zju.edu.cn>
Date: Wed, 4 May 2022 17:07:57 +0800 (GMT+08:00)
From: duoming@....edu.cn
To: "Krzysztof Kozlowski" <krzysztof.kozlowski@...aro.org>
Cc: linux-kernel@...r.kernel.org, kuba@...nel.org,
gregkh@...uxfoundation.org, davem@...emloft.net,
edumazet@...gle.com, pabeni@...hat.com, alexander.deucher@....com,
akpm@...ux-foundation.org, broonie@...nel.org,
netdev@...r.kernel.org, linma@....edu.cn
Subject: Re: [PATCH net v5 2/2] nfc: nfcmrvl: main: reorder destructive
operations in nfcmrvl_nci_unregister_dev to avoid bugs
Hello,
On Wed, 4 May 2022 08:32:15 +0200 Krzysztof wrote:
> >>>>> There are destructive operations such as nfcmrvl_fw_dnld_abort and
> >>>>> gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
> >>>>> gpio and so on could be destructed while the upper layer functions such as
> >>>>> nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
> >>>>> to double-free, use-after-free and null-ptr-deref bugs.
> >>>>>
> >>>>> There are three situations that could lead to double-free bugs.
> >>>>>
> >>>>> The first situation is shown below:
> >>>>>
> >>>>> (Thread 1) | (Thread 2)
> >>>>> nfcmrvl_fw_dnld_start |
> >>>>> ... | nfcmrvl_nci_unregister_dev
> >>>>> release_firmware() | nfcmrvl_fw_dnld_abort
> >>>>> kfree(fw) //(1) | fw_dnld_over
> >>>>> | release_firmware
> >>>>> ... | kfree(fw) //(2)
> >>>>> | ...
> >>>>>
> >>>>> The second situation is shown below:
> >>>>>
> >>>>> (Thread 1) | (Thread 2)
> >>>>> nfcmrvl_fw_dnld_start |
> >>>>> ... |
> >>>>> mod_timer |
> >>>>> (wait a time) |
> >>>>> fw_dnld_timeout | nfcmrvl_nci_unregister_dev
> >>>>> fw_dnld_over | nfcmrvl_fw_dnld_abort
> >>>>> release_firmware | fw_dnld_over
> >>>>> kfree(fw) //(1) | release_firmware
> >>>>> ... | kfree(fw) //(2)
> >>>>
> >>>> How exactly the case here is being prevented?
> >>>>
> >>>> If nfcmrvl_nci_unregister_dev() happens slightly earlier, before
> >>>> fw_dnld_timeout() on the left side (T1), the T1 will still hit it, won't it?
> >>>
> >>> I think it could be prevented. We use nci_unregister_device() to synchronize, if the
> >>> firmware download routine is running, the cleanup routine will wait it to finish.
> >>> The flag "fw_download_in_progress" will be set to false, if the the firmware download
> >>> routine is finished.
> >>
> >> fw_download_in_progress is not synchronized in
> >> nfcmrvl_nci_unregister_dev(), so even if fw_dnld_timeout() set it to
> >> true, the nfcmrvl_nci_unregister_dev() happening concurrently will not
> >> see updated fw_download_in_progress.
> >
> > The fw_download_in_progress is set to false in nfc_fw_download(). The nfc_fw_download() is
> > synchronized with nfc_unregister_device().
>
> No, it is not. There is no synchronization primitive in
> nfc_unregister_device(), at least explicitly.
There is device_lock() in nfc_fw_download() which could synchronize with nfc_unregister_device().
The code in nfc_unregister_device() is shown below:
nfc_unregister_device()
...
device_lock(&dev->dev);
...
dev->shutting_down = true;
device_unlock(&dev->dev);
> > If nfc_fw_download() is running, nfc_unregister_device()
> > will wait nfc_fw_download() to finish. So the nfcmrvl_nci_unregister_dev() could see the updated
> > fw_download_in_progress. The process is shown below:
> >
> > (Thread 1) | (Thread 2)
> > nfcmrvl_nci_unregister_dev | nfc_fw_download
> > nci_unregister_device | ...
> > | device_lock()
> > ... | dev->fw_download_in_progress = false; //(1)
> > | device_unlock()
> > nfc_unregister_device |
> > if (priv->ndev->nfc_dev->fw_download_in_progress) //(2) |
> > nfcmrvl_fw_dnld_abort(priv); //not execute |
> >
> > We set fw_download_in_progress to false in position (1) and the check in position (2) will fail,
> > the nfcmrvl_fw_dnld_abort() in nfcmrvl_nci_unregister_dev() will not execute. So the double-free
> > bugs could be prevented.
>
> You just repeated the same not answering the question. The
> fw_download_in_progress at point (2) can be still true, on that CPU. I
> explain it third time so let me rephrase it - the
> fw_download_in_progress can be reordered by compiler or CPU to:
>
> T1 | T2
> nfcmrvl_nci_unregister_dev()
> nci_unregister_device()
> var = fw_download_in_progress; (true)
> | nfc_fw_download
> | device_lock
> | dev->fw_download = false;
> | device_unlock
> if (var) |
> nfcmrvl_fw_dnld_abort(priv); |
>
> Every write barrier must be paired with read barrier. Every lock on one
> access to variable, must be paired with same lock on other access to
> variable .
There is paired write barrier and read barrier "device_lock(&dev->dev)" between
nfc_unregister_device() and nfc_fw_download(). The shutting_down flag of nfc_dev
protected by device_lock() is set to true in position(2) and check in position(1).
(Thread 1) | (Thread 2)
nfcmrvl_nci_unregister_dev | nfc_fw_download
nci_unregister_device | ...
| device_lock(&dev->dev)
... | if (dev->shutting_down) //(1)
| goto error;
| ...
| dev->fw_download_in_progress = false; //(3)
| device_unlock(&dev->dev)
nfc_unregister_device |
device_lock(&dev->dev); |
dev->shutting_down = true; //(2) |
device_unlock(&dev->dev); |
if (priv->ndev->nfc_dev->fw_download_in_progress)//(4)|
nfcmrvl_fw_dnld_abort(priv); |
If nfc_fw_download() is running, nfc_unregister_device() will wait nfc_fw_download() to finish,
and the nfc_dev->fw_download_in_progress is set to false in position (3). The check in position(4)
is false, because we have set nfc_dev->fw_download_in_progress to false in position(3).
If we set shutting_down to true in position(2) before check in position(1) the nfc_fw_download()
will goto error.
> >>> Although the timer handler fw_dnld_timeout() could be running, nfcmrvl_nci_unregister_dev()
> >>> will check the flag "fw_download_in_progress" which is already set to false and nfcmrvl_fw_dnld_abort()
> >>> in nfcmrvl_nci_unregister_dev() will not execute.
> >>
> >> I am sorry, but you cannot move code around hoping it will by itself
> >> solve synchronization issues.
> >
> > I think this solution sove synchronization issues. If you still have any questions welcome to ask me.
>
> No, you still do not get the pint. You cannot move code around because
> this itself does not solve missing synchronization primitives and
> related issues.
I think this solution could solve synchronization issues. If you still have any questions welcome to ask me.
Best regards,
Duoming Zhou
Powered by blists - more mailing lists