[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6aeef03a-eabb-e6d8-c100-9a74f3506f79@linaro.org>
Date: Wed, 4 May 2022 08:32:15 +0200
From: Krzysztof Kozlowski <krzysztof.kozlowski@...aro.org>
To: duoming@....edu.cn
Cc: linux-kernel@...r.kernel.org, kuba@...nel.org,
gregkh@...uxfoundation.org, davem@...emloft.net,
edumazet@...gle.com, pabeni@...hat.com, alexander.deucher@....com,
akpm@...ux-foundation.org, broonie@...nel.org,
netdev@...r.kernel.org, linma@....edu.cn
Subject: Re: [PATCH net v5 2/2] nfc: nfcmrvl: main: reorder destructive
operations in nfcmrvl_nci_unregister_dev to avoid bugs
On 02/05/2022 09:25, duoming@....edu.cn wrote:
>
>
>
>> -----原始邮件-----
>> 发件人: "Krzysztof Kozlowski" <krzysztof.kozlowski@...aro.org>
>> 发送时间: 2022-05-02 14:34:07 (星期一)
>> 收件人: duoming@....edu.cn
>> 抄送: linux-kernel@...r.kernel.org, kuba@...nel.org, gregkh@...uxfoundation.org, davem@...emloft.net, edumazet@...gle.com, pabeni@...hat.com, alexander.deucher@....com, akpm@...ux-foundation.org, broonie@...nel.org, netdev@...r.kernel.org, linma@....edu.cn
>> 主题: Re: [PATCH net v5 2/2] nfc: nfcmrvl: main: reorder destructive operations in nfcmrvl_nci_unregister_dev to avoid bugs
>>
>> On 29/04/2022 11:13, duoming@....edu.cn wrote:
>>> Hello,
>>>
>>> On Fri, 29 Apr 2022 09:27:48 +0200 Krzysztof wrote:
>>>
>>>>> There are destructive operations such as nfcmrvl_fw_dnld_abort and
>>>>> gpio_free in nfcmrvl_nci_unregister_dev. The resources such as firmware,
>>>>> gpio and so on could be destructed while the upper layer functions such as
>>>>> nfcmrvl_fw_dnld_start and nfcmrvl_nci_recv_frame is executing, which leads
>>>>> to double-free, use-after-free and null-ptr-deref bugs.
>>>>>
>>>>> There are three situations that could lead to double-free bugs.
>>>>>
>>>>> The first situation is shown below:
>>>>>
>>>>> (Thread 1) | (Thread 2)
>>>>> nfcmrvl_fw_dnld_start |
>>>>> ... | nfcmrvl_nci_unregister_dev
>>>>> release_firmware() | nfcmrvl_fw_dnld_abort
>>>>> kfree(fw) //(1) | fw_dnld_over
>>>>> | release_firmware
>>>>> ... | kfree(fw) //(2)
>>>>> | ...
>>>>>
>>>>> The second situation is shown below:
>>>>>
>>>>> (Thread 1) | (Thread 2)
>>>>> nfcmrvl_fw_dnld_start |
>>>>> ... |
>>>>> mod_timer |
>>>>> (wait a time) |
>>>>> fw_dnld_timeout | nfcmrvl_nci_unregister_dev
>>>>> fw_dnld_over | nfcmrvl_fw_dnld_abort
>>>>> release_firmware | fw_dnld_over
>>>>> kfree(fw) //(1) | release_firmware
>>>>> ... | kfree(fw) //(2)
>>>>
>>>> How exactly the case here is being prevented?
>>>>
>>>> If nfcmrvl_nci_unregister_dev() happens slightly earlier, before
>>>> fw_dnld_timeout() on the left side (T1), the T1 will still hit it, won't it?
>>>
>>> I think it could be prevented. We use nci_unregister_device() to synchronize, if the
>>> firmware download routine is running, the cleanup routine will wait it to finish.
>>> The flag "fw_download_in_progress" will be set to false, if the the firmware download
>>> routine is finished.
>>
>> fw_download_in_progress is not synchronized in
>> nfcmrvl_nci_unregister_dev(), so even if fw_dnld_timeout() set it to
>> true, the nfcmrvl_nci_unregister_dev() happening concurrently will not
>> see updated fw_download_in_progress.
>
> The fw_download_in_progress is set to false in nfc_fw_download(). The nfc_fw_download() is
> synchronized with nfc_unregister_device().
No, it is not. There is no synchronization primitive in
nfc_unregister_device(), at least explicitly.
> If nfc_fw_download() is running, nfc_unregister_device()
> will wait nfc_fw_download() to finish. So the nfcmrvl_nci_unregister_dev() could see the updated
> fw_download_in_progress. The process is shown below:
>
> (Thread 1) | (Thread 2)
> nfcmrvl_nci_unregister_dev | nfc_fw_download
> nci_unregister_device | ...
> | device_lock()
> ... | dev->fw_download_in_progress = false; //(1)
> | device_unlock()
> nfc_unregister_device |
> if (priv->ndev->nfc_dev->fw_download_in_progress) //(2) |
> nfcmrvl_fw_dnld_abort(priv); //not execute |
>
> We set fw_download_in_progress to false in position (1) and the check in position (2) will fail,
> the nfcmrvl_fw_dnld_abort() in nfcmrvl_nci_unregister_dev() will not execute. So the double-free
> bugs could be prevented.
You just repeated the same not answering the question. The
fw_download_in_progress at point (2) can be still true, on that CPU. I
explain it third time so let me rephrase it - the
fw_download_in_progress can be reordered by compiler or CPU to:
T1 | T2
nfcmrvl_nci_unregister_dev()
nci_unregister_device()
var = fw_download_in_progress; (true)
| nfc_fw_download
| device_lock
| dev->fw_download = false;
| device_unlock
if (var) |
nfcmrvl_fw_dnld_abort(priv); |
Every write barrier must be paired with read barrier. Every lock on one
access to variable, must be paired with same lock on other access to
variable .
>
>>> Although the timer handler fw_dnld_timeout() could be running, nfcmrvl_nci_unregister_dev()
>>> will check the flag "fw_download_in_progress" which is already set to false and nfcmrvl_fw_dnld_abort()
>>> in nfcmrvl_nci_unregister_dev() will not execute.
>>
>> I am sorry, but you cannot move code around hoping it will by itself
>> solve synchronization issues.
>
> I think this solution sove synchronization issues. If you still have any questions welcome to ask me.
No, you still do not get the pint. You cannot move code around because
this itself does not solve missing synchronization primitives and
related issues.
Best regards,
Krzysztof
Powered by blists - more mailing lists