[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b6994af6-2274-4671-d156-1609f349804d@huawei.com>
Date: Tue, 21 Feb 2023 20:48:01 +0800
From: Weifeng Su <suweifeng1@...wei.com>
To: Greg KH <gregkh@...uxfoundation.org>
CC: <mst@...hat.com>, <linux-kernel@...r.kernel.org>,
<shikemeng@...wei.com>, <liuzhiqiang26@...wei.com>,
<linfeilong@...wei.com>, <zhanghongtao22@...wei.com>
Subject: Re: [PATCH v2] uio:uio_pci_generic:Don't clear master bit when the
process does not exit
On 2023/2/21 2:11, Greg KH wrote:
> On Tue, Feb 21, 2023 at 01:10:44AM +0800, Su Weifeng wrote:
>> From: Weifeng Su <suweifeng1@...wei.com>
>>
>> The /dev/uioX device has concurrent operations in a few scenarios.
>>
>> For example, when a process using the device exits abnormally,
>> the management program starts the same process to operate the device.
>> When the process exits and closes the /dev/uioX device,
>> the master bit of the device is cleared. In this case, if the
>> new process is issuing commands, I/Os are suspended and cannot be
>> automatically recovered.
>>
>> Therefore, reference counting is added to clear the master bit
>> only when the last process exits.
>>
>> Signed-off-by: Weifeng Su <suweifeng1@...wei.com>
>> ---
>> The difference between the first patch and the first patch is that
>> the reference counting operation is performed using the atomic semantics,
>> just like other drivers under UIO:
>> cdfa835c6e5e87d145f("uio_hv_generic: defer opening vmbus until first use").
>
> And I would claim that that change too is incorrect.
>
> Did you test this with dup()? Lots of open/close cycles on the same
> device node? Passing around the file descriptor?
The patch is verified to be able to fix the low-probability problem in
our scenario. At the same time, we also perform the dup test on the
latest kernel(6.2.0-rc8-g0f5e65cd8f9b-dirty) based on your suggestions.
The test code model is as follows:
...
int main()
{
int fd = open("/dev/uio0", O_RDWR);
if (fd < 0) {
printf("error in open /dev/uio0\n");
return -1;
}
int dup_fd = dup(fd);
while (true) {
sleep(1);
}
}
After kill the process, The test results are as follows:
[ 335.730916] swf call uio open
[ 338.307306] swf call uio release
dup does not cause uio_pci_generic.c:open or uio_pci_generic.c:release
reference counting exceptions.
PS: In this example, a print is added to the entries of uio_open and
uio_release of the driver to confirm the function invoking status.
After reading the "dup" code, we confirm that the struct file uses the
f_count protect the same file in the process. The file opened by the
"dup" only adds reference counting and does not invoke the open
operation of the driver. Disabling the fd only decrementes the f_count
count. The release function of the driver is invoked to clear resources
only when f_count is 0. Different processes use different struct files
to open the same file, and f_count cannot enable the constraint
function. In this case, the driver needs to handle the concurrency
problem of multiple processes,It's like this patch
cdfa835c6e5e87d145f("uio_hv_generic: defer opening vmbus until first use")
>
> Logically, this is identical to your previous change, so why should it
> be accepted?
>
> Again, why not just use a real PCI driver for your device?
We use the uio_pci_generic driver because we use the DPDK, which is a
user-mode development platform on which you can develop the user-mode
driver.
>
> thanks,
>
> greg k-h
Best regards,
Weifeng Su
Powered by blists - more mailing lists