[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0a694947-809d-48b2-9138-d3f6175fe09d@linux.microsoft.com>
Date: Wed, 26 Feb 2025 17:51:46 +0530
From: Naman Jain <namjain@...ux.microsoft.com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: "K . Y . Srinivasan" <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>, Wei Liu <wei.liu@...nel.org>,
Dexuan Cui <decui@...rosoft.com>,
Stephen Hemminger <stephen@...workplumber.org>,
linux-hyperv@...r.kernel.org, linux-kernel@...r.kernel.org,
stable@...nel.org, Saurabh Sengar <ssengar@...ux.microsoft.com>,
Michael Kelley <mhklinux@...look.com>, Long Li <longli@...rosoft.com>
Subject: Re: [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer
On 2/26/2025 3:33 PM, Greg Kroah-Hartman wrote:
> On Wed, Feb 26, 2025 at 10:43:41AM +0530, Naman Jain wrote:
>>
>>
>> On 2/25/2025 2:09 PM, Greg Kroah-Hartman wrote:
>>> On Tue, Feb 25, 2025 at 02:04:43PM +0530, Naman Jain wrote:
>>>>
>>>>
>>>> On 2/25/2025 11:42 AM, Greg Kroah-Hartman wrote:
>>>>> On Tue, Feb 25, 2025 at 10:50:01AM +0530, Naman Jain wrote:
>>>>>> On regular bootup, devices get registered to vmbus first, so when
>>>>>> uio_hv_generic driver for a particular device type is probed,
>>>>>> the device is already initialized and added, so sysfs creation in
>>>>>> uio_hv_generic probe works fine. However, when device is removed
>>>>>> and brought back, the channel rescinds and device again gets
>>>>>> registered to vmbus. However this time, the uio_hv_generic driver is
>>>>>> already registered to probe for that device and in this case sysfs
>>>>>> creation is tried before the device gets initialized completely.
>>>>>>
>>>>>> Fix this by moving the core logic of sysfs creation for ring buffer,
>>>>>> from uio_hv_generic to HyperV's vmbus driver, where rest of the sysfs
>>>>>> attributes for the channels are defined. While doing that, make use
>>>>>> of attribute groups and macros, instead of creating sysfs directly,
>>>>>> to ensure better error handling and code flow.
<snip>
>>>>>> +static int hv_mmap_ring_buffer_wrapper(struct file *filp, struct kobject *kobj,
>>>>>> + const struct bin_attribute *attr,
>>>>>> + struct vm_area_struct *vma)
>>>>>> +{
>>>>>> + struct vmbus_channel *channel = container_of(kobj, struct vmbus_channel, kobj);
>>>>>> +
>>>>>> + if (!channel->mmap_ring_buffer)
>>>>>> + return -ENODEV;
>>>>>> + return channel->mmap_ring_buffer(channel, vma);
>>>>>
>>>>> What is preventing mmap_ring_buffer from being set to NULL right after
>>>>> checking it and then calling it here? I see no locks here or where you
>>>>> are assigning this variable at all, so what is preventing these types of
>>>>> races?
>>>>>
>>>>> thanks,
>>>>>
>>>>> greg k-h
>>>>
>>>> Thank you so much for reviewing.
>>>> I spent some time to understand if this race condition can happen and it
>>>> seems execution flow is pretty sequential, for a particular channel of a
>>>> device.
>>>>
>>>> Unless hv_uio_remove (which makes channel->mmap_ring_buffer NULL) can be
>>>> called in parallel to hv_uio_probe (which had set
>>>> channel->mmap_ring_buffer to non NULL), I doubt race can happen here.
>>>>
>>>> Code Flow: (R, W-> Read, Write to channel->mmap_ring_buffer)
>>>>
>>>> vmbus_device_register
>>>> device_register
>>>> hv_uio_probe
>>>> hv_create_ring_sysfs (W to non NULL)
>>>> sysfs_update_group
>>>> vmbus_chan_attr_is_visible (R)
>>>> vmbus_add_channel_kobj
>>>> sysfs_create_group
>>>> vmbus_chan_attr_is_visible (R)
>>>> hv_mmap_ring_buffer_wrapper (critical section)
>>>>
>>>> hv_uio_remove
>>>> hv_remove_ring_sysfs (W to NULL)
>>>
>>> Yes, and right in here someone mmaps the file.
>>>
>>> I think you can race here, no locks at all feels wrong.
>>>
>>> Messing with sysfs groups and files like this is rough, and almost never
>>> a good idea, why can't you just do this all at once with the default
>>> groups, why is this being added/removed out-of-band?
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>> The decision to avoid creating a "ring" sysfs attribute by default
>> likely stems from a specific use case where it wasn't needed for every
>> device. By creating it automatically, it keeps the uio_hv_generic
>> driver simpler and helps prevent potential race conditions. However, it
>> has an added cost of having ring buffer for all the channels, where it
>> is not required. I am trying to find if there are any more implications
>> of it.
>
> You do know about the "is_visable" attribute callback, right? Why not
> just use that instead of manually mucking around with the
> adding/removing of sysfs attributes at all? That is what it was
> designed for.
>
> thanks,
>
> greg k-h
Hi Greg,
Yes, I am utilizing that in my patch. For differentiating channels of a
uio_hv_generic device, and for *selectively* creating sysfs, we
introduced this field in channel struct "channel->mmap_ring_buffer",
which we were setting only in uio_hv_generic. But, by the time we set
this in uio_hv_generic driver, the sysfs creation has already gone
through and sysfs doesn't get updated dynamically. That's where there
was a need to call sysfs_update_group. I thought the better place to
keep sysfs_update_group would be in vmbus driver, where we are creating
the original sysfs entries, hence I had to add the wrapper functions.
This led us to the race condition we are trying to address now.
@@ -1838,6 +1872,10 @@ static umode_t vmbus_chan_attr_is_visible(struct
kobject *kobj,
attr == &chan_attr_monitor_id.attr))
return 0;
+ /* Hide ring attribute if channel's mmap_ring_buffer function is not
yet initialised */
+ if (attr == &chan_attr_ring_buffer.attr && !channel->mmap_ring_buffer)
+ return 0;
+
return attr->mode;
}
Thanks,
Naman
Powered by blists - more mailing lists