[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <960501c2-5ab2-4c81-86ac-a4477c0f708a@linux.microsoft.com>
Date: Thu, 27 Feb 2025 11:54:23 +0530
From: Naman Jain <namjain@...ux.microsoft.com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: "K . Y . Srinivasan" <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>, Wei Liu <wei.liu@...nel.org>,
Dexuan Cui <decui@...rosoft.com>,
Stephen Hemminger <stephen@...workplumber.org>,
linux-hyperv@...r.kernel.org, linux-kernel@...r.kernel.org,
stable@...nel.org, Saurabh Sengar <ssengar@...ux.microsoft.com>,
Michael Kelley <mhklinux@...look.com>, Long Li <longli@...rosoft.com>
Subject: Re: [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer
On 2/26/2025 8:03 PM, Greg Kroah-Hartman wrote:
> On Wed, Feb 26, 2025 at 05:51:46PM +0530, Naman Jain wrote:
>>
>>
>> On 2/26/2025 3:33 PM, Greg Kroah-Hartman wrote:
>>> On Wed, Feb 26, 2025 at 10:43:41AM +0530, Naman Jain wrote:
>>>>
>>>>
>>>> On 2/25/2025 2:09 PM, Greg Kroah-Hartman wrote:
>>>>> On Tue, Feb 25, 2025 at 02:04:43PM +0530, Naman Jain wrote:
>>>>>>
>>>>>>
>>>>>> On 2/25/2025 11:42 AM, Greg Kroah-Hartman wrote:
>>>>>>> On Tue, Feb 25, 2025 at 10:50:01AM +0530, Naman Jain wrote:
>>>>>>>> On regular bootup, devices get registered to vmbus first, so when
>>>>>>>> uio_hv_generic driver for a particular device type is probed,
>>>>>>>> the device is already initialized and added, so sysfs creation in
>>>>>>>> uio_hv_generic probe works fine. However, when device is removed
>>>>>>>> and brought back, the channel rescinds and device again gets
>>>>>>>> registered to vmbus. However this time, the uio_hv_generic driver is
>>>>>>>> already registered to probe for that device and in this case sysfs
>>>>>>>> creation is tried before the device gets initialized completely.
>>>>>>>>
>>>>>>>> Fix this by moving the core logic of sysfs creation for ring buffer,
>>>>>>>> from uio_hv_generic to HyperV's vmbus driver, where rest of the sysfs
>>>>>>>> attributes for the channels are defined. While doing that, make use
>>>>>>>> of attribute groups and macros, instead of creating sysfs directly,
>>>>>>>> to ensure better error handling and code flow.
>>
>> <snip>
>>
>>>>>>>> +static int hv_mmap_ring_buffer_wrapper(struct file *filp, struct kobject *kobj,
>>>>>>>> + const struct bin_attribute *attr,
>>>>>>>> + struct vm_area_struct *vma)
>>>>>>>> +{
>>>>>>>> + struct vmbus_channel *channel = container_of(kobj, struct vmbus_channel, kobj);
>>>>>>>> +
>>>>>>>> + if (!channel->mmap_ring_buffer)
>>>>>>>> + return -ENODEV;
>>>>>>>> + return channel->mmap_ring_buffer(channel, vma);
>>>>>>>
>>>>>>> What is preventing mmap_ring_buffer from being set to NULL right after
>>>>>>> checking it and then calling it here? I see no locks here or where you
>>>>>>> are assigning this variable at all, so what is preventing these types of
>>>>>>> races?
>>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> greg k-h
>>>>>>
>>>>>> Thank you so much for reviewing.
>>>>>> I spent some time to understand if this race condition can happen and it
>>>>>> seems execution flow is pretty sequential, for a particular channel of a
>>>>>> device.
>>>>>>
>>>>>> Unless hv_uio_remove (which makes channel->mmap_ring_buffer NULL) can be
>>>>>> called in parallel to hv_uio_probe (which had set
>>>>>> channel->mmap_ring_buffer to non NULL), I doubt race can happen here.
>>>>>>
>>>>>> Code Flow: (R, W-> Read, Write to channel->mmap_ring_buffer)
>>>>>>
>>>>>> vmbus_device_register
>>>>>> device_register
>>>>>> hv_uio_probe
>>>>>> hv_create_ring_sysfs (W to non NULL)
>>>>>> sysfs_update_group
>>>>>> vmbus_chan_attr_is_visible (R)
>>>>>> vmbus_add_channel_kobj
>>>>>> sysfs_create_group
>>>>>> vmbus_chan_attr_is_visible (R)
>>>>>> hv_mmap_ring_buffer_wrapper (critical section)
>>>>>>
>>>>>> hv_uio_remove
>>>>>> hv_remove_ring_sysfs (W to NULL)
>>>>>
>>>>> Yes, and right in here someone mmaps the file.
>>>>>
>>>>> I think you can race here, no locks at all feels wrong.
>>>>>
>>>>> Messing with sysfs groups and files like this is rough, and almost never
>>>>> a good idea, why can't you just do this all at once with the default
>>>>> groups, why is this being added/removed out-of-band?
>>>>>
>>>>> thanks,
>>>>>
>>>>> greg k-h
>>>>
>>>> The decision to avoid creating a "ring" sysfs attribute by default
>>>> likely stems from a specific use case where it wasn't needed for every
>>>> device. By creating it automatically, it keeps the uio_hv_generic
>>>> driver simpler and helps prevent potential race conditions. However, it
>>>> has an added cost of having ring buffer for all the channels, where it
>>>> is not required. I am trying to find if there are any more implications
>>>> of it.
>>>
>>> You do know about the "is_visable" attribute callback, right? Why not
>>> just use that instead of manually mucking around with the
>>> adding/removing of sysfs attributes at all? That is what it was
>>> designed for.
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>> Hi Greg,
>> Yes, I am utilizing that in my patch. For differentiating channels of a
>> uio_hv_generic device, and for *selectively* creating sysfs, we
>> introduced this field in channel struct "channel->mmap_ring_buffer",
>> which we were setting only in uio_hv_generic. But, by the time we set
>> this in uio_hv_generic driver, the sysfs creation has already gone
>> through and sysfs doesn't get updated dynamically. That's where there
>> was a need to call sysfs_update_group. I thought the better place to
>> keep sysfs_update_group would be in vmbus driver, where we are creating
>> the original sysfs entries, hence I had to add the wrapper functions.
>> This led us to the race condition we are trying to address now.
>>
>>
>> @@ -1838,6 +1872,10 @@ static umode_t vmbus_chan_attr_is_visible(struct
>> kobject *kobj,
>> attr == &chan_attr_monitor_id.attr))
>> return 0;
>>
>> + /* Hide ring attribute if channel's mmap_ring_buffer function is not yet
>> initialised */
>> + if (attr == &chan_attr_ring_buffer.attr && !channel->mmap_ring_buffer)
>> + return 0;
>> +
>> return attr->mode;
>
> Ok, that's good. BUT you need to change the detection of this to be
> before the device is set up in the driver. Why can't you do it in the
> device creation logic itself instead of after-the-fact when you will
> ALWAYS race with userspace?
>
> thanks,
>
> greg k-h
Sure, will check more on this. Thanks.
Regards,
Naman
Powered by blists - more mailing lists