[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4abf15ac-de18-48d4-9420-19d40f26fdd2@linux.microsoft.com>
Date: Thu, 31 Jul 2025 21:13:27 +0530
From: Naman Jain <namjain@...ux.microsoft.com>
To: Michael Kelley <mhklinux@...look.com>,
Thomas Weißschuh <linux@...ssschuh.net>
Cc: "stable@...r.kernel.org" <stable@...r.kernel.org>,
Wei Liu <wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>,
"K . Y . Srinivasan" <kys@...rosoft.com>,
Haiyang Zhang <haiyangz@...rosoft.com>, Long Li <longli@...rosoft.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 6.12] Drivers: hv: Make the sysfs node size for the ring
buffer dynamic
On 7/30/2025 1:15 AM, Michael Kelley wrote:
> From: Thomas Weißschuh <linux@...ssschuh.net> Sent: Tuesday, July 29, 2025 11:47 AM
>>
>> On 2025-07-29 18:39:45+0000, Michael Kelley wrote:
>>> From: Naman Jain <namjain@...ux.microsoft.com> Sent: Wednesday, July 23, 2025 12:02 AM
>>>>
>>>> The ring buffer size varies across VMBus channels. The size of sysfs
>>>> node for the ring buffer is currently hardcoded to 4 MB. Userspace
>>>> clients either use fstat() or hardcode this size for doing mmap().
>>>> To address this, make the sysfs node size dynamic to reflect the
>>>> actual ring buffer size for each channel. This will ensure that
>>>> fstat() on ring sysfs node always returns the correct size of
>>>> ring buffer.
>>>>
>>>> This is a backport of the upstream commit
>>>> 65995e97a1ca ("Drivers: hv: Make the sysfs node size for the ring buffer dynamic")
>>>> with modifications, as the original patch has missing dependencies on
>>>> kernel v6.12.x. The structure "struct attribute_group" does not have
>>>> bin_size field in v6.12.x kernel so the logic of configuring size of
>>>> sysfs node for ring buffer has been moved to
>>>> vmbus_chan_bin_attr_is_visible().
>>>>
>>>> Original change was not a fix, but it needs to be backported to fix size
>>>> related discrepancy caused by the commit mentioned in Fixes tag.
>>>>
>>>> Fixes: bf1299797c3c ("uio_hv_generic: Align ring size to system page")
>>>> Cc: <stable@...r.kernel.org> # 6.12.x
>>>> Signed-off-by: Naman Jain <namjain@...ux.microsoft.com>
>>>> ---
>>>>
>>>> This change won't apply on older kernels currently due to missing
>>>> dependencies. I will take care of them after this goes in.
>>>>
>>>> I did not retain any Reviewed-by or Tested-by tags, since the code has
>>>> changed completely, while the functionality remains same.
>>>> Requesting Michael, Dexuan, Wei to please review again.
>>>>
>>>> ---
>>>> drivers/hv/vmbus_drv.c | 2 +-
>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
>>>> index 1f519e925f06..616e63fb2f15 100644
>>>> --- a/drivers/hv/vmbus_drv.c
>>>> +++ b/drivers/hv/vmbus_drv.c
>>>> @@ -1810,7 +1810,6 @@ static struct bin_attribute chan_attr_ring_buffer = {
>>>> .name = "ring",
>>>> .mode = 0600,
>>>> },
>>>> - .size = 2 * SZ_2M,
>>>> .mmap = hv_mmap_ring_buffer_wrapper,
>>>> };
>>>> static struct attribute *vmbus_chan_attrs[] = {
>>>> @@ -1866,6 +1865,7 @@ static umode_t vmbus_chan_bin_attr_is_visible(struct kobject *kobj,
>>>> /* Hide ring attribute if channel's ring_sysfs_visible is set to false */
>>>> if (attr == &chan_attr_ring_buffer && !channel->ring_sysfs_visible)
>>>> return 0;
>>>> + attr->size = channel->ringbuffer_pagecount << PAGE_SHIFT;
>>>
>>> Suppose a VM has two devices using UIO, such as DPDK network device with
>>> a 2MiB ring buffer, and an fcopy device with a 16KiB ring buffer. Both devices
>>> will be referencing the same static instance of chan_attr_ring_buffer, and the
>>> .size field it contains. The above statement will change that .size field
>>> between 2MiB and 16KiB as the /sys entries are initially populated, and as
>>> the visibility is changed if the devices are removed and re-instantiated (which
>>> is much more likely for fcopy than for netvsc). That changing of the .size
>>> value will probably work most of the time, but it's racy if two devices with
>>> different ring buffer sizes get instantiated or re-instantiated at the same time.
>>
>> IIRC it works out in practice. While the global attribute instance is indeed
>> modified back-and-forth the size from it will be *copied* into kernfs
>> after each recalculation. So each attribute should get its own correct size.
>
> The race I see is in fs/sysfs/group.c in the create_files() function. It calls the
> is_bin_visible() function, which this patch uses to set the .size field of the static
> attribute. Then creates_files() calls sysfs_add_bin_file_mode_ns(), which reads
> the .size field and uses it to create the sysfs entry. But if create_files() is called
> in parallel on two different kobjs of the same type, but with different values
> for the .size field, the second create_files() could overwrite the static .size
> field after the first create_files() has set it, but before it has used it. I don't
> see any global lock that would prevent such, though maybe I'm missing
> something.
>
>>
>>> Unfortunately, I don't see a fix, short of backporting support for the
>>> .bin_size function, as this is exactly the problem that function solves.
>>
>> It should work out in practice. (I introduced the .bin_size function)
>
> The race I describe is unlikely, particularly if attribute groups are created
> once and then not disturbed. But note that the Hyper-V fcopy group can
> get updated in a running VM via update_sysfs_group(), which also calls
> create_files(). Such an update might marginally increase the potential for
> the race and for getting the wrong size. Still, I agree it should work out
> in practice.
>
> Michael
>
>>
>> Thomas
hi Thomas,
Would it be possible to port your changes on 6.12 kernel, to avoid such
race conditions? Or if it has a lot of dependencies, or if you have a
follow-up advice, please let us us know.
Thanks,
Naman
Powered by blists - more mailing lists