lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
 <SN6PR02MB415792B00B021D4DB76A6014D425A@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Tue, 29 Jul 2025 19:45:16 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Thomas Weißschuh <linux@...ssschuh.net>
CC: Naman Jain <namjain@...ux.microsoft.com>, "stable@...r.kernel.org"
	<stable@...r.kernel.org>, Wei Liu <wei.liu@...nel.org>, Dexuan Cui
	<decui@...rosoft.com>, "K . Y . Srinivasan" <kys@...rosoft.com>, Haiyang
 Zhang <haiyangz@...rosoft.com>, Long Li <longli@...rosoft.com>, Greg
 Kroah-Hartman <gregkh@...uxfoundation.org>, "linux-hyperv@...r.kernel.org"
	<linux-hyperv@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>
Subject: RE: [PATCH 6.12] Drivers: hv: Make the sysfs node size for the ring
 buffer dynamic

From: Thomas Weißschuh <linux@...ssschuh.net> Sent: Tuesday, July 29, 2025 11:47 AM
> 
> On 2025-07-29 18:39:45+0000, Michael Kelley wrote:
> > From: Naman Jain <namjain@...ux.microsoft.com> Sent: Wednesday, July 23, 2025 12:02 AM
> > >
> > > The ring buffer size varies across VMBus channels. The size of sysfs
> > > node for the ring buffer is currently hardcoded to 4 MB. Userspace
> > > clients either use fstat() or hardcode this size for doing mmap().
> > > To address this, make the sysfs node size dynamic to reflect the
> > > actual ring buffer size for each channel. This will ensure that
> > > fstat() on ring sysfs node always returns the correct size of
> > > ring buffer.
> > >
> > > This is a backport of the upstream commit
> > > 65995e97a1ca ("Drivers: hv: Make the sysfs node size for the ring buffer dynamic")
> > > with modifications, as the original patch has missing dependencies on
> > > kernel v6.12.x. The structure "struct attribute_group" does not have
> > > bin_size field in v6.12.x kernel so the logic of configuring size of
> > > sysfs node for ring buffer has been moved to
> > > vmbus_chan_bin_attr_is_visible().
> > >
> > > Original change was not a fix, but it needs to be backported to fix size
> > > related discrepancy caused by the commit mentioned in Fixes tag.
> > >
> > > Fixes: bf1299797c3c ("uio_hv_generic: Align ring size to system page")
> > > Cc: <stable@...r.kernel.org> # 6.12.x
> > > Signed-off-by: Naman Jain <namjain@...ux.microsoft.com>
> > > ---
> > >
> > > This change won't apply on older kernels currently due to missing
> > > dependencies. I will take care of them after this goes in.
> > >
> > > I did not retain any Reviewed-by or Tested-by tags, since the code has
> > > changed completely, while the functionality remains same.
> > > Requesting Michael, Dexuan, Wei to please review again.
> > >
> > > ---
> > >  drivers/hv/vmbus_drv.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> > > index 1f519e925f06..616e63fb2f15 100644
> > > --- a/drivers/hv/vmbus_drv.c
> > > +++ b/drivers/hv/vmbus_drv.c
> > > @@ -1810,7 +1810,6 @@ static struct bin_attribute chan_attr_ring_buffer = {
> > >  		.name = "ring",
> > >  		.mode = 0600,
> > >  	},
> > > -	.size = 2 * SZ_2M,
> > >  	.mmap = hv_mmap_ring_buffer_wrapper,
> > >  };
> > >  static struct attribute *vmbus_chan_attrs[] = {
> > > @@ -1866,6 +1865,7 @@ static umode_t vmbus_chan_bin_attr_is_visible(struct kobject *kobj,
> > >  	/* Hide ring attribute if channel's ring_sysfs_visible is set to false */
> > >  	if (attr ==  &chan_attr_ring_buffer && !channel->ring_sysfs_visible)
> > >  		return 0;
> > > +	attr->size = channel->ringbuffer_pagecount << PAGE_SHIFT;
> >
> > Suppose a VM has two devices using UIO, such as DPDK network device with
> > a 2MiB ring buffer, and an fcopy device with a 16KiB ring buffer. Both devices
> > will be referencing the same static instance of chan_attr_ring_buffer, and the
> > .size field it contains. The above statement will change that .size field
> > between 2MiB and 16KiB as the /sys entries are initially populated, and as
> > the visibility is changed if the devices are removed and re-instantiated (which
> > is much more likely for fcopy than for netvsc). That changing of the .size
> > value will probably work most of the time, but it's racy if two devices with
> > different ring buffer sizes get instantiated or re-instantiated at the same time.
> 
> IIRC it works out in practice. While the global attribute instance is indeed
> modified back-and-forth the size from it will be *copied* into kernfs
> after each recalculation. So each attribute should get its own correct size.

The race I see is in fs/sysfs/group.c in the create_files() function. It calls the
is_bin_visible() function, which this patch uses to set the .size field of the static
attribute. Then creates_files() calls sysfs_add_bin_file_mode_ns(), which reads
the .size field and uses it to create the sysfs entry. But if create_files() is called
in parallel on two different kobjs of the same type, but with different values
for the .size field, the second create_files() could overwrite the static .size
field after the first create_files() has set it, but before it has used it. I don't
see any global lock that would prevent such, though maybe I'm missing
something.

> 
> > Unfortunately, I don't see a fix, short of backporting support for the
> > .bin_size function, as this is exactly the problem that function solves.
> 
> It should work out in practice. (I introduced the .bin_size function)

The race I describe is unlikely, particularly if attribute groups are created
once and then not disturbed. But note that the Hyper-V fcopy group can
get updated in a running VM via update_sysfs_group(), which also calls
create_files(). Such an update might marginally increase the potential for
the race and for getting the wrong size. Still, I agree it should work out
in practice.

Michael

> 
> Thomas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ