lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a7716784-face-44ff-837d-50f7ca79f0e9@linux.microsoft.com>
Date: Tue, 11 Mar 2025 10:15:25 +0530
From: Naman Jain <namjain@...ux.microsoft.com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: "K . Y . Srinivasan" <kys@...rosoft.com>,
 Haiyang Zhang <haiyangz@...rosoft.com>, Wei Liu <wei.liu@...nel.org>,
 Dexuan Cui <decui@...rosoft.com>,
 Stephen Hemminger <stephen@...workplumber.org>,
 linux-hyperv@...r.kernel.org, linux-kernel@...r.kernel.org,
 stable@...nel.org, Saurabh Sengar <ssengar@...ux.microsoft.com>,
 Michael Kelley <mhklinux@...look.com>, Long Li <longli@...rosoft.com>
Subject: Re: [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer



On 3/5/2025 12:36 PM, Naman Jain wrote:
> 
> 
> On 2/27/2025 11:54 AM, Naman Jain wrote:
>>
>>
>> On 2/26/2025 8:03 PM, Greg Kroah-Hartman wrote:
>>> On Wed, Feb 26, 2025 at 05:51:46PM +0530, Naman Jain wrote:
>>>>
>>>>
>>>> On 2/26/2025 3:33 PM, Greg Kroah-Hartman wrote:
>>>>> On Wed, Feb 26, 2025 at 10:43:41AM +0530, Naman Jain wrote:
>>>>>>
>>>>>>
>>>>>> On 2/25/2025 2:09 PM, Greg Kroah-Hartman wrote:
>>>>>>> On Tue, Feb 25, 2025 at 02:04:43PM +0530, Naman Jain wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2/25/2025 11:42 AM, Greg Kroah-Hartman wrote:
>>>>>>>>> On Tue, Feb 25, 2025 at 10:50:01AM +0530, Naman Jain wrote:
>>>>>>>>>> On regular bootup, devices get registered to vmbus first, so when
>>>>>>>>>> uio_hv_generic driver for a particular device type is probed,
>>>>>>>>>> the device is already initialized and added, so sysfs creation in
>>>>>>>>>> uio_hv_generic probe works fine. However, when device is removed
>>>>>>>>>> and brought back, the channel rescinds and device again gets
>>>>>>>>>> registered to vmbus. However this time, the uio_hv_generic 
>>>>>>>>>> driver is
>>>>>>>>>> already registered to probe for that device and in this case 
>>>>>>>>>> sysfs
>>>>>>>>>> creation is tried before the device gets initialized completely.
>>>>>>>>>>
>>>>>>>>>> Fix this by moving the core logic of sysfs creation for ring 
>>>>>>>>>> buffer,
>>>>>>>>>> from uio_hv_generic to HyperV's vmbus driver, where rest of 
>>>>>>>>>> the sysfs
>>>>>>>>>> attributes for the channels are defined. While doing that, 
>>>>>>>>>> make use
>>>>>>>>>> of attribute groups and macros, instead of creating sysfs 
>>>>>>>>>> directly,
>>>>>>>>>> to ensure better error handling and code flow.
>>>>
>>>> <snip>
>>>>
>>>>>>>>>> +static int hv_mmap_ring_buffer_wrapper(struct file *filp, 
>>>>>>>>>> struct kobject *kobj,
>>>>>>>>>> +                       const struct bin_attribute *attr,
>>>>>>>>>> +                       struct vm_area_struct *vma)
>>>>>>>>>> +{
>>>>>>>>>> +    struct vmbus_channel *channel = container_of(kobj, struct 
>>>>>>>>>> vmbus_channel, kobj);
>>>>>>>>>> +
>>>>>>>>>> +    if (!channel->mmap_ring_buffer)
>>>>>>>>>> +        return -ENODEV;
>>>>>>>>>> +    return channel->mmap_ring_buffer(channel, vma);
>>>>>>>>>
>>>>>>>>> What is preventing mmap_ring_buffer from being set to NULL 
>>>>>>>>> right after
>>>>>>>>> checking it and then calling it here?  I see no locks here or 
>>>>>>>>> where you
>>>>>>>>> are assigning this variable at all, so what is preventing these 
>>>>>>>>> types of
>>>>>>>>> races?
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>>
>>>>>>>>> greg k-h
>>>>>>>>
>>>>>>>> Thank you so much for reviewing.
>>>>>>>> I spent some time to understand if this race condition can 
>>>>>>>> happen and it
>>>>>>>> seems execution flow is pretty sequential, for a particular 
>>>>>>>> channel of a
>>>>>>>> device.
>>>>>>>>
>>>>>>>> Unless hv_uio_remove (which makes channel->mmap_ring_buffer 
>>>>>>>> NULL) can be
>>>>>>>> called in parallel to hv_uio_probe (which had set
>>>>>>>> channel->mmap_ring_buffer to non NULL), I doubt race can happen 
>>>>>>>> here.
>>>>>>>>
>>>>>>>> Code Flow: (R, W-> Read, Write to channel->mmap_ring_buffer)
>>>>>>>>
>>>>>>>> vmbus_device_register
>>>>>>>>      device_register
>>>>>>>>        hv_uio_probe
>>>>>>>>       hv_create_ring_sysfs (W to non NULL)
>>>>>>>>            sysfs_update_group
>>>>>>>>              vmbus_chan_attr_is_visible (R)
>>>>>>>>      vmbus_add_channel_kobj
>>>>>>>>        sysfs_create_group
>>>>>>>>          vmbus_chan_attr_is_visible  (R)
>>>>>>>>          hv_mmap_ring_buffer_wrapper (critical section)
>>>>>>>>
>>>>>>>> hv_uio_remove
>>>>>>>>      hv_remove_ring_sysfs (W to NULL)
>>>>>>>
>>>>>>> Yes, and right in here someone mmaps the file.
>>>>>>>
>>>>>>> I think you can race here, no locks at all feels wrong.
>>>>>>>
>>>>>>> Messing with sysfs groups and files like this is rough, and 
>>>>>>> almost never
>>>>>>> a good idea, why can't you just do this all at once with the default
>>>>>>> groups, why is this being added/removed out-of-band?
>>>>>>>
>>>>>>> thanks,
>>>>>>>
>>>>>>> greg k-h
>>>>>>
>>>>>> The decision to avoid creating a "ring" sysfs attribute by default
>>>>>> likely stems from a specific use case where it wasn't needed for 
>>>>>> every
>>>>>> device. By creating it automatically, it keeps the uio_hv_generic
>>>>>> driver simpler and helps prevent potential race conditions. 
>>>>>> However, it
>>>>>> has an added cost of having ring buffer for all the channels, 
>>>>>> where it
>>>>>> is not required. I am trying to find if there are any more 
>>>>>> implications
>>>>>> of it.
>>>>>
>>>>> You do know about the "is_visable" attribute callback, right?  Why not
>>>>> just use that instead of manually mucking around with the
>>>>> adding/removing of sysfs attributes at all?  That is what it was
>>>>> designed for.
>>>>>
>>>>> thanks,
>>>>>
>>>>> greg k-h
>>>>
>>>> Hi Greg,
>>>> Yes, I am utilizing that in my patch. For differentiating channels of a
>>>> uio_hv_generic device, and for *selectively* creating sysfs, we
>>>> introduced this field in channel struct "channel->mmap_ring_buffer",
>>>> which we were setting only in uio_hv_generic. But, by the time we set
>>>> this in uio_hv_generic driver, the sysfs creation has already gone
>>>> through and sysfs doesn't get updated dynamically. That's where there
>>>> was a need to call sysfs_update_group. I thought the better place to
>>>> keep sysfs_update_group would be in vmbus driver, where we are creating
>>>> the original sysfs entries, hence I had to add the wrapper functions.
>>>> This led us to the race condition we are trying to address now.
>>>>
>>>>
>>>> @@ -1838,6 +1872,10 @@ static umode_t vmbus_chan_attr_is_visible(struct
>>>> kobject *kobj,
>>>>            attr == &chan_attr_monitor_id.attr))
>>>>           return 0;
>>>>
>>>> +    /* Hide ring attribute if channel's mmap_ring_buffer function 
>>>> is not yet
>>>> initialised */
>>>> +    if (attr ==  &chan_attr_ring_buffer.attr && !channel- 
>>>> >mmap_ring_buffer)
>>>> +        return 0;
>>>> +
>>>>       return attr->mode;
>>>
>>> Ok, that's good.  BUT you need to change the detection of this to be
>>> before the device is set up in the driver.  Why can't you do it in the
>>> device creation logic itself instead of after-the-fact when you will
>>> ALWAYS race with userspace?
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>> Sure, will check more on this. Thanks.
>>
>> Regards,
>> Naman
>>
> 
> Hi Greg,
> I understand this is deviating from the discussions that we have had
> till now, but I wanted to kindly request your opinion on the following
> approach, which came up in one of our internal discussions.
> 
> By moving the sysfs creation logic from hv_uio_probe to hv_uio_open, I
> believe we can address this problem. Here are some of the benefits of
> this approach:
> 
> * This approach involves minimal changes and provides a localized
> solution.
> 
> * Since the use-case of ring sysfs is specific to uio_hv_generic and
> DPDK, this will give us the flexibility to modify it based on
> requirements. For example, ring_buffer_bin_attr.size should depend on
> the ring buffer's allocated size, which is easier to manage if the
> current code resides in uio_hv_generic.
> 
> * The use-case of DPDK is such that at any given time, either the
> hv_netvsc kernel driver or the userspace (DPDK) will be controlling this
> HV_NIC device. We do not want to expose the ring buffer to userspace
> when hv_netvsc is using the device. This is where the "awareness" of the
> current user comes into play, and we need a way to control the
> visibility of the "ring" sysfs from uio_hv_generic.
> 
> 
> Alternatively, I could focus on resolving the race condition you
> mentioned and proceed with refining the patch. This approach addresses
> most of the general practice concerns you highlighted.
> 
> Regards,
> Naman

Hello Greg,
Here are the two approaches that we discussed.
Can you please suggest the approach which looks better to you
for next version.

**Approach 1: move sysfs creation to hv_uio_open**
---
  drivers/uio/uio_hv_generic.c | 10 +++++-----
  1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 1b19b5647495..fec78c90279e 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -218,6 +218,11 @@ hv_uio_open(struct uio_info *info, struct inode *inode)
  	vmbus_set_chn_rescind_callback(dev->channel, hv_uio_rescind);
  	vmbus_set_sc_create_callback(dev->channel, hv_uio_new_channel);

+	ret = sysfs_create_bin_file(&dev->channel->kobj, &ring_buffer_bin_attr);
+	if (ret)
+		dev_notice(&dev->device,
+			   "sysfs create ring bin file failed; %d\n", ret);
+
  	ret = vmbus_connect_ring(dev->channel,
  				 hv_uio_channel_cb, dev->channel);
  	if (ret == 0)
@@ -350,11 +355,6 @@ hv_uio_probe(struct hv_device *dev,
  		goto fail_close;
  	}

-	ret = sysfs_create_bin_file(&channel->kobj, &ring_buffer_bin_attr);
-	if (ret)
-		dev_notice(&dev->device,
-			   "sysfs create ring bin file failed; %d\n", ret);
-
  	hv_set_drvdata(dev, pdata);

  	return 0;
-- 

=====================================================================
**Approach 2: Move sysfs creation to hyperv drivers**

---
  drivers/hv/hyperv_vmbus.h    |  6 +++
  drivers/hv/vmbus_drv.c       | 73 +++++++++++++++++++++++++++++++++++-
  drivers/uio/uio_hv_generic.c | 33 ++++++----------
  include/linux/hyperv.h       |  6 +++
  4 files changed, 95 insertions(+), 23 deletions(-)

diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 29780f3a7478..0b450e53161e 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -477,4 +477,10 @@ static inline int hv_debug_add_dev_dir(struct 
hv_device *dev)

  #endif /* CONFIG_HYPERV_TESTING */

+/* Create and remove sysfs entry for memory mapped ring buffers for a 
channel */
+int hv_create_ring_sysfs(struct vmbus_channel *channel,
+			 int (*hv_mmap_ring_buffer)(struct vmbus_channel *channel,
+						    struct vm_area_struct *vma));
+int hv_remove_ring_sysfs(struct vmbus_channel *channel);
+
  #endif /* _HYPERV_VMBUS_H */
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 22afebfc28ff..44ecbca83c9e 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1802,6 +1802,27 @@ static ssize_t subchannel_id_show(struct 
vmbus_channel *channel,
  }
  static VMBUS_CHAN_ATTR_RO(subchannel_id);

+static int hv_mmap_ring_buffer_wrapper(struct file *filp, struct 
kobject *kobj,
+				       const struct bin_attribute *attr,
+				       struct vm_area_struct *vma)
+{
+	struct vmbus_channel *channel = container_of(kobj, struct 
vmbus_channel, kobj);
+
+	/*
+	 * hv_(create|remove)_ring_sysfs implementation ensures that 
mmap_ring_buffer
+	 * is not NULL.
+	 */
+	return channel->mmap_ring_buffer(channel, vma);
+}
+
+static struct bin_attribute chan_attr_ring_buffer = {
+	.attr = {
+		.name = "ring",
+		.mode = 0600,
+	},
+	.size = 2 * SZ_2M,
+	.mmap = hv_mmap_ring_buffer_wrapper,
+};
  static struct attribute *vmbus_chan_attrs[] = {
  	&chan_attr_out_mask.attr,
  	&chan_attr_in_mask.attr,
@@ -1821,6 +1842,11 @@ static struct attribute *vmbus_chan_attrs[] = {
  	NULL
  };

+static struct bin_attribute *vmbus_chan_bin_attrs[] = {
+	&chan_attr_ring_buffer,
+	NULL
+};
+
  /*
   * Channel-level attribute_group callback function. Returns the 
permission for
   * each attribute, and returns 0 if an attribute is not visible.
@@ -1841,9 +1867,24 @@ static umode_t vmbus_chan_attr_is_visible(struct 
kobject *kobj,
  	return attr->mode;
  }

+static umode_t vmbus_chan_bin_attr_is_visible(struct kobject *kobj,
+					      const struct bin_attribute *attr, int idx)
+{
+	const struct vmbus_channel *channel =
+		container_of(kobj, struct vmbus_channel, kobj);
+
+	/* Hide ring attribute if channel's ring_sysfs_visible is set to false */
+	if (attr ==  &chan_attr_ring_buffer && !channel->ring_sysfs_visible)
+		return 0;
+
+	return attr->attr.mode;
+}
+
  static const struct attribute_group vmbus_chan_group = {
  	.attrs = vmbus_chan_attrs,
-	.is_visible = vmbus_chan_attr_is_visible
+	.bin_attrs = vmbus_chan_bin_attrs,
+	.is_visible = vmbus_chan_attr_is_visible,
+	.is_bin_visible = vmbus_chan_bin_attr_is_visible,
  };

  static const struct kobj_type vmbus_chan_ktype = {
@@ -1851,6 +1892,36 @@ static const struct kobj_type vmbus_chan_ktype = {
  	.release = vmbus_chan_release,
  };

+/*
+ * hv_create_ring_sysfs - create ring sysfs entry corresponding to ring 
buffers for a channel
+ */
+int hv_create_ring_sysfs(struct vmbus_channel *channel,
+			 int (*hv_mmap_ring_buffer)(struct vmbus_channel *channel,
+						    struct vm_area_struct *vma))
+{
+	struct kobject *kobj = &channel->kobj;
+
+	channel->mmap_ring_buffer = hv_mmap_ring_buffer;
+	channel->ring_sysfs_visible = true;
+	return sysfs_update_group(kobj, &vmbus_chan_group);
+}
+EXPORT_SYMBOL_GPL(hv_create_ring_sysfs);
+
+/*
+ * hv_remove_ring_sysfs - remove ring sysfs entry corresponding to ring 
buffers for a channel
+ */
+int hv_remove_ring_sysfs(struct vmbus_channel *channel)
+{
+	struct kobject *kobj = &channel->kobj;
+	int ret;
+
+	channel->ring_sysfs_visible = false;
+	ret = sysfs_update_group(kobj, &vmbus_chan_group);
+	channel->mmap_ring_buffer = NULL;
+	return ret;
+}
+EXPORT_SYMBOL_GPL(hv_remove_ring_sysfs);
+
  /*
   * vmbus_add_channel_kobj - setup a sub-directory under device/channels
   */
diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 1b19b5647495..a398e93ed382 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -131,15 +131,12 @@ static void hv_uio_rescind(struct vmbus_channel 
*channel)
  	vmbus_device_unregister(channel->device_obj);
  }

-/* Sysfs API to allow mmap of the ring buffers
+/* Function used for mmap of ring buffer sysfs interface.
   * The ring buffer is allocated as contiguous memory by vmbus_open
   */
-static int hv_uio_ring_mmap(struct file *filp, struct kobject *kobj,
-			    const struct bin_attribute *attr,
-			    struct vm_area_struct *vma)
+static int
+hv_uio_ring_mmap(struct vmbus_channel *channel, struct vm_area_struct *vma)
  {
-	struct vmbus_channel *channel
-		= container_of(kobj, struct vmbus_channel, kobj);
  	void *ring_buffer = page_address(channel->ringbuffer_page);

  	if (channel->state != CHANNEL_OPENED_STATE)
@@ -149,15 +146,6 @@ static int hv_uio_ring_mmap(struct file *filp, 
struct kobject *kobj,
  			       channel->ringbuffer_pagecount << PAGE_SHIFT);
  }

-static const struct bin_attribute ring_buffer_bin_attr = {
-	.attr = {
-		.name = "ring",
-		.mode = 0600,
-	},
-	.size = 2 * SZ_2M,
-	.mmap = hv_uio_ring_mmap,
-};
-
  /* Callback from VMBUS subsystem when new channel created. */
  static void
  hv_uio_new_channel(struct vmbus_channel *new_sc)
@@ -178,8 +166,7 @@ hv_uio_new_channel(struct vmbus_channel *new_sc)
  	/* Disable interrupts on sub channel */
  	new_sc->inbound.ring_buffer->interrupt_mask = 1;
  	set_channel_read_mode(new_sc, HV_CALL_ISR);
-
-	ret = sysfs_create_bin_file(&new_sc->kobj, &ring_buffer_bin_attr);
+	ret = hv_create_ring_sysfs(new_sc, hv_uio_ring_mmap);
  	if (ret) {
  		dev_err(device, "sysfs create ring bin file failed; %d\n", ret);
  		vmbus_close(new_sc);
@@ -350,10 +337,12 @@ hv_uio_probe(struct hv_device *dev,
  		goto fail_close;
  	}

-	ret = sysfs_create_bin_file(&channel->kobj, &ring_buffer_bin_attr);
-	if (ret)
-		dev_notice(&dev->device,
-			   "sysfs create ring bin file failed; %d\n", ret);
+	/*
+	 * This internally calls sysfs_update_group, which returns a non-zero 
value if it executes
+	 * before sysfs_create_group. This is a false alarm from this use case 
point of view and
+	 * thus, no need to check the return value and print warning.
+	 */
+	hv_create_ring_sysfs(channel, hv_uio_ring_mmap);

  	hv_set_drvdata(dev, pdata);

@@ -375,7 +364,7 @@ hv_uio_remove(struct hv_device *dev)
  	if (!pdata)
  		return;

-	sysfs_remove_bin_file(&dev->channel->kobj, &ring_buffer_bin_attr);
+	hv_remove_ring_sysfs(dev->channel);
  	uio_unregister_device(&pdata->info);
  	hv_uio_cleanup(dev, pdata);

diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 7f4f8d8bdf43..f93e489eca62 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1058,6 +1058,12 @@ struct vmbus_channel {

  	/* The max size of a packet on this channel */
  	u32 max_pkt_size;
+
+	/* function to mmap ring buffer memory to the channel's sysfs ring 
attribute */
+	int (*mmap_ring_buffer)(struct vmbus_channel *channel, struct 
vm_area_struct *vma);
+
+	/* boolean to control visibility of sysfs for ring buffer */
+	bool ring_sysfs_visible;
  };

  #define lock_requestor(channel, flags)					\
-- 


Regards,
Naman


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ