Message-ID: <541c63d6-8ae6-4a32-8a02-d86eea64827e@linux.microsoft.com>
Date: Wed, 26 Feb 2025 10:43:41 +0530
From: Naman Jain <namjain@...ux.microsoft.com>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: "K . Y . Srinivasan" <kys@...rosoft.com>,
 Haiyang Zhang <haiyangz@...rosoft.com>, Wei Liu <wei.liu@...nel.org>,
 Dexuan Cui <decui@...rosoft.com>,
 Stephen Hemminger <stephen@...workplumber.org>,
 linux-hyperv@...r.kernel.org, linux-kernel@...r.kernel.org,
 stable@...nel.org, Saurabh Sengar <ssengar@...ux.microsoft.com>,
 Michael Kelley <mhklinux@...look.com>, Long Li <longli@...rosoft.com>
Subject: Re: [PATCH] uio_hv_generic: Fix sysfs creation path for ring buffer



On 2/25/2025 2:09 PM, Greg Kroah-Hartman wrote:
> On Tue, Feb 25, 2025 at 02:04:43PM +0530, Naman Jain wrote:
>>
>>
>> On 2/25/2025 11:42 AM, Greg Kroah-Hartman wrote:
>>> On Tue, Feb 25, 2025 at 10:50:01AM +0530, Naman Jain wrote:
>>>> On regular bootup, devices get registered to vmbus first, so when
>>>> uio_hv_generic driver for a particular device type is probed,
>>>> the device is already initialized and added, so sysfs creation in
>>>> uio_hv_generic probe works fine. However, when device is removed
>>>> and brought back, the channel rescinds and device again gets
>>>> registered to vmbus. However this time, the uio_hv_generic driver is
>>>> already registered to probe for that device and in this case sysfs
>>>> creation is tried before the device gets initialized completely.
>>>>
>>>> Fix this by moving the core logic of sysfs creation for ring buffer,
>>>> from uio_hv_generic to HyperV's vmbus driver, where rest of the sysfs
>>>> attributes for the channels are defined. While doing that, make use
>>>> of attribute groups and macros, instead of creating sysfs directly,
>>>> to ensure better error handling and code flow.
>>>>
>>>> Problem path:
>>>> vmbus_device_register
>>>>       device_register
>>>>           uio_hv_generic probe
>>>>                       sysfs_create_bin_file (fails here)
>>>>           kset_create_and_add (dependency)
>>>>           vmbus_add_channel_kobj (dependency)
>>>>
>>>> Fixes: 9ab877a6ccf8 ("uio_hv_generic: make ring buffer attribute for primary channel")
>>>> Cc: stable@...nel.org
>>>> Suggested-by: Saurabh Sengar <ssengar@...ux.microsoft.com>
>>>> Suggested-by: Michael Kelley <mhklinux@...look.com>
>>>> Signed-off-by: Naman Jain <namjain@...ux.microsoft.com>
>>>> ---
>>>> Hi,
>>>> This is the first patch after initial RFC was posted.
>>>> https://lore.kernel.org/all/20250214064351.8994-1-namjain@linux.microsoft.com/
>>>>
>>>> Changes since RFC patch:
>>>> * Different approach to solve the problem is proposed (credits to
>>>>     Michael Kelley).
>>>> * Core logic for sysfs creation moved out of uio_hv_generic, to VMBus
>>>>     drivers where rest of the sysfs attributes for a VMBus channel
>>>>     are defined. (addressed Greg's comments)
>>>> * Used attribute groups instead of sysfs_create* functions, and bundled
>>>>     ring attribute with other attributes for the channel sysfs.
>>>>
>>>> Error logs:
>>>>
>>>> [   35.574120] ------------[ cut here ]------------
>>>> [   35.574122] WARNING: CPU: 0 PID: 10 at fs/sysfs/file.c:591 sysfs_create_bin_file+0x81/0x90
>>>> [   35.574168] Workqueue: hv_pri_chan vmbus_add_channel_work
>>>> [   35.574172] RIP: 0010:sysfs_create_bin_file+0x81/0x90
>>>> [   35.574197] Call Trace:
>>>> [   35.574199]  <TASK>
>>>> [   35.574200]  ? show_regs+0x69/0x80
>>>> [   35.574217]  ? __warn+0x8d/0x130
>>>> [   35.574220]  ? sysfs_create_bin_file+0x81/0x90
>>>> [   35.574222]  ? report_bug+0x182/0x190
>>>> [   35.574225]  ? handle_bug+0x5b/0x90
>>>> [   35.574244]  ? exc_invalid_op+0x19/0x70
>>>> [   35.574247]  ? asm_exc_invalid_op+0x1b/0x20
>>>> [   35.574252]  ? sysfs_create_bin_file+0x81/0x90
>>>> [   35.574255]  hv_uio_probe+0x1e7/0x410 [uio_hv_generic]
>>>> [   35.574271]  vmbus_probe+0x3b/0x90
>>>> [   35.574275]  really_probe+0xf4/0x3b0
>>>> [   35.574279]  __driver_probe_device+0x8a/0x170
>>>> [   35.574282]  driver_probe_device+0x23/0xc0
>>>> [   35.574285]  __device_attach_driver+0xb5/0x140
>>>> [   35.574288]  ? __pfx___device_attach_driver+0x10/0x10
>>>> [   35.574291]  bus_for_each_drv+0x86/0xe0
>>>> [   35.574294]  __device_attach+0xc1/0x200
>>>> [   35.574297]  device_initial_probe+0x13/0x20
>>>> [   35.574315]  bus_probe_device+0x99/0xa0
>>>> [   35.574318]  device_add+0x647/0x870
>>>> [   35.574320]  ? hrtimer_init+0x28/0x70
>>>> [   35.574323]  device_register+0x1b/0x30
>>>> [   35.574326]  vmbus_device_register+0x83/0x130
>>>> [   35.574328]  vmbus_add_channel_work+0x135/0x1a0
>>>> [   35.574331]  process_one_work+0x177/0x340
>>>> [   35.574348]  worker_thread+0x2b2/0x3c0
>>>> [   35.574350]  kthread+0xe3/0x1f0
>>>> [   35.574353]  ? __pfx_worker_thread+0x10/0x10
>>>> [   35.574356]  ? __pfx_kthread+0x10/0x10
>>>>
>>>> ---
>>>>    drivers/hv/hyperv_vmbus.h    |  4 +++
>>>>    drivers/hv/vmbus_drv.c       | 62 ++++++++++++++++++++++++++++++++++++
>>>>    drivers/uio/uio_hv_generic.c | 34 ++------------------
>>>>    include/linux/hyperv.h       |  3 ++
>>>>    4 files changed, 72 insertions(+), 31 deletions(-)
>>>>
>>>> diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
>>>> index 29780f3a7478..e0c7b75e6c7a 100644
>>>> --- a/drivers/hv/hyperv_vmbus.h
>>>> +++ b/drivers/hv/hyperv_vmbus.h
>>>> @@ -477,4 +477,8 @@ static inline int hv_debug_add_dev_dir(struct hv_device *dev)
>>>>    #endif /* CONFIG_HYPERV_TESTING */
>>>> +/* Create and remove sysfs entry for memory mapped ring buffers for a channel */
>>>> +int hv_create_ring_sysfs(struct vmbus_channel *channel);
>>>> +int hv_remove_ring_sysfs(struct vmbus_channel *channel);
>>>> +
>>>>    #endif /* _HYPERV_VMBUS_H */
>>>> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
>>>> index 22afebfc28ff..0110643bad3f 100644
>>>> --- a/drivers/hv/vmbus_drv.c
>>>> +++ b/drivers/hv/vmbus_drv.c
>>>> @@ -1802,6 +1802,39 @@ static ssize_t subchannel_id_show(struct vmbus_channel *channel,
>>>>    }
>>>>    static VMBUS_CHAN_ATTR_RO(subchannel_id);
>>>> +/* Functions to create sysfs interface to allow mmap of the ring buffers.
>>>> + * The ring buffer is allocated as contiguous memory by vmbus_open
>>>> + */
>>>> +static int hv_mmap_ring_buffer(struct vmbus_channel *channel, struct vm_area_struct *vma)
>>>> +{
>>>> +	void *ring_buffer = page_address(channel->ringbuffer_page);
>>>> +
>>>> +	if (channel->state != CHANNEL_OPENED_STATE)
>>>> +		return -ENODEV;
>>>> +
>>>> +	return vm_iomap_memory(vma, virt_to_phys(ring_buffer),
>>>> +			       channel->ringbuffer_pagecount << PAGE_SHIFT);
>>>> +}
>>>> +
>>>> +static int hv_mmap_ring_buffer_wrapper(struct file *filp, struct kobject *kobj,
>>>> +				       const struct bin_attribute *attr,
>>>> +				       struct vm_area_struct *vma)
>>>> +{
>>>> +	struct vmbus_channel *channel = container_of(kobj, struct vmbus_channel, kobj);
>>>> +
>>>> +	if (!channel->mmap_ring_buffer)
>>>> +		return -ENODEV;
>>>> +	return channel->mmap_ring_buffer(channel, vma);
>>>
>>> What is preventing mmap_ring_buffer from being set to NULL right after
>>> checking it and then calling it here?  I see no locks here or where you
>>> are assigning this variable at all, so what is preventing these types of
>>> races?
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>> Thank you so much for reviewing.
>> I spent some time to understand if this race condition can happen and it
>> seems execution flow is pretty sequential, for a particular channel of a
>> device.
>>
>> Unless hv_uio_remove (which makes channel->mmap_ring_buffer NULL) can be
>> called in parallel to hv_uio_probe (which had set
>> channel->mmap_ring_buffer to non NULL), I doubt race can happen here.
>>
>> Code Flow: (R, W-> Read, Write to channel->mmap_ring_buffer)
>>
>> vmbus_device_register
>>    device_register
>>      hv_uio_probe
>>        hv_create_ring_sysfs (W to non NULL)
>>          sysfs_update_group
>>            vmbus_chan_attr_is_visible (R)
>>    vmbus_add_channel_kobj
>>      sysfs_create_group
>>        vmbus_chan_attr_is_visible  (R)
>>        hv_mmap_ring_buffer_wrapper (critical section)
>>
>> hv_uio_remove
>>    hv_remove_ring_sysfs (W to NULL)
> 
> Yes, and right in here someone mmaps the file.
> 
> I think you can race here, no locks at all feels wrong.
> 
> Messing with sysfs groups and files like this is rough, and almost never
> a good idea, why can't you just do this all at once with the default
> groups, why is this being added/removed out-of-band?
> 
> thanks,
> 
> greg k-h

The decision not to create the "ring" sysfs attribute by default
likely stems from a specific use case: the attribute was not needed for
every device. Creating it by default, as part of the default groups,
would keep the uio_hv_generic driver simpler and help prevent potential
race conditions. However, it has the added cost of exposing the ring
buffer for all channels, even where it is not required. I am trying to
find out whether there are any further implications.
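
To make the check-then-call race discussed above concrete, here is a
minimal user-space sketch in C. It is not the kernel patch: struct
channel, channel_set_cb(), channel_mmap() and demo_mmap() are
hypothetical names, and a pthread mutex stands in for whatever lock the
VMBus code would actually use. The point it illustrates is only that the
NULL check and the call through the pointer need to happen under the
same lock that the writer takes.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct vmbus_channel. */
struct channel {
	pthread_mutex_t lock;               /* guards mmap_cb */
	int (*mmap_cb)(struct channel *ch); /* stand-in for mmap_ring_buffer */
};

/* Hypothetical callback, playing the role of hv_mmap_ring_buffer. */
static int demo_mmap(struct channel *ch)
{
	(void)ch;
	return 0;
}

/*
 * Writer side: roughly what the create/remove helpers do to the pointer.
 * Passing NULL models the remove path.
 */
static void channel_set_cb(struct channel *ch, int (*cb)(struct channel *))
{
	pthread_mutex_lock(&ch->lock);
	ch->mmap_cb = cb;
	pthread_mutex_unlock(&ch->lock);
}

/*
 * Reader side: roughly what the mmap wrapper does. The check and the call
 * happen under the same lock, so the pointer cannot become NULL between
 * them, and a concurrent remove waits until the callback has returned.
 */
static int channel_mmap(struct channel *ch)
{
	int ret = -ENODEV;

	pthread_mutex_lock(&ch->lock);
	if (ch->mmap_cb)
		ret = ch->mmap_cb(ch);
	pthread_mutex_unlock(&ch->lock);

	return ret;
}
```

Without the lock in channel_mmap(), a writer could clear mmap_cb after
the check but before the call, which is the window pointed out in the
review. Whether a mutex, RCU, or avoiding the out-of-band pointer
altogether is the right choice for the driver is a separate question.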

Thanks,
Naman
