[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
<SN6PR02MB41574C54FFDE0D3F3B7A5649D47CA@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Fri, 20 Jun 2025 16:05:13 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Naman Jain <namjain@...ux.microsoft.com>, "K . Y . Srinivasan"
<kys@...rosoft.com>, Haiyang Zhang <haiyangz@...rosoft.com>, Wei Liu
<wei.liu@...nel.org>, Dexuan Cui <decui@...rosoft.com>, Long Li
<longli@...rosoft.com>, Saurabh Sengar <ssengar@...ux.microsoft.com>
CC: "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"stable@...r.kernel.org" <stable@...r.kernel.org>
Subject: RE: [PATCH] tools/hv: fcopy: Fix irregularities with size of ring
buffer
From: Naman Jain <namjain@...ux.microsoft.com> Sent: Friday, June 20, 2025 12:06 AM
>
> Size of ring buffer, as defined in uio_hv_generic driver, is no longer
> fixed to 16 KB. This creates a problem in fcopy, since this size was
> hardcoded. With the change in place to make ring sysfs node actually
> reflect the size of underlying ring buffer, it is safe to get the size
> of ring sysfs file and use it for ring buffer size in fcopy daemon.
> Fix the issue of disparity in ring buffer size, by making it dynamic
> in fcopy uio daemon.
>
> Cc: stable@...r.kernel.org
> Fixes: 0315fef2aff9 ("uio_hv_generic: Align ring size to system page")
> Signed-off-by: Naman Jain <namjain@...ux.microsoft.com>
> ---
> tools/hv/hv_fcopy_uio_daemon.c | 65 ++++++++++++++++++++++++++++++----
> 1 file changed, 58 insertions(+), 7 deletions(-)
>
> diff --git a/tools/hv/hv_fcopy_uio_daemon.c b/tools/hv/hv_fcopy_uio_daemon.c
> index 0198321d14a2..da2b27d6af0e 100644
> --- a/tools/hv/hv_fcopy_uio_daemon.c
> +++ b/tools/hv/hv_fcopy_uio_daemon.c
> @@ -36,6 +36,7 @@
> #define WIN8_SRV_VERSION (WIN8_SRV_MAJOR << 16 | WIN8_SRV_MINOR)
>
> #define FCOPY_UIO "/sys/bus/vmbus/devices/eb765408-105f-49b6-b4aa-c123b64d17d4/uio"
> +#define FCOPY_CHANNELS_PATH "/sys/bus/vmbus/devices/eb765408-105f-49b6-b4aa-c123b64d17d4/channels"
>
> #define FCOPY_VER_COUNT 1
> static const int fcopy_versions[] = {
> @@ -47,9 +48,51 @@ static const int fw_versions[] = {
> UTIL_FW_VERSION
> };
>
> -#define HV_RING_SIZE 0x4000 /* 16KB ring buffer size */
> +#define HV_RING_SIZE_DEFAULT 0x4000 /* 16KB ring buffer size default */
>
> -static unsigned char desc[HV_RING_SIZE];
> +static uint32_t get_ring_buffer_size(void)
> +{
> + char ring_path[PATH_MAX];
> + DIR *dir;
> + struct dirent *entry;
> + struct stat st;
> + uint32_t ring_size = 0;
> +
> + /* Find the channel directory */
> + dir = opendir(FCOPY_CHANNELS_PATH);
> + if (!dir) {
> + syslog(LOG_ERR, "Failed to open channels directory, using default ring size");
This is where the previous long discussion about racing with user space
comes into play. While highly unlikely, it's possible that the "opendir" could fail
because of racing with the kernel thread that creates the "channels" directory.
The right thing to do would be to sleep for some period of time, then try
again. Sleeping for 1 second would be a very generous -- could also go with
something like 100 milliseconds.
> + return HV_RING_SIZE_DEFAULT;
> + }
> +
> + while ((entry = readdir(dir)) != NULL) {
> + if (entry->d_type == DT_DIR && strcmp(entry->d_name, ".") != 0 &&
> + strcmp(entry->d_name, "..") != 0) {
> + snprintf(ring_path, sizeof(ring_path), "%s/%s/ring",
> + FCOPY_CHANNELS_PATH, entry->d_name);
> +
> + if (stat(ring_path, &st) == 0) {
> + /* stat returns size of Tx, Rx rings combined, so take half of it */
> + ring_size = (uint32_t)st.st_size / 2;
> + syslog(LOG_INFO, "Ring buffer size from %s: %u bytes",
> + ring_path, ring_size);
> + break;
> + }
> + }
> + }
The same race problem could happen with this loop. The "channels" directory
might have been created, but the entry for the numbered channel might not.
The loop could exit having found only "." and "..". Again, if no numbered
channel is found, sleep for a short period of time and try again.
> +
> + closedir(dir);
> +
> + if (!ring_size) {
> + ring_size = HV_RING_SIZE_DEFAULT;
> + syslog(LOG_ERR, "Could not determine ring size, using default: %u bytes",
> + HV_RING_SIZE_DEFAULT);
> + }
> +
> + return ring_size;
> +}
> +
> +static unsigned char *desc;
>
> static int target_fd;
> static char target_fname[PATH_MAX];
> @@ -406,7 +449,8 @@ int main(int argc, char *argv[])
> int daemonize = 1, long_index = 0, opt, ret = -EINVAL;
> struct vmbus_br txbr, rxbr;
> void *ring;
> - uint32_t len = HV_RING_SIZE;
> + uint32_t ring_size = get_ring_buffer_size();
Getting the ring buffer size before even the command line options
are parsed could produce unexpected results. For example, if someone
just wanted to see the usage (the -h option), they might get
an error about not being able to get the ring size. I'd suggest doing
this later, after the /dev/uio<N> entry is successfully opened.
> + uint32_t len = ring_size;
> char uio_name[NAME_MAX] = {0};
> char uio_dev_path[PATH_MAX] = {0};
>
> @@ -416,6 +460,13 @@ int main(int argc, char *argv[])
> {0, 0, 0, 0 }
> };
>
> + desc = (unsigned char *)malloc(ring_size * sizeof(unsigned char));
> + if (!desc) {
> + syslog(LOG_ERR, "malloc failed for desc buffer");
> + ret = -ENOMEM;
> + goto exit;
> + }
> +
> while ((opt = getopt_long(argc, argv, "hn", long_options,
> &long_index)) != -1) {
> switch (opt) {
> @@ -448,14 +499,14 @@ int main(int argc, char *argv[])
> goto exit;
> }
>
> - ring = vmbus_uio_map(&fcopy_fd, HV_RING_SIZE);
> + ring = vmbus_uio_map(&fcopy_fd, ring_size);
> if (!ring) {
> ret = errno;
> syslog(LOG_ERR, "mmap ringbuffer failed; error: %d %s", ret, strerror(ret));
> goto close;
> }
> - vmbus_br_setup(&txbr, ring, HV_RING_SIZE);
> - vmbus_br_setup(&rxbr, (char *)ring + HV_RING_SIZE, HV_RING_SIZE);
> + vmbus_br_setup(&txbr, ring, ring_size);
> + vmbus_br_setup(&rxbr, (char *)ring + ring_size, ring_size);
>
> rxbr.vbr->imask = 0;
>
> @@ -472,7 +523,7 @@ int main(int argc, char *argv[])
> goto close;
> }
>
> - len = HV_RING_SIZE;
> + len = ring_size;
> ret = rte_vmbus_chan_recv_raw(&rxbr, desc, &len);
> if (unlikely(ret <= 0)) {
> /* This indicates a failure to communicate (or worse) */
>
> base-commit: bc6e0ba6c9bafa6241b05524b9829808056ac4ad
> --
> 2.34.1
Powered by blists - more mailing lists