[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250703133524.GA9099@wendao-VirtualBox>
Date: Thu, 3 Jul 2025 21:35:24 +0800
From: Dawei Li <dawei.li@...ux.dev>
To: Arnaud POULIQUEN <arnaud.pouliquen@...s.st.com>
Cc: andersson@...nel.org, mathieu.poirier@...aro.org,
linux-remoteproc@...r.kernel.org, linux-kernel@...r.kernel.org,
set_pte_at@...look.com, dawei.li@...ux.dev
Subject: Re: [PATCH v4 0/3] rpmsg: Introduce RPMSG_CREATE_EPT_FD_IOCTL uAPI
Hi Arnaud,
On Wed, Jul 02, 2025 at 09:48:59AM +0200, Arnaud POULIQUEN wrote:
> hello Dawei
>
> On 7/1/25 16:16, Dawei Li wrote:
> > Hi Arnaud,
> >
> > Thanks for the reply.
> >
> > On Mon, Jun 30, 2025 at 09:54:40AM +0200, Arnaud POULIQUEN wrote:
> >> Hello Dawei,
> >>
> >> Sorry for the late answer.
> >>
> >> On 6/22/25 06:12, Dawei Li wrote:
> >>> Hi Arnaud,
> >>>
> >>> Thanks for the reply.
> >>>
> >>> On Fri, Jun 20, 2025 at 09:52:03AM +0200, Arnaud POULIQUEN wrote:
> >>>>
> >>>>
> >>>> On 6/19/25 16:43, Dawei Li wrote:
> >>>>> Hi Arnaud,
> >>>>> Thanks for review.
> >>>>>
> >>>>> On Wed, Jun 18, 2025 at 03:07:36PM +0200, Arnaud POULIQUEN wrote:
> >>>>>> Hello Dawei,
> >>>>>>
> >>>>>>
> >>>>>> Please find a few comments below. It is not clear to me which parts of your
> >>>>>> implementation are mandatory and which are optional "nice-to-have" optimizations.
> >>>>>
> >>>>> It's more like an improvement.
> >>>>>
> >>>>>>
> >>>>>> Based on (potentially erroneous) hypothesis, you will find a suggestion for an
> >>>>>> alternative to the anonymous inode approach, which does not seem to be a common
> >>>>>> interface.
> >>>>>
> >>>>> AFAIC, annoymous inode is a common interface and used extensivly in kernel development.
> >>>>> Some examples below.
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 6/9/25 17:15, Dawei Li wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> This is V4 of series which introduce new uAPI(RPMSG_CREATE_EPT_FD_IOCTL)
> >>>>>>> for rpmsg subsystem.
> >>>>>>>
> >>>>>>> Current uAPI implementation for rpmsg ctrl & char device manipulation is
> >>>>>>> abstracted in procedures below:
> >>>>>>> - fd = open("/dev/rpmsg_ctrlX")
> >>>>>>> - ioctl(fd, RPMSG_CREATE_EPT_IOCTL, &info); /dev/rpmsgY devnode is
> >>>>>>> generated.
> >>>>>>> - fd_ep = open("/dev/rpmsgY", O_RDWR)
> >>>>>>> - operations on fd_ep(write, read, poll ioctl)
> >>>>>>> - ioctl(fd_ep, RPMSG_DESTROY_EPT_IOCTL)
> >>>>>>> - close(fd_ep)
> >>>>>>> - close(fd)
> >>>>>>>
> >>>>>>> This /dev/rpmsgY abstraction is less favorable for:
> >>>>>>> - Performance issue: It's time consuming for some operations are
> >>>>>>> invovled:
> >>>>>>> - Device node creation.
> >>>>>>> Depends on specific config, especially CONFIG_DEVTMPFS, the overall
> >>>>>>> overhead is based on coordination between DEVTMPFS and userspace
> >>>>>>> tools such as udev and mdev.
> >>>>>>>
> >>>>>>> - Extra kernel-space switch cost.
> >>>>>>>
> >>>>>>> - Other major costs brought by heavy-weight logic like device_add().
> >>>>>>
> >>>>>> Is this a blocker of just optimization?
> >>>>>
> >>>>> Yep, performance is one of motivations of this change.
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> - /dev/rpmsgY node can be opened only once. It doesn't make much sense
> >>>>>>> that a dynamically created device node can be opened only once.
> >>>>>>
> >>>>>>
> >>>>>> I assume this is blocker with the fact that you need to open the /dev/rpmsg<x>
> >>>>>> to create the endpoint.
> >>>>>
> >>>>> Yes. You have to open /dev/rpmsgX which is generated by legacy ioctl to
> >>>>> instantiate a new endpoint.
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> - For some container application such as docker, a client can't access
> >>>>>>> host's dev unless specified explicitly. But in case of /dev/rpmsgY, which
> >>>>>>> is generated dynamically and whose existence is unknown for clients in
> >>>>>>> advance, this uAPI based on device node doesn't fit well.
> >>>>>>
> >>>>>> does this could be solve in userspace parsing /sys/class/rpmsg/ directory to
> >>>>>> retreive the device?
> >>>>>
> >>>>> Hardly, because client still can't access /dev/rpmsgX which is generated
> >>>>> by host _after_ client is launched.
> >>>>
> >>>>
> >>>> This part is not clear to me; could you provide more details?
> >>>> I cannot figure out why a client can access /dev/rpmsg_ctrlX but not /dev/rpmsgX.
> >>>
> >>> Well, let's take docker as example:
> >>
> >>>
> >>> For docker, when a client is launched and it wants to access host's
> >>> device, it must make explicit request when it's launched:
> >>>
> >>> docker run --device=/dev/xxx
> >>>
> >>> Let's presume that xxx is /dev/rpmsgX generated dynamically by _host_.
> >>> Docker command above knows nothing about these rpmsg nodes which are
> >>> generated by host _after_ client is launched. And yes, parsing> /sys/class/rpmsg may acquire info about rpmsg devices, but client still
> >>> can't access /dev/rpmsgX.
> >>
> >> One extra question:Are you using RPMsg over virtio?
> >>
> >> If yes, do you test the RPMsg name service (NS) announcement, that might also
> >> address your needs.
> >>
> >> The principle is that the remote processor sends a name service announcement to
> >> Linux, which probes the rpmsg character device and creates the /dev/rpmsgX
> >> device in a predefined order known by the remote processor.
> >> In such a case, the /dev/rpmsgX usage would be determined by the remote
> >> processor itself.
> >>
> >> Another advantage is that the RPMsg channel creation is not driven by either the
> >> host or the client. In such case host does no need to define/kwnow RPMSg
> >> endpoint addresses.
> >>
> >> You still need to call the open() file system operation, but this should be done
> >> one time during Docker client initialization.
> >
> > NS is nice, but perhaps it's not the approach for some cases.
> >
> > For offloading/accelerator scenarios, ACPU is responsible for making all
> > the important decisions, including creations of endpoints. Because all
> > the user-awared software stack is running on ACPU, and if you want to
> > create a endpoint _dynamically_, it must be from user's command which is
> > from ACPU.
> >
> > And this series is more about how rpmsg_char and rpmsg_ctrl coordinate
> > themselves about creating dynamic rpmsg endpoints in a more simple
> > and efficient way.
> >
> > And the whole point of series is "When you want to return a fd to
> > userspace which represents an instance of data structure in kernel, you
> > don't implement it as character device". Maybe some quotes from Christian[1]
> > can describe it better[1]:
> >
> > "I'm not sure why people are so in love with character device based apis.
> > It's terrible. It glues everything to devtmpfs which isn't namespacable
> > in any way. It's terrible to delegate and extremely restrictive in terms
> > of extensiblity if you need additional device entries (aka the loop
> > driver folly)."
> >
> > [1] https://lkml.org/lkml/2025/6/24/639
>
> Thank you for all your explanations! It is always very interesting to understand
> the different ways to use RPMsg.
>
> In the end, I don't see any problem with your series. The possibility of using
> an anonymous inode seems valid to me.
Thanks for the review!
>
> I am just wondering whether this should be implemented in the rpmsg_char driver
> or in a new driver, addressing this question to Mathieu and Bjorn.
Well, I think we are kinda in a dilemma:
- It's bit odd to add anonymous inode code in a file named rpmsg_char.c.
- On the other hand, the new anonymous code path do share couples of
codes and abstractions of rpmsg_char.c, rpmsg_eptdev, all methods
excluding open() in fops... . If new code is built into a standalone
driver, how are we supposed to take care of these shared codes?
Yep, leave it to the maintainers.
Thanks,
Dawei
>
>
> Regards,
> Arnaud
>
> >
> > Thanks,
> >
> > Dawei
> >
> >>
> >> Regards
> >> Arnaud
> >>
> >>
> >>>
> >>>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>> You could face same kind of random instantiation for serial peripherals ( UART;
> >>>>>> USb, I2C,...) based on a device tree enumeration. I suppose that user space
> >>>>>> use to solve this.
> >>>>>>
> >>>>>>>
> >>>>>>> An anonymous inode based approach is introduced to address the issues above.
> >>>>>>> Rather than generating device node and opening it, rpmsg code just creates
> >>>>>>> an anonymous inode representing eptdev and return the fd to userspace.
> >>>>>>
> >>>>>> A drawback is that you need to share fb passed between processes.
> >>>>>
> >>>>> Fd is the abstraction of an unique endpoint device, it holds true for
> >>>>> both legacy and new approach.
> >>>>>
> >>>>> So I guess what you mean is that /dev/rpmsgX is global to all so other process
> >>>>> can access it?
> >>>>>
> >>>>> But /dev/rpmsgX is designed to be opened only once, it's implemented as
> >>>>> singleton pattern.
> >>>>>
> >>>>> static int rpmsg_eptdev_open(struct inode *inode, struct file *filp)
> >>>>> {
> >>>>> ...
> >>>>> if (eptdev->ept) {
> >>>>> mutex_unlock(&eptdev->ept_lock);
> >>>>> return -EBUSY;
> >>>>> }
> >>>>> ...
> >>>>> eptdev->ept = ept;
> >>>>> ...
> >>>>> }
> >>>>>
> >>>>> [...]
> >>>>>
> >>>>>>> printf("loop[%d]\n", loop);
> >>>>>>>
> >>>>>>> gettimeofday(&start, NULL);
> >>>>>>>
> >>>>>>> while (loop--) {
> >>>>>>
> >>>>>> Do you need to create /close Endpoint sevral times in your real use case with
> >>>>>> high timing
> >>>>>> constraint?
> >>>>>
> >>>>> No, it's just a silly benchmark demo, large sample reduces noise statistically.
> >>>>>
> >>>>>>
> >>>>>>> fd_info.fd = -1;
> >>>>>>> fd_info.flags = O_RDWR | O_CLOEXEC | O_NONBLOCK;
> >>>>>>> ret = ioctl(fd, RPMSG_CREATE_EPT_FD_IOCTL, &fd_info);
> >>>>>>> if (ret < 0 || fd_info.fd < 0) {
> >>>>>>> printf("ioctl[RPMSG_CREATE_EPT_FD_IOCTL] failed, ret[%d]\n", ret);
> >>>>>>> }
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> ret = ioctl(fd_info.fd, RPMSG_DESTROY_EPT_IOCTL, &info);
> >>>>>>> if (ret < 0) {
> >>>>>>> printf("new ioctl[RPMSG_DESTROY_EPT_IOCTL] failed, ret[%d]\n", ret);
> >>>>>>> }
> >>>>>>>
> >>>>>>> close(fd_info.fd);
> >>>>>>
> >>>>>> It seems strange to me to use ioctl() for opening and close() for closing, from
> >>>>>> a symmetry point of view.
> >>>>>
> >>>>> Sorry to hear that. But no, it's a pretty normal skill in kernel codebase
> >>>>> , I had to copy some examples from reply to other reviewer[1].
> >>>>
> >>>> I missed this one, apologize for the duplication.
> >>>>
> >>>>>
> >>>>> anon_inode_get_{fd,file} are used extensively in kernel for returning a new
> >>>>> fd to userspace which is associated with an unique data structure in kernel
> >>>>> space, in different ways:
> >>>>>
> >>>>> - via ioctl(), some examples are:
> >>>>>
> >>>>> - KVM ioctl(s)
> >>>>> - KVM_CREATE_VCPU -> kvm_vm_ioctl_create_vcpu
> >>>>> - KVM_GET_STATS_FD -> kvm_vcpu_ioctl_get_stats_fd
> >>>>> - KVM_CREATE_DEVICE -> kvm_ioctl_create_device
> >>>>> - KVM_CREATE_VM -> kvm_dev_ioctl_create_vm
> >>>>>
> >>>>> - DMA buf/fence/sync ioctls
> >>>>> - DMA_BUF_IOCTL_EXPORT_SYNC_FILE -> dma_buf_export_sync_file
> >>>>> - SW_SYNC_IOC_CREATE_FENCE -> sw_sync_ioctl_create_fence
> >>>>> - Couples of driver implement DMA buf by using anon file _implicitly_:
> >>>>> - UDMABUF_CREATE -> udmabuf_ioctl_create
> >>>>> - DMA_HEAP_IOCTL_ALLOC -> dma_heap_ioctl_allocate
> >>>>>
> >>>>> - gpiolib ioctls:
> >>>>> - GPIO_GET_LINEHANDLE_IOCTL -> linehandle_create
> >>>>> - GPIO_V2_GET_LINE_IOCTL
> >>>>>
> >>>>> - IOMMUFD ioctls:
> >>>>>
> >>>>> - VFIO Ioctls:
> >>>>>
> >>>>> - ....
> >>>>>
> >>>>>
> >>>>> - via other specific syscalls:
> >>>>> - epoll_create1
> >>>>> - bpf
> >>>>> - perf_event_open
> >>>>> - inotify_init
> >>>>> - ...
> >>>>
> >>>> If we put the optimization aspect aside, what seems strange to me is that the
> >>>> purpose of rpmsg_char was to expose a FS character device to user space. If we
> >>>> need tobypass the use of /dev/rpmsgX, does it make sense to support an anonymous
> >>>> inode in this driver? I am clearly not legitimate to answer this question...
> >>>
> >>> You have every right to do so, after all, it's purely a technical
> >>> discussion :).
> >>>
> >>> I admit it's bit confusing to add annoymous inode logic to a file named
> >>> rpmsg_char.c which implies 'character' device. That's why I rename API
> >>> following Mathieu's comment:
> >>> - __rpmsg_chrdev_eptdev_alloc -> rpmsg_eptdev_alloc
> >>> - __rpmsg_chrdev_eptdev_add -> rpmsg_eptdev_add
> >>>
> >>> As to topic how these two uAPI(s) co-exist and affect each other. This
> >>> change is based on rules:
> >>>
> >>> 1. Never break existing uAPI.
> >>> 2. Try best to reuse existing codebase.
> >>> 3. Userspace can choose whatever approach they want to.
> >>>
> >>> Thanks,
> >>>
> >>> Dawei
> >>>>
> >>>>
> >>>> Thanks,
> >>>> Arnaud
> >>>>
> >>>>>
> >>>>> [1] https://lore.kernel.org/all/20250530125008.GA5355@wendao-VirtualBox/
> >>>>>
> >>>>>>
> >>>>>> Regarding your implementation, I wonder if we could keep the /dev/rpmsg<x>
> >>>>>> device with specific open() and close() file operations associated with your new
> >>>>>> ioctl.
> >>>>>>
> >>>>>> - The ioctl would create the endpoint.
> >>>>>> - The open() and close() operations would simply manage the file descriptor and
> >>>>>> increment/decrement a counter to prevent premature endpoint destruction.
> >>>>>>
> >>>>>>
> >>>>>> Regards,
> >>>>>> Arnaud
> >>>>>>
> >>>>>
> >>>>> [...]
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Dawei
Powered by blists - more mailing lists