Message-ID: <685641cb-d5e0-4f28-87a1-98d2fd84e920@foss.st.com>
Date: Wed, 2 Jul 2025 09:48:59 +0200
From: Arnaud POULIQUEN <arnaud.pouliquen@...s.st.com>
To: Dawei Li <dawei.li@...ux.dev>
CC: <andersson@...nel.org>, <mathieu.poirier@...aro.org>,
<linux-remoteproc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<set_pte_at@...look.com>
Subject: Re: [PATCH v4 0/3] rpmsg: Introduce RPMSG_CREATE_EPT_FD_IOCTL uAPI
Hello Dawei,
On 7/1/25 16:16, Dawei Li wrote:
> Hi Arnaud,
>
> Thanks for the reply.
>
> On Mon, Jun 30, 2025 at 09:54:40AM +0200, Arnaud POULIQUEN wrote:
>> Hello Dawei,
>>
>> Sorry for the late answer.
>>
>> On 6/22/25 06:12, Dawei Li wrote:
>>> Hi Arnaud,
>>>
>>> Thanks for the reply.
>>>
>>> On Fri, Jun 20, 2025 at 09:52:03AM +0200, Arnaud POULIQUEN wrote:
>>>>
>>>>
>>>> On 6/19/25 16:43, Dawei Li wrote:
>>>>> Hi Arnaud,
>>>>> Thanks for review.
>>>>>
>>>>> On Wed, Jun 18, 2025 at 03:07:36PM +0200, Arnaud POULIQUEN wrote:
>>>>>> Hello Dawei,
>>>>>>
>>>>>>
>>>>>> Please find a few comments below. It is not clear to me which parts of your
>>>>>> implementation are mandatory and which are optional "nice-to-have" optimizations.
>>>>>
>>>>> It's more like an improvement.
>>>>>
>>>>>>
>>>>>> Based on (potentially erroneous) hypothesis, you will find a suggestion for an
>>>>>> alternative to the anonymous inode approach, which does not seem to be a common
>>>>>> interface.
>>>>>
>>>>> AFAIC, anonymous inodes are a common interface, used extensively in kernel development.
>>>>> Some examples below.
>>>>>
>>>>>>
>>>>>>
>>>>>> On 6/9/25 17:15, Dawei Li wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> This is V4 of the series which introduces a new uAPI (RPMSG_CREATE_EPT_FD_IOCTL)
>>>>>>> for the rpmsg subsystem.
>>>>>>>
>>>>>>> The current uAPI implementation for rpmsg ctrl & char device manipulation is
>>>>>>> abstracted in the procedure below:
>>>>>>> - fd = open("/dev/rpmsg_ctrlX")
>>>>>>> - ioctl(fd, RPMSG_CREATE_EPT_IOCTL, &info); /dev/rpmsgY devnode is
>>>>>>> generated.
>>>>>>> - fd_ep = open("/dev/rpmsgY", O_RDWR)
>>>>>>> - operations on fd_ep(write, read, poll ioctl)
>>>>>>> - ioctl(fd_ep, RPMSG_DESTROY_EPT_IOCTL)
>>>>>>> - close(fd_ep)
>>>>>>> - close(fd)
>>>>>>>
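>>>>>>> Putting those steps together, a rough sketch of the legacy flow in C looks
>>>>>>> like this (error handling omitted; the ctrl index, the rpmsgY index and the
>>>>>>> service name are placeholders):
>>>>>>>
>>>>>>> 	#include <fcntl.h>
>>>>>>> 	#include <sys/ioctl.h>
>>>>>>> 	#include <unistd.h>
>>>>>>> 	#include <linux/rpmsg.h>
>>>>>>>
>>>>>>> 	int main(void)
>>>>>>> 	{
>>>>>>> 		struct rpmsg_endpoint_info info = {
>>>>>>> 			.name = "my-service",	/* placeholder service name */
>>>>>>> 			.src = 0,
>>>>>>> 			.dst = RPMSG_ADDR_ANY,
>>>>>>> 		};
>>>>>>> 		char buf[256];
>>>>>>> 		int fd, fd_ep;
>>>>>>>
>>>>>>> 		fd = open("/dev/rpmsg_ctrl0", O_RDWR);
>>>>>>> 		ioctl(fd, RPMSG_CREATE_EPT_IOCTL, &info);	/* /dev/rpmsgY appears */
>>>>>>> 		fd_ep = open("/dev/rpmsg0", O_RDWR);	/* Y must be discovered via udev/sysfs */
>>>>>>> 		write(fd_ep, "ping", 4);
>>>>>>> 		read(fd_ep, buf, sizeof(buf));
>>>>>>> 		ioctl(fd_ep, RPMSG_DESTROY_EPT_IOCTL);
>>>>>>> 		close(fd_ep);
>>>>>>> 		close(fd);
>>>>>>> 		return 0;
>>>>>>> 	}
>>>>>>>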
>>>>>>> This /dev/rpmsgY abstraction is less favorable because of:
>>>>>>> - Performance issue: It's time consuming because some operations are
>>>>>>> involved:
>>>>>>> - Device node creation.
>>>>>>> Depending on the specific config, especially CONFIG_DEVTMPFS, the overall
>>>>>>> overhead depends on the coordination between devtmpfs and userspace
>>>>>>> tools such as udev and mdev.
>>>>>>>
>>>>>>> - Extra kernel-space switch cost.
>>>>>>>
>>>>>>> - Other major costs brought by heavy-weight logic like device_add().
>>>>>>
>>>>>> Is this a blocker or just an optimization?
>>>>>
>>>>> Yep, performance is one of the motivations for this change.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> - /dev/rpmsgY node can be opened only once. It doesn't make much sense
>>>>>>> that a dynamically created device node can be opened only once.
>>>>>>
>>>>>>
>>>>>> I assume this is a blocker, combined with the fact that you need to open the
>>>>>> /dev/rpmsg<x> to create the endpoint.
>>>>>
>>>>> Yes. You have to open /dev/rpmsgX, which is generated by the legacy ioctl, to
>>>>> instantiate a new endpoint.
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> - For some container applications such as Docker, a client can't access the
>>>>>>> host's devices unless specified explicitly. But in the case of /dev/rpmsgY, which
>>>>>>> is generated dynamically and whose existence is unknown to clients in
>>>>>>> advance, this device-node-based uAPI doesn't fit well.
>>>>>>
>>>>>> Could this be solved in userspace by parsing the /sys/class/rpmsg/ directory to
>>>>>> retrieve the device?
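>>>>>> Something along these lines (only a sketch, and I assume here that each
>>>>>> /sys/class/rpmsg/rpmsgN entry exposes a readable "name" attribute, which
>>>>>> may not be the case on every kernel):
>>>>>>
>>>>>> 	/* Find the /dev/rpmsgN node matching a given service name. */
>>>>>> 	#include <dirent.h>
>>>>>> 	#include <stdio.h>
>>>>>> 	#include <string.h>
>>>>>>
>>>>>> 	static int find_rpmsg_dev(const char *service, char *devpath, size_t len)
>>>>>> 	{
>>>>>> 		DIR *d = opendir("/sys/class/rpmsg");
>>>>>> 		struct dirent *e;
>>>>>> 		char attr[288], name[64];
>>>>>> 		FILE *f;
>>>>>>
>>>>>> 		if (!d)
>>>>>> 			return -1;
>>>>>> 		while ((e = readdir(d))) {
>>>>>> 			if (strncmp(e->d_name, "rpmsg", 5) ||
>>>>>> 			    !strncmp(e->d_name, "rpmsg_ctrl", 10))
>>>>>> 				continue;	/* only look at rpmsgN entries */
>>>>>> 			snprintf(attr, sizeof(attr), "/sys/class/rpmsg/%s/name", e->d_name);
>>>>>> 			f = fopen(attr, "r");
>>>>>> 			if (f && fgets(name, sizeof(name), f) &&
>>>>>> 			    !strncmp(name, service, strlen(service))) {
>>>>>> 				snprintf(devpath, len, "/dev/%s", e->d_name);
>>>>>> 				fclose(f);
>>>>>> 				closedir(d);
>>>>>> 				return 0;
>>>>>> 			}
>>>>>> 			if (f)
>>>>>> 				fclose(f);
>>>>>> 		}
>>>>>> 		closedir(d);
>>>>>> 		return -1;
>>>>>> 	}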
>>>>>
>>>>> Hardly, because the client still can't access /dev/rpmsgX, which is generated
>>>>> by the host _after_ the client is launched.
>>>>
>>>>
>>>> This part is not clear to me; could you provide more details?
>>>> I cannot figure out why a client can access /dev/rpmsg_ctrlX but not /dev/rpmsgX.
>>>
>>> Well, let's take Docker as an example:
>>
>>>
>>> With Docker, when a client is launched and wants to access the host's
>>> devices, it must make an explicit request at launch time:
>>>
>>> docker run --device=/dev/xxx
>>>
>>> Let's presume that xxx is a /dev/rpmsgX generated dynamically by the _host_.
>>> The docker command above knows nothing about these rpmsg nodes, which are
>>> generated by the host _after_ the client is launched. And yes, parsing
>>> /sys/class/rpmsg may acquire info about the rpmsg devices, but the client still
>>> can't access /dev/rpmsgX.
>>
>> One extra question: Are you using RPMsg over virtio?
>>
>> If yes, have you tried the RPMsg name service (NS) announcement? That might also
>> address your needs.
>>
>> The principle is that the remote processor sends a name service announcement to
>> Linux, which probes the rpmsg character device and creates the /dev/rpmsgX
>> device in a predefined order known by the remote processor.
>> In such a case, the /dev/rpmsgX usage would be determined by the remote
>> processor itself.
>>
>> Another advantage is that the RPMsg channel creation is driven by neither the
>> host nor the client. In that case the host does not need to define/know RPMsg
>> endpoint addresses.
>>
>> You still need to call the open() file system operation, but this should be done
>> one time during Docker client initialization.
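>>
>> For illustration, on an OpenAMP-based firmware the announcement is simply a
>> side effect of creating the endpoint. A rough sketch (untested; the service
>> name must match what the kernel rpmsg char driver binds to, e.g. "rpmsg-raw"
>> on recent kernels, and the source address 0x400 is only an example):
>>
>> 	#include <openamp/rpmsg.h>
>>
>> 	static struct rpmsg_endpoint ept;
>>
>> 	static int ept_cb(struct rpmsg_endpoint *ept, void *data, size_t len,
>> 			  uint32_t src, void *priv)
>> 	{
>> 		/* data coming from /dev/rpmsgX on the Linux side */
>> 		return RPMSG_SUCCESS;
>> 	}
>>
>> 	void announce_service(struct rpmsg_device *rdev)
>> 	{
>> 		/* Creating the endpoint sends the NS announcement to Linux,
>> 		 * which then probes rpmsg_char and creates /dev/rpmsgX. */
>> 		rpmsg_create_ept(&ept, rdev, "rpmsg-raw", 0x400, RPMSG_ADDR_ANY,
>> 				 ept_cb, NULL);
>> 	}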
>
> NS is nice, but perhaps it's not the right approach for some cases.
>
> For offloading/accelerator scenarios, the ACPU is responsible for making all
> the important decisions, including the creation of endpoints, because all
> the user-facing software stack is running on the ACPU. If you want to
> create an endpoint _dynamically_, the request must come from a user command,
> which comes from the ACPU.
>
> And this series is more about how rpmsg_char and rpmsg_ctrl coordinate
> on creating dynamic rpmsg endpoints in a simpler
> and more efficient way.
>
> And the whole point of the series is: "When you want to return an fd to
> userspace which represents an instance of a data structure in the kernel, you
> don't implement it as a character device". Maybe a quote from Christian[1]
> describes it better:
>
> "I'm not sure why people are so in love with character device based apis.
> It's terrible. It glues everything to devtmpfs which isn't namespacable
> in any way. It's terrible to delegate and extremely restrictive in terms
> of extensiblity if you need additional device entries (aka the loop
> driver folly)."
>
> [1] https://lkml.org/lkml/2025/6/24/639
Thank you for all your explanations! It is always very interesting to understand
the different ways to use RPMsg.
In the end, I don't see any problem with your series. The possibility of using
an anonymous inode seems valid to me.
I am just wondering whether this should be implemented in the rpmsg_char driver
or in a new driver; I am addressing this question to Mathieu and Bjorn.
Regards,
Arnaud
>
> Thanks,
>
> Dawei
>
>>
>> Regards
>> Arnaud
>>
>>
>>>
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> You could face the same kind of random instantiation for serial peripherals (UART,
>>>>>> USB, I2C, ...) based on device tree enumeration. I suppose that user space
>>>>>> is used to solving this.
>>>>>>
>>>>>>>
>>>>>>> An anonymous-inode-based approach is introduced to address the issues above.
>>>>>>> Rather than generating a device node and opening it, the rpmsg code just creates
>>>>>>> an anonymous inode representing the eptdev and returns the fd to userspace.
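>>>>>>>
>>>>>>> In kernel terms the idea is roughly the following (simplified sketch with
>>>>>>> hypothetical names, not the literal code of this series):
>>>>>>>
>>>>>>> 	/* Called from the RPMSG_CREATE_EPT_FD_IOCTL handler. */
>>>>>>> 	static int rpmsg_ctrldev_create_ept_fd(struct rpmsg_ctrldev *ctrldev,
>>>>>>> 					       struct rpmsg_endpoint_info *info)
>>>>>>> 	{
>>>>>>> 		struct rpmsg_eptdev *eptdev;
>>>>>>>
>>>>>>> 		/* allocate and initialise the endpoint device as usual */
>>>>>>> 		eptdev = rpmsg_eptdev_alloc(ctrldev->rpdev, info);
>>>>>>>
>>>>>>> 		/*
>>>>>>> 		 * Instead of device_add() plus a devtmpfs node, hand userspace
>>>>>>> 		 * an fd backed by an anonymous inode reusing the eptdev fops.
>>>>>>> 		 */
>>>>>>> 		return anon_inode_getfd("[rpmsg-eptdev]", &rpmsg_eptdev_fops,
>>>>>>> 					eptdev, O_RDWR | O_CLOEXEC);
>>>>>>> 	}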
>>>>>>
>>>>>> A drawback is that you need to pass the fd between processes to share it.
>>>>>
>>>>> The fd is the abstraction of a unique endpoint device; that holds true for
>>>>> both the legacy and the new approach.
>>>>>
>>>>> So I guess what you mean is that /dev/rpmsgX is global, so other processes
>>>>> can access it?
>>>>>
>>>>> But /dev/rpmsgX is designed to be opened only once; it's implemented as
>>>>> a singleton:
>>>>>
>>>>> static int rpmsg_eptdev_open(struct inode *inode, struct file *filp)
>>>>> {
>>>>> 	...
>>>>> 	if (eptdev->ept) {
>>>>> 		mutex_unlock(&eptdev->ept_lock);
>>>>> 		return -EBUSY;
>>>>> 	}
>>>>> 	...
>>>>> 	eptdev->ept = ept;
>>>>> 	...
>>>>> }
>>>>>
>>>>> [...]
>>>>>
>>>>>>> printf("loop[%d]\n", loop);
>>>>>>>
>>>>>>> gettimeofday(&start, NULL);
>>>>>>>
>>>>>>> while (loop--) {
>>>>>>
>>>>>> Do you need to create/close endpoints several times in your real use case, with
>>>>>> high timing constraints?
>>>>>
>>>>> No, it's just a silly benchmark demo; a large sample reduces noise statistically.
>>>>>
>>>>>>
>>>>>>> fd_info.fd = -1;
>>>>>>> fd_info.flags = O_RDWR | O_CLOEXEC | O_NONBLOCK;
>>>>>>> ret = ioctl(fd, RPMSG_CREATE_EPT_FD_IOCTL, &fd_info);
>>>>>>> if (ret < 0 || fd_info.fd < 0) {
>>>>>>> printf("ioctl[RPMSG_CREATE_EPT_FD_IOCTL] failed, ret[%d]\n", ret);
>>>>>>> }
>>>>>>>
>>>>>>
>>>>>>
>>>>>>> ret = ioctl(fd_info.fd, RPMSG_DESTROY_EPT_IOCTL, &info);
>>>>>>> if (ret < 0) {
>>>>>>> printf("new ioctl[RPMSG_DESTROY_EPT_IOCTL] failed, ret[%d]\n", ret);
>>>>>>> }
>>>>>>>
>>>>>>> close(fd_info.fd);
>>>>>>
>>>>>> It seems strange to me to use ioctl() for opening and close() for closing, from
>>>>>> a symmetry point of view.
>>>>>
>>>>> Sorry to hear that. But no, it's a pretty common technique in the kernel codebase;
>>>>> I had to copy some examples from my reply to another reviewer[1].
>>>>
>>>> I missed this one; apologies for the duplication.
>>>>
>>>>>
>>>>> anon_inode_get_{fd,file} are used extensively in the kernel for returning a new
>>>>> fd to userspace which is associated with a unique data structure in kernel
>>>>> space, in different ways:
>>>>>
>>>>> - via ioctl(), some examples are:
>>>>>
>>>>> - KVM ioctl(s)
>>>>> - KVM_CREATE_VCPU -> kvm_vm_ioctl_create_vcpu
>>>>> - KVM_GET_STATS_FD -> kvm_vcpu_ioctl_get_stats_fd
>>>>> - KVM_CREATE_DEVICE -> kvm_ioctl_create_device
>>>>> - KVM_CREATE_VM -> kvm_dev_ioctl_create_vm
>>>>>
>>>>> - DMA buf/fence/sync ioctls
>>>>> - DMA_BUF_IOCTL_EXPORT_SYNC_FILE -> dma_buf_export_sync_file
>>>>> - SW_SYNC_IOC_CREATE_FENCE -> sw_sync_ioctl_create_fence
>>>>> - A couple of drivers implement DMA buf by using anon files _implicitly_:
>>>>> - UDMABUF_CREATE -> udmabuf_ioctl_create
>>>>> - DMA_HEAP_IOCTL_ALLOC -> dma_heap_ioctl_allocate
>>>>>
>>>>> - gpiolib ioctls:
>>>>> - GPIO_GET_LINEHANDLE_IOCTL -> linehandle_create
>>>>> - GPIO_V2_GET_LINE_IOCTL
>>>>>
>>>>> - IOMMUFD ioctls:
>>>>>
>>>>> - VFIO Ioctls:
>>>>>
>>>>> - ....
>>>>>
>>>>>
>>>>> - via other specific syscalls:
>>>>> - epoll_create1
>>>>> - bpf
>>>>> - perf_event_open
>>>>> - inotify_init
>>>>> - ...
>>>>
>>>> If we put the optimization aspect aside, what seems strange to me is that the
>>>> purpose of rpmsg_char was to expose a character device to user space. If we
>>>> need to bypass the use of /dev/rpmsgX, does it make sense to support an anonymous
>>>> inode in this driver? I am clearly not the right person to answer this question...
>>>
>>> You have every right to do so, after all, it's purely a technical
>>> discussion :).
>>>
>>> I admit it's a bit confusing to add anonymous inode logic to a file named
>>> rpmsg_char.c, which implies a 'character' device. That's why I renamed the API
>>> following Mathieu's comment:
>>> - __rpmsg_chrdev_eptdev_alloc -> rpmsg_eptdev_alloc
>>> - __rpmsg_chrdev_eptdev_add -> rpmsg_eptdev_add
>>>
>>> As to the topic of how these two uAPIs co-exist and affect each other, this
>>> change is based on the following rules:
>>>
>>> 1. Never break existing uAPI.
>>> 2. Try our best to reuse the existing codebase.
>>> 3. Userspace can choose whatever approach they want to.
>>>
>>> Thanks,
>>>
>>> Dawei
>>>>
>>>>
>>>> Thanks,
>>>> Arnaud
>>>>
>>>>>
>>>>> [1] https://lore.kernel.org/all/20250530125008.GA5355@wendao-VirtualBox/
>>>>>
>>>>>>
>>>>>> Regarding your implementation, I wonder if we could keep the /dev/rpmsg<x>
>>>>>> device with specific open() and close() file operations associated with your new
>>>>>> ioctl.
>>>>>>
>>>>>> - The ioctl would create the endpoint.
>>>>>> - The open() and close() operations would simply manage the file descriptor and
>>>>>> increment/decrement a counter to prevent premature endpoint destruction.
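>>>>>>
>>>>>> Roughly something like this (only a sketch to illustrate the counter idea,
>>>>>> with a hypothetical rpmsg_eptdev_destroy() release callback):
>>>>>>
>>>>>> 	static int rpmsg_eptdev_open(struct inode *inode, struct file *filp)
>>>>>> 	{
>>>>>> 		struct rpmsg_eptdev *eptdev = container_of(inode->i_cdev,
>>>>>> 							   struct rpmsg_eptdev, cdev);
>>>>>>
>>>>>> 		kref_get(&eptdev->refcount);	/* one more user of the endpoint */
>>>>>> 		filp->private_data = eptdev;
>>>>>> 		return 0;
>>>>>> 	}
>>>>>>
>>>>>> 	static int rpmsg_eptdev_release(struct inode *inode, struct file *filp)
>>>>>> 	{
>>>>>> 		struct rpmsg_eptdev *eptdev = filp->private_data;
>>>>>>
>>>>>> 		/* destroy the endpoint only when the last opener goes away */
>>>>>> 		kref_put(&eptdev->refcount, rpmsg_eptdev_destroy);
>>>>>> 		return 0;
>>>>>> 	}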
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Arnaud
>>>>>>
>>>>>
>>>>> [...]
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Dawei