Message-ID: <685641cb-d5e0-4f28-87a1-98d2fd84e920@foss.st.com>
Date: Wed, 2 Jul 2025 09:48:59 +0200
From: Arnaud POULIQUEN <arnaud.pouliquen@...s.st.com>
To: Dawei Li <dawei.li@...ux.dev>
CC: <andersson@...nel.org>, <mathieu.poirier@...aro.org>,
<linux-remoteproc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<set_pte_at@...look.com>
Subject: Re: [PATCH v4 0/3] rpmsg: Introduce RPMSG_CREATE_EPT_FD_IOCTL uAPI
Hello Dawei,
On 7/1/25 16:16, Dawei Li wrote:
> Hi Arnaud,
>
> Thanks for the reply.
>
> On Mon, Jun 30, 2025 at 09:54:40AM +0200, Arnaud POULIQUEN wrote:
>> Hello Dawei,
>>
>> Sorry for the late answer.
>>
>> On 6/22/25 06:12, Dawei Li wrote:
>>> Hi Arnaud,
>>>
>>> Thanks for the reply.
>>>
>>> On Fri, Jun 20, 2025 at 09:52:03AM +0200, Arnaud POULIQUEN wrote:
>>>>
>>>>
>>>> On 6/19/25 16:43, Dawei Li wrote:
>>>>> Hi Arnaud,
>>>>> Thanks for review.
>>>>>
>>>>> On Wed, Jun 18, 2025 at 03:07:36PM +0200, Arnaud POULIQUEN wrote:
>>>>>> Hello Dawei,
>>>>>>
>>>>>>
>>>>>> Please find a few comments below. It is not clear to me which parts of your
>>>>>> implementation are mandatory and which are optional "nice-to-have" optimizations.
>>>>>
>>>>> It's more like an improvement.
>>>>>
>>>>>>
>>>>>> Based on (potentially erroneous) hypothesis, you will find a suggestion for an
>>>>>> alternative to the anonymous inode approach, which does not seem to be a common
>>>>>> interface.
>>>>>
>>>>> AFAIC, anonymous inodes are a common interface, used extensively in kernel development.
>>>>> Some examples below.
>>>>>
>>>>>>
>>>>>>
>>>>>> On 6/9/25 17:15, Dawei Li wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> This is V4 of the series which introduces a new uAPI (RPMSG_CREATE_EPT_FD_IOCTL)
>>>>>>> for the rpmsg subsystem.
>>>>>>>
>>>>>>> The current uAPI implementation for rpmsg ctrl & char device manipulation is
>>>>>>> abstracted in the procedure below:
>>>>>>> - fd = open("/dev/rpmsg_ctrlX")
>>>>>>> - ioctl(fd, RPMSG_CREATE_EPT_IOCTL, &info); /dev/rpmsgY devnode is
>>>>>>> generated.
>>>>>>> - fd_ep = open("/dev/rpmsgY", O_RDWR)
>>>>>>> - operations on fd_ep(write, read, poll ioctl)
>>>>>>> - ioctl(fd_ep, RPMSG_DESTROY_EPT_IOCTL)
>>>>>>> - close(fd_ep)
>>>>>>> - close(fd)
>>>>>>>
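>>>>>>> Putting those steps together, a rough sketch of the legacy flow in C looks
>>>>>>> like this (error handling omitted; the ctrl index, the rpmsgY index and the
>>>>>>> service name are placeholders):
>>>>>>>
>>>>>>> 	#include <fcntl.h>
>>>>>>> 	#include <sys/ioctl.h>
>>>>>>> 	#include <unistd.h>
>>>>>>> 	#include <linux/rpmsg.h>
>>>>>>>
>>>>>>> 	int main(void)
>>>>>>> 	{
>>>>>>> 		struct rpmsg_endpoint_info info = {
>>>>>>> 			.name = "my-service",	/* placeholder service name */
>>>>>>> 			.src = 0,
>>>>>>> 			.dst = RPMSG_ADDR_ANY,
>>>>>>> 		};
>>>>>>> 		char buf[256];
>>>>>>> 		int fd, fd_ep;
>>>>>>>
>>>>>>> 		fd = open("/dev/rpmsg_ctrl0", O_RDWR);
>>>>>>> 		ioctl(fd, RPMSG_CREATE_EPT_IOCTL, &info);	/* /dev/rpmsgY appears */
>>>>>>> 		fd_ep = open("/dev/rpmsg0", O_RDWR);	/* Y must be discovered via udev/sysfs */
>>>>>>> 		write(fd_ep, "ping", 4);
>>>>>>> 		read(fd_ep, buf, sizeof(buf));
>>>>>>> 		ioctl(fd_ep, RPMSG_DESTROY_EPT_IOCTL);
>>>>>>> 		close(fd_ep);
>>>>>>> 		close(fd);
>>>>>>> 		return 0;
>>>>>>> 	}
>>>>>>>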
>>>>>>> This /dev/rpmsgY abstraction is less favorable because of:
>>>>>>> - Performance issue: It's time consuming because some operations are
>>>>>>> involved:
>>>>>>> - Device node creation.
>>>>>>> Depending on the specific config, especially CONFIG_DEVTMPFS, the overall
>>>>>>> overhead depends on the coordination between devtmpfs and userspace
>>>>>>> tools such as udev and mdev.
>>>>>>>
>>>>>>> - Extra kernel-space switch cost.
>>>>>>>
>>>>>>> - Other major costs brought by heavy-weight logic like device_add().
>>>>>>
>>>>>> Is this a blocker or just an optimization?
>>>>>
>>>>> Yep, performance is one of the motivations for this change.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> - /dev/rpmsgY node can be opened only once. It doesn't make much sense
>>>>>>> that a dynamically created device node can be opened only once.
>>>>>>
>>>>>>
>>>>>> I assume this is a blocker, combined with the fact that you need to open the
>>>>>> /dev/rpmsg<x> to create the endpoint.
>>>>>
>>>>> Yes. You have to open /dev/rpmsgX, which is generated by the legacy ioctl, to
>>>>> instantiate a new endpoint.
>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> - For some container applications such as Docker, a client can't access the
>>>>>>> host's devices unless specified explicitly. But in the case of /dev/rpmsgY, which
>>>>>>> is generated dynamically and whose existence is unknown to clients in
>>>>>>> advance, this device-node-based uAPI doesn't fit well.
>>>>>>
>>>>>> Could this be solved in userspace by parsing the /sys/class/rpmsg/ directory to
>>>>>> retrieve the device?
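>>>>>> Something along these lines (only a sketch, and I assume here that each
>>>>>> /sys/class/rpmsg/rpmsgN entry exposes a readable "name" attribute, which
>>>>>> may not be the case on every kernel):
>>>>>>
>>>>>> 	/* Find the /dev/rpmsgN node matching a given service name. */
>>>>>> 	#include <dirent.h>
>>>>>> 	#include <stdio.h>
>>>>>> 	#include <string.h>
>>>>>>
>>>>>> 	static int find_rpmsg_dev(const char *service, char *devpath, size_t len)
>>>>>> 	{
>>>>>> 		DIR *d = opendir("/sys/class/rpmsg");
>>>>>> 		struct dirent *e;
>>>>>> 		char attr[288], name[64];
>>>>>> 		FILE *f;
>>>>>>
>>>>>> 		if (!d)
>>>>>> 			return -1;
>>>>>> 		while ((e = readdir(d))) {
>>>>>> 			if (strncmp(e->d_name, "rpmsg", 5) ||
>>>>>> 			    !strncmp(e->d_name, "rpmsg_ctrl", 10))
>>>>>> 				continue;	/* only look at rpmsgN entries */
>>>>>> 			snprintf(attr, sizeof(attr), "/sys/class/rpmsg/%s/name", e->d_name);
>>>>>> 			f = fopen(attr, "r");
>>>>>> 			if (f && fgets(name, sizeof(name), f) &&
>>>>>> 			    !strncmp(name, service, strlen(service))) {
>>>>>> 				snprintf(devpath, len, "/dev/%s", e->d_name);
>>>>>> 				fclose(f);
>>>>>> 				closedir(d);
>>>>>> 				return 0;
>>>>>> 			}
>>>>>> 			if (f)
>>>>>> 				fclose(f);
>>>>>> 		}
>>>>>> 		closedir(d);
>>>>>> 		return -1;
>>>>>> 	}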
>>>>>
>>>>> Hardly, because the client still can't access /dev/rpmsgX, which is generated
>>>>> by the host _after_ the client is launched.
>>>>
>>>>
>>>> This part is not clear to me; could you provide more details?
>>>> I cannot figure out why a client can access /dev/rpmsg_ctrlX but not /dev/rpmsgX.
>>>
>>> Well, let's take Docker as an example:
>>
>>>
>>> With Docker, when a client is launched and wants to access the host's
>>> devices, it must make an explicit request at launch time:
>>>
>>> docker run --device=/dev/xxx
>>>
>>> Let's presume that xxx is a /dev/rpmsgX generated dynamically by the _host_.
>>> The docker command above knows nothing about these rpmsg nodes, which are
>>> generated by the host _after_ the client is launched. And yes, parsing
>>> /sys/class/rpmsg may acquire info about the rpmsg devices, but the client still
>>> can't access /dev/rpmsgX.
>>
>> One extra question: Are you using RPMsg over virtio?
>>
>> If yes, have you tried the RPMsg name service (NS) announcement? That might also
>> address your needs.
>>
>> The principle is that the remote processor sends a name service announcement to
>> Linux, which probes the rpmsg character device and creates the /dev/rpmsgX
>> device in a predefined order known by the remote processor.
>> In such a case, the /dev/rpmsgX usage would be determined by the remote
>> processor itself.
>>
>> Another advantage is that the RPMsg channel creation is driven by neither the
>> host nor the client. In that case the host does not need to define/know RPMsg
>> endpoint addresses.
>>
>> You still need to call the open() file system operation, but this should be done
>> one time during Docker client initialization.
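>>
>> For illustration, on an OpenAMP-based firmware the announcement is simply a
>> side effect of creating the endpoint. A rough sketch (untested; the service
>> name must match what the kernel rpmsg char driver binds to, e.g. "rpmsg-raw"
>> on recent kernels, and the source address 0x400 is only an example):
>>
>> 	#include <openamp/rpmsg.h>
>>
>> 	static struct rpmsg_endpoint ept;
>>
>> 	static int ept_cb(struct rpmsg_endpoint *ept, void *data, size_t len,
>> 			  uint32_t src, void *priv)
>> 	{
>> 		/* data coming from /dev/rpmsgX on the Linux side */
>> 		return RPMSG_SUCCESS;
>> 	}
>>
>> 	void announce_service(struct rpmsg_device *rdev)
>> 	{
>> 		/* Creating the endpoint sends the NS announcement to Linux,
>> 		 * which then probes rpmsg_char and creates /dev/rpmsgX. */
>> 		rpmsg_create_ept(&ept, rdev, "rpmsg-raw", 0x400, RPMSG_ADDR_ANY,
>> 				 ept_cb, NULL);
>> 	}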
>
> NS is nice, but perhaps it's not the right approach for some cases.
>
> For offloading/accelerator scenarios, the ACPU is responsible for making all
> the important decisions, including the creation of endpoints, because all
> the user-facing software stack is running on the ACPU. If you want to
> create an endpoint _dynamically_, the request must come from a user command,
> which comes from the ACPU.
>
> And this series is more about how rpmsg_char and rpmsg_ctrl coordinate
> on creating dynamic rpmsg endpoints in a simpler
> and more efficient way.
>
> And the whole point of the series is: "When you want to return an fd to
> userspace which represents an instance of a data structure in the kernel, you
> don't implement it as a character device". Maybe a quote from Christian[1]
> describes it better:
>
> "I'm not sure why people are so in love with character device based apis.
> It's terrible. It glues everything to devtmpfs which isn't namespacable
> in any way. It's terrible to delegate and extremely restrictive in terms
> of extensiblity if you need additional device entries (aka the loop
> driver folly)."
>
> [1] https://lkml.org/lkml/2025/6/24/639
Thank you for all your explanations! It is always very interesting to understand
the different ways to use RPMsg.
In the end, I don't see any problem with your series. The possibility of using
an anonymous inode seems valid to me.
I am just wondering whether this should be implemented in the rpmsg_char driver
or in a new driver; I am addressing this question to Mathieu and Bjorn.
Regards,
Arnaud
>
> Thanks,
>
> Dawei
>
>>
>> Regards
>> Arnaud
>>
>>
>>>
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> You could face the same kind of random instantiation for serial peripherals (UART,
>>>>>> USB, I2C, ...) based on device tree enumeration. I suppose that user space
>>>>>> is used to solving this.
>>>>>>
>>>>>>>
>>>>>>> An anonymous-inode-based approach is introduced to address the issues above.
>>>>>>> Rather than generating a device node and opening it, the rpmsg code just creates
>>>>>>> an anonymous inode representing the eptdev and returns the fd to userspace.
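>>>>>>>
>>>>>>> In kernel terms the idea is roughly the following (simplified sketch with
>>>>>>> hypothetical names, not the literal code of this series):
>>>>>>>
>>>>>>> 	/* Called from the RPMSG_CREATE_EPT_FD_IOCTL handler. */
>>>>>>> 	static int rpmsg_ctrldev_create_ept_fd(struct rpmsg_ctrldev *ctrldev,
>>>>>>> 					       struct rpmsg_endpoint_info *info)
>>>>>>> 	{
>>>>>>> 		struct rpmsg_eptdev *eptdev;
>>>>>>>
>>>>>>> 		/* allocate and initialise the endpoint device as usual */
>>>>>>> 		eptdev = rpmsg_eptdev_alloc(ctrldev->rpdev, info);
>>>>>>>
>>>>>>> 		/*
>>>>>>> 		 * Instead of device_add() plus a devtmpfs node, hand userspace
>>>>>>> 		 * an fd backed by an anonymous inode reusing the eptdev fops.
>>>>>>> 		 */
>>>>>>> 		return anon_inode_getfd("[rpmsg-eptdev]", &rpmsg_eptdev_fops,
>>>>>>> 					eptdev, O_RDWR | O_CLOEXEC);
>>>>>>> 	}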
>>>>>>
>>>>>> A drawback is that you need to pass the fd between processes to share it.
>>>>>
>>>>> The fd is the abstraction of a unique endpoint device; that holds true for
>>>>> both the legacy and the new approach.
>>>>>
>>>>> So I guess what you mean is that /dev/rpmsgX is global, so other processes
>>>>> can access it?
>>>>>
>>>>> But /dev/rpmsgX is designed to be opened only once; it's implemented as
>>>>> a singleton:
>>>>>
>>>>> static int rpmsg_eptdev_open(struct inode *inode, struct file *filp)
>>>>> {
>>>>> 	...
>>>>> 	if (eptdev->ept) {
>>>>> 		mutex_unlock(&eptdev->ept_lock);
>>>>> 		return -EBUSY;
>>>>> 	}
>>>>> 	...
>>>>> 	eptdev->ept = ept;
>>>>> 	...
>>>>> }
>>>>>
>>>>> [...]
>>>>>
>>>>>>> printf("loop[%d]\n", loop);
>>>>>>>
>>>>>>> gettimeofday(&start, NULL);
>>>>>>>
>>>>>>> while (loop--) {
>>>>>>
>>>>>> Do you need to create/close endpoints several times in your real use case, with
>>>>>> high timing constraints?
>>>>>
>>>>> No, it's just a silly benchmark demo; a large sample reduces noise statistically.
>>>>>
>>>>>>
>>>>>>> fd_info.fd = -1;
>>>>>>> fd_info.flags = O_RDWR | O_CLOEXEC | O_NONBLOCK;
>>>>>>> ret = ioctl(fd, RPMSG_CREATE_EPT_FD_IOCTL, &fd_info);
>>>>>>> if (ret < 0 || fd_info.fd < 0) {
>>>>>>> printf("ioctl[RPMSG_CREATE_EPT_FD_IOCTL] failed, ret[%d]\n", ret);
>>>>>>> }
>>>>>>>
>>>>>>
>>>>>>
>>>>>>> ret = ioctl(fd_info.fd, RPMSG_DESTROY_EPT_IOCTL, &info);
>>>>>>> if (ret < 0) {
>>>>>>> printf("new ioctl[RPMSG_DESTROY_EPT_IOCTL] failed, ret[%d]\n", ret);
>>>>>>> }
>>>>>>>
>>>>>>> close(fd_info.fd);
>>>>>>
>>>>>> It seems strange to me to use ioctl() for opening and close() for closing, from
>>>>>> a symmetry point of view.
>>>>>
>>>>> Sorry to hear that. But no, it's a pretty common technique in the kernel codebase;
>>>>> I had to copy some examples from my reply to another reviewer[1].
>>>>
>>>> I missed this one; apologies for the duplication.
>>>>
>>>>>
>>>>> anon_inode_get_{fd,file} are used extensively in the kernel for returning a new
>>>>> fd to userspace which is associated with a unique data structure in kernel
>>>>> space, in different ways:
>>>>>
>>>>> - via ioctl(), some examples are:
>>>>>
>>>>> - KVM ioctl(s)
>>>>> - KVM_CREATE_VCPU -> kvm_vm_ioctl_create_vcpu
>>>>> - KVM_GET_STATS_FD -> kvm_vcpu_ioctl_get_stats_fd
>>>>> - KVM_CREATE_DEVICE -> kvm_ioctl_create_device
>>>>> - KVM_CREATE_VM -> kvm_dev_ioctl_create_vm
>>>>>
>>>>> - DMA buf/fence/sync ioctls
>>>>> - DMA_BUF_IOCTL_EXPORT_SYNC_FILE -> dma_buf_export_sync_file
>>>>> - SW_SYNC_IOC_CREATE_FENCE -> sw_sync_ioctl_create_fence
>>>>> - A couple of drivers implement DMA buf by using anon files _implicitly_:
>>>>> - UDMABUF_CREATE -> udmabuf_ioctl_create
>>>>> - DMA_HEAP_IOCTL_ALLOC -> dma_heap_ioctl_allocate
>>>>>
>>>>> - gpiolib ioctls:
>>>>> - GPIO_GET_LINEHANDLE_IOCTL -> linehandle_create
>>>>> - GPIO_V2_GET_LINE_IOCTL
>>>>>
>>>>> - IOMMUFD ioctls:
>>>>>
>>>>> - VFIO Ioctls:
>>>>>
>>>>> - ....
>>>>>
>>>>>
>>>>> - via other specific syscalls:
>>>>> - epoll_create1
>>>>> - bpf
>>>>> - perf_event_open
>>>>> - inotify_init
>>>>> - ...
>>>>
>>>> If we put the optimization aspect aside, what seems strange to me is that the
>>>> purpose of rpmsg_char was to expose a character device to user space. If we
>>>> need to bypass the use of /dev/rpmsgX, does it make sense to support an anonymous
>>>> inode in this driver? I am clearly not the right person to answer this question...
>>>
>>> You have every right to do so, after all, it's purely a technical
>>> discussion :).
>>>
>>> I admit it's a bit confusing to add anonymous inode logic to a file named
>>> rpmsg_char.c, which implies a 'character' device. That's why I renamed the API
>>> following Mathieu's comment:
>>> - __rpmsg_chrdev_eptdev_alloc -> rpmsg_eptdev_alloc
>>> - __rpmsg_chrdev_eptdev_add -> rpmsg_eptdev_add
>>>
>>> As to the topic of how these two uAPIs co-exist and affect each other, this
>>> change is based on the following rules:
>>>
>>> 1. Never break existing uAPI.
>>> 2. Try our best to reuse the existing codebase.
>>> 3. Userspace can choose whatever approach they want to.
>>>
>>> Thanks,
>>>
>>> Dawei
>>>>
>>>>
>>>> Thanks,
>>>> Arnaud
>>>>
>>>>>
>>>>> [1] https://lore.kernel.org/all/20250530125008.GA5355@wendao-VirtualBox/
>>>>>
>>>>>>
>>>>>> Regarding your implementation, I wonder if we could keep the /dev/rpmsg<x>
>>>>>> device with specific open() and close() file operations associated with your new
>>>>>> ioctl.
>>>>>>
>>>>>> - The ioctl would create the endpoint.
>>>>>> - The open() and close() operations would simply manage the file descriptor and
>>>>>> increment/decrement a counter to prevent premature endpoint destruction.
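>>>>>>
>>>>>> Roughly something like this (only a sketch to illustrate the counter idea,
>>>>>> with a hypothetical rpmsg_eptdev_destroy() release callback):
>>>>>>
>>>>>> 	static int rpmsg_eptdev_open(struct inode *inode, struct file *filp)
>>>>>> 	{
>>>>>> 		struct rpmsg_eptdev *eptdev = container_of(inode->i_cdev,
>>>>>> 							   struct rpmsg_eptdev, cdev);
>>>>>>
>>>>>> 		kref_get(&eptdev->refcount);	/* one more user of the endpoint */
>>>>>> 		filp->private_data = eptdev;
>>>>>> 		return 0;
>>>>>> 	}
>>>>>>
>>>>>> 	static int rpmsg_eptdev_release(struct inode *inode, struct file *filp)
>>>>>> 	{
>>>>>> 		struct rpmsg_eptdev *eptdev = filp->private_data;
>>>>>>
>>>>>> 		/* destroy the endpoint only when the last opener goes away */
>>>>>> 		kref_put(&eptdev->refcount, rpmsg_eptdev_destroy);
>>>>>> 		return 0;
>>>>>> 	}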
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Arnaud
>>>>>>
>>>>>
>>>>> [...]
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Dawei