[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2469565.N6oiQLH2B8@wuerfel>
Date: Thu, 27 Oct 2016 11:11:09 +0200
From: Arnd Bergmann <arnd@...db.de>
To: Tom Gundersen <teg@...m.no>
Cc: Andy Lutomirski <luto@...capital.net>,
David Herrmann <dh.herrmann@...il.com>,
Hannes Reinecke <hare@...e.com>,
Jiri Kosina <jikos@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Josh Triplett <josh@...htriplett.org>,
Greg KH <greg@...ah.com>, Steven Rostedt <rostedt@...dmis.org>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [RFC v1 02/14] bus1: provide stub cdev /dev/bus1
On Thursday, October 27, 2016 1:54:05 AM CEST Tom Gundersen wrote:
> On Thu, Oct 27, 2016 at 1:19 AM, Andy Lutomirski <luto@...capital.net> wrote:
> > This may have been covered elsewhere, but could this use syscalls instead?
>
> Yes, syscalls would work essentially the same. For now, we are using a
> cdev as it makes it a lot more convenient to develop and test as an
> out-of-tree module, but that could be changed easily before the final
> submission, if that's what we want.
Generally speaking, I think syscalls would be appropriate here, and put
bus1 into a similar category as the other ipc interfaces (shm, msg, sem,
mqueue, ...).
However, syscall API design is nontrivial, and will require a bit of
work to come to a set of syscalls that is fairly compact but also
extensible enough. I think it makes sense to go through the exercise
of working out what the syscall interface would end up looking like,
and then make a decision.
There is currently a set of file operations:
@@ -48,7 +90,11 @@ const struct file_operations bus1_fops = {
.owner = THIS_MODULE,
.open = bus1_fop_open,
.release = bus1_fop_release,
+ .poll = bus1_fop_poll,
.llseek = noop_llseek,
+ .mmap = bus1_fop_mmap,
+ .unlocked_ioctl = bus1_peer_ioctl,
+ .compat_ioctl = bus1_peer_ioctl,
.show_fdinfo = bus1_fop_show_fdinfo,
};
and then another set of ioctls:
+enum {
+ BUS1_CMD_PEER_DISCONNECT = _IOWR(BUS1_IOCTL_MAGIC, 0x00,
+ __u64),
+ BUS1_CMD_PEER_QUERY = _IOWR(BUS1_IOCTL_MAGIC, 0x01,
+ struct bus1_cmd_peer_reset),
+ BUS1_CMD_PEER_RESET = _IOWR(BUS1_IOCTL_MAGIC, 0x02,
+ struct bus1_cmd_peer_reset),
+ BUS1_CMD_HANDLE_RELEASE = _IOWR(BUS1_IOCTL_MAGIC, 0x10,
+ __u64),
+ BUS1_CMD_HANDLE_TRANSFER = _IOWR(BUS1_IOCTL_MAGIC, 0x11,
+ struct bus1_cmd_handle_transfer),
+ BUS1_CMD_NODES_DESTROY = _IOWR(BUS1_IOCTL_MAGIC, 0x20,
+ struct bus1_cmd_nodes_destroy),
+ BUS1_CMD_SLICE_RELEASE = _IOWR(BUS1_IOCTL_MAGIC, 0x30,
+ __u64),
+ BUS1_CMD_SEND = _IOWR(BUS1_IOCTL_MAGIC, 0x40,
+ struct bus1_cmd_send),
+ BUS1_CMD_RECV = _IOWR(BUS1_IOCTL_MAGIC, 0x50,
+ struct bus1_cmd_recv),
+};
I think there is no alternative to having some sort of file descriptor
with the basic operations you have above, but there is a question of
how to get that file descriptor if the ioctls get changed to a syscall,
the basic options being:
- Keep using a chardev. This works, but feels a little odd to me,
and I can't think of any other interfaces combining syscalls with
a chardev.
- Have one syscall that returns an open file descriptor, replacing
the fops->open() function. One advantage is that you can pass
additional arguments in that you can't have with open.
An example for this would be mqueue_open().
- Have a mountable file system, and use open() on that to create
connections. Advantages are that it's fairly easy to have one
instance per fs-namespace, and you can have user-defined naming
of objects in the file system.
For the other operations, the obvious translation would be to
turn each ioctl command into one syscall, but that may not always
be the best representation. One limitation is that you cannot
generally have more than six 'long' arguments on a lot of
architectures, and passing 'u64' arguments to syscalls is awkward.
For some of the commands, the transformation would be straightforward
if we assume that the 'u64' arguments can actually be 'long',
I guess like this:
+struct bus1_cmd_handle_transfer {
+ __u64 flags;
+ __u64 src_handle;
+ __u64 dst_fd;
+ __u64 dst_handle;
+} __attribute__((__aligned__(8)));
long bus1_handle_transfer(int fd, unsigned long handle,
int dst_fd, unsigned long *dst_handle, unsigned int flags);
+struct bus1_cmd_nodes_destroy {
+ __u64 flags;
+ __u64 ptr_nodes;
+ __u64 n_nodes;
+} __attribute__((__aligned__(8)));
long bus1_nodes_destroy(int fd, u64 *ptr_nodes,
long n_nodes, unsigned int flags);
However, the peer_reset would exceed the 6-argument limit when you count
the initial file descriptor even if you assume that 'flags' can be
made 32-bit:
+struct bus1_cmd_peer_reset {
+ __u64 flags;
+ __u64 peer_flags;
+ __u32 max_slices;
+ __u32 max_handles;
+ __u32 max_inflight_bytes;
+ __u32 max_inflight_fds;
+} __attribute__((__aligned__(8)));
maybe something slightly ugly like
long bus1_peer_reset(int fd, const struct bus1_peer_limits *param,
unsigned int flags);
a library might provide a wrapper that passes all the limits
as separate arguments.
The receive function would be fairly straightforward again, as
we just pass a pointer to the returned message, and all inputs
can be arguments, but the send command with this structure
+struct bus1_cmd_send {
+ __u64 flags;
+ __u64 ptr_destinations;
+ __u64 ptr_errors;
+ __u64 n_destinations;
+ __u64 ptr_vecs;
+ __u64 n_vecs;
+ __u64 ptr_handles;
+ __u64 n_handles;
+ __u64 ptr_fds;
+ __u64 n_fds;
+} __attribute__((__aligned__(8)));
is really tricky, as it's such a central interface but it's
also really complex, with its five indirect pointers to
variable-length arrays, making a total of 11 arguments
(including the first fd). Turning this into a syscall would
probably make a more efficient interface, so maybe some
of the arrays can be turned into a single argument and
require the user to call it multiple times instead of the
kernel looping around it.
The minimal version would be something like
long bus1_send(int fd, long dst, struct iovec *vecs, int n_vecs,
long handle, int dst_fd);
so you already get to six arguments with one destination, one
handle and one fd but no flags. Replacing vecs/n_vecs with pointer
and length doesn't help either, so I guess whatever we do here
we have to use some indirect structure.
Arnd
Powered by blists - more mailing lists