[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20260130205227.6fb1d9ad@pumpkin>
Date: Fri, 30 Jan 2026 20:52:27 +0000
From: David Laight <david.laight.linux@...il.com>
To: Breno Leitao <leitao@...ian.org>
Cc: "David S. Miller" <davem@...emloft.net>, Eric Dumazet
<edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni
<pabeni@...hat.com>, Simon Horman <horms@...nel.org>, Kuniyuki Iwashima
<kuniyu@...gle.com>, Willem de Bruijn <willemb@...gle.com>,
metze@...ba.org, axboe@...nel.dk, Stanislav Fomichev <sdf@...ichev.me>,
io-uring@...r.kernel.org, bpf@...r.kernel.org, netdev@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-kernel@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCH net-next RFC 0/3] net: move .getsockopt away from __user
buffers
On Fri, 30 Jan 2026 10:46:16 -0800
Breno Leitao <leitao@...ian.org> wrote:
> Currently, .getsockopt callback cannot be called with kernel buffers
> because it requires userspace addresses:
>
> int (*getsockopt)(struct socket *sock, int level,
> int optname, char __user *optval, int __user *optlen);
>
> This prevents kernel callers (io_uring, BPF, etc) from using getsockopt
> on levels other than SOL_SOCKET, since they pass kernel pointers rather
> than __user pointers.
I had thoughts about this as well.
I think using iov_iter is over the top and may have measurable performance
impact for some paths.
I think the first thing to do is sort out 'optlen'.
There is absolutely no reason for the user pointer being passed into
all the per-protocol functions.
(and the code that changes that use sockptr_t are just stupid...)
The system call wrapper can do the user copies, it can also suppress
the write if the value is unchanged (which matters with clac/slac).
The obvious change would be to pass the length itself and make the
return value -ERRNO or the size.
The annoyance is the few places that want to return an error and
change optlen.
That might be best addresses by something like:
#define GETSOCKOPT_RVAL(errval, size) (1 << 31 | (errval) << 20 | (size))
which would get picked in the rval < 0 path.
It would also let 'return 0' mean 'don't change the size' requiring
a special return for the one (or two?) places that want to set the
size to zero and return success.
The length passed should also be 'unsigned int' - with a check for
negative values in the system call wrapper.
(There are many broken drivers that treat negative lengths as 4.)
There is not much point making the 'optval' parameter more than
a structure of a user and kernel address - one of which will be NULL.
(This is safer than sockptr_t's discriminant union.)
You can't police the length because it is sometimes only the length
of a header (and in some recent code as well).
I have looked at some of this change - it is enormous.
David
>
> Following Linus' suggestion [0], this series introduces a wrapper
> around iov_iter (sockopt_t) and a temporary getsockopt_iter callback:
>
> typedef struct sockopt {
> struct iov_iter iter;
> int optlen;
> } sockopt_t;
>
> Note: optlen was not suggested by Linus' but I believe it is needed, given
> random values could be passed by protocols back to userspace.
>
> And the callback becomes:
>
> int (*getsockopt_iter)(struct socket *sock, int level,
> int optname, sockopt_t *opt);
>
> The sockopt_t structure encapsulates:
> - An iov_iter for reading/writing option data (works with both user
> and kernel buffers)
> - An optlen field for buffer size (input) and returned data size
> (output)
>
> The plan is to enable getsockopt to leverage kernel buffers initially,
> but then move .setsockopt from sockptr_t into this as well.
>
> This series:
>
> 1. Adds the sockopt_t type and getsockopt_iter callback to proto_ops
> 2. Adds do_sock_getsockopt_iter() helper that prefers getsockopt_iter
> 3. Converts one protocol (netlink) to use getsockopt_iter as a proof of
> concept
>
> This is what I have in mind for this work stream, to make it more
> digestible:
>
> * Keep the temporary getsockopt_iter callback allows protocols to
> migrate gradually.
> * Once all protocols have been converted, getsockopt can be removed and
> getsockopt_iter renamed back to getsockopt with the new API.
> * Once the protocols are converted, the SOL_SOCKET limitation in
> io_uring_cmd_getsockopt() will be removed.
> * Covert setsockopt() to also use a similar strategy, moving it away
> from sockptr_t.
> * Remove sockptr_t in the front end (do_sock_getsockopt(),
> io_uring_cmd_getsockopt()) and start with sockopt_t (instead of
> sockptr_t) in __sys_getsockopt() and io_uring_cmd_getsockopt()
>
> Link: https://lore.kernel.org/all/CAHk-=whmzrO-BMU=uSVXbuoLi-3tJsO=0kHj1BCPBE3F2kVhTA@mail.gmail.com/ [0]
> ---
> Breno Leitao (3):
> net: add getsockopt_iter callback to proto_ops
> net: prefer getsockopt_iter in do_sock_getsockopt
> netlink: convert to getsockopt_iter
>
> include/linux/net.h | 19 +++++++++++++++++++
> net/netlink/af_netlink.c | 22 ++++++++++++----------
> net/socket.c | 42 +++++++++++++++++++++++++++++++++++++++---
> 3 files changed, 70 insertions(+), 13 deletions(-)
> ---
> base-commit: 4d310797262f0ddf129e76c2aad2b950adaf1fda
> change-id: 20260130-getsockopt-9f36625eedcb
>
> Best regards,
> --
> Breno Leitao <leitao@...ian.org>
>
>
Powered by blists - more mailing lists