Message-ID: <39515c76-310d-41af-a8b4-a814841449e3@samba.org>
Date: Tue, 1 Apr 2025 10:19:05 +0200
From: Stefan Metzmacher <metze@...ba.org>
To: Stanislav Fomichev <stfomichev@...il.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
 Jens Axboe <axboe@...nel.dk>, Pavel Begunkov <asml.silence@...il.com>,
 Breno Leitao <leitao@...ian.org>, Jakub Kicinski <kuba@...nel.org>,
 Christoph Hellwig <hch@....de>, Karsten Keil <isdn@...ux-pingi.de>,
 Ayush Sawal <ayush.sawal@...lsio.com>, Andrew Lunn <andrew+netdev@...n.ch>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
 Kuniyuki Iwashima <kuniyu@...zon.com>, Willem de Bruijn
 <willemb@...gle.com>, David Ahern <dsahern@...nel.org>,
 Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
 Xin Long <lucien.xin@...il.com>, Neal Cardwell <ncardwell@...gle.com>,
 Joerg Reuter <jreuter@...na.de>, Marcel Holtmann <marcel@...tmann.org>,
 Johan Hedberg <johan.hedberg@...il.com>,
 Luiz Augusto von Dentz <luiz.dentz@...il.com>,
 Oliver Hartkopp <socketcan@...tkopp.net>,
 Marc Kleine-Budde <mkl@...gutronix.de>,
 Robin van der Gracht <robin@...tonic.nl>,
 Oleksij Rempel <o.rempel@...gutronix.de>, kernel@...gutronix.de,
 Alexander Aring <alex.aring@...il.com>,
 Stefan Schmidt <stefan@...enfreihafen.org>,
 Miquel Raynal <miquel.raynal@...tlin.com>,
 Alexandra Winter <wintera@...ux.ibm.com>,
 Thorsten Winkler <twinkler@...ux.ibm.com>,
 James Chapman <jchapman@...alix.com>, Jeremy Kerr <jk@...econstruct.com.au>,
 Matt Johnston <matt@...econstruct.com.au>,
 Matthieu Baerts <matttbe@...nel.org>, Mat Martineau <martineau@...nel.org>,
 Geliang Tang <geliang@...nel.org>, Krzysztof Kozlowski <krzk@...nel.org>,
 Remi Denis-Courmont <courmisch@...il.com>,
 Allison Henderson <allison.henderson@...cle.com>,
 David Howells <dhowells@...hat.com>, Marc Dionne <marc.dionne@...istor.com>,
 Wenjia Zhang <wenjia@...ux.ibm.com>, Jan Karcher <jaka@...ux.ibm.com>,
 "D. Wythe" <alibuda@...ux.alibaba.com>, Tony Lu <tonylu@...ux.alibaba.com>,
 Wen Gu <guwen@...ux.alibaba.com>, Jon Maloy <jmaloy@...hat.com>,
 Boris Pismenny <borisp@...dia.com>, John Fastabend
 <john.fastabend@...il.com>, Stefano Garzarella <sgarzare@...hat.com>,
 Martin Schiller <ms@....tdt.de>, Björn Töpel
 <bjorn@...nel.org>, Magnus Karlsson <magnus.karlsson@...el.com>,
 Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
 Jonathan Lemon <jonathan.lemon@...il.com>,
 Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>,
 Jesper Dangaard Brouer <hawk@...nel.org>, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-sctp@...r.kernel.org,
 linux-hams@...r.kernel.org, linux-bluetooth@...r.kernel.org,
 linux-can@...r.kernel.org, dccp@...r.kernel.org, linux-wpan@...r.kernel.org,
 linux-s390@...r.kernel.org, mptcp@...ts.linux.dev,
 linux-rdma@...r.kernel.org, rds-devel@....oracle.com,
 linux-afs@...ts.infradead.org, tipc-discussion@...ts.sourceforge.net,
 virtualization@...ts.linux.dev, linux-x25@...r.kernel.org,
 bpf@...r.kernel.org, isdn4linux@...tserv.isdn4linux.de,
 io-uring@...r.kernel.org
Subject: Re: [RFC PATCH 0/4] net/io_uring: pass a kernel pointer via optlen_t
 to proto[_ops].getsockopt()

On 31.03.25 at 23:04, Stanislav Fomichev wrote:
> On 03/31, Stefan Metzmacher wrote:
>> The motivation for this is to remove the SOL_SOCKET limitation
>> from io_uring_cmd_getsockopt().
>>
>> The reason for this limitation is that io_uring_cmd_getsockopt()
>> passes a kernel pointer as optlen to do_sock_getsockopt()
>> and can't reach the ops->getsockopt() path.
>>
>> The first idea would be to change the optval and optlen arguments
>> to the protocol specific hooks also to sockptr_t, as that
>> is already used for setsockopt() and also by do_sock_getsockopt()
>> sk_getsockopt() and BPF_CGROUP_RUN_PROG_GETSOCKOPT().
>>
>> But as Linus don't like 'sockptr_t' I used a different approach.
>>
>> @Linus, would that optlen_t approach fit better for you?
> 
> [..]
> 
>> Instead of passing the optlen as user or kernel pointer,
>> we only ever pass a kernel pointer and do the
>> translation from/to userspace in do_sock_getsockopt().
> 
> At this point why not just fully embrace iov_iter? You have the size
> now + the user (or kernel) pointer. Might as well do
> s/sockptr_t/iov_iter/ conversion?

I think that would only be possible if we introduce
proto[_ops].getsockopt_iter() and then convert the implementations
step by step. Doing it all in one go has a lot of potential to break
the uapi. I could try to convert things like socket, ip and tcp myself, but
the rest needs to be converted by the maintainer of the specific protocol,
as it needs to be tested. There are crazy things happening in the existing
implementations, e.g. some getsockopt() implementations use optval as both
an input and an output buffer.

I first tried to convert both optval and optlen of getsockopt to sockptr_t,
and that showed that touching the optval part gets complex very quickly,
see https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=141912166473bf8843ec6ace76dc9c6945adafd1
(note it didn't convert everything, I gave up after hitting
sctp_getsockopt_peer_addrs and sctp_getsockopt_local_addrs.
sctp_getsockopt_context, sctp_getsockopt_maxseg, sctp_getsockopt_associnfo and maybe
more are also among the ones doing both copy_from_user and copy_to_user on optval).
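
To illustrate the in/out pattern, here is a small userspace toy. It is
not kernel code: the struct and function names are made up, and memcpy()
only stands in for copy_from_user()/copy_to_user(). It shows why a
write-only view of optval cannot express such a handler.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical option payload: caller fills in id, handler fills in value. */
struct toy_assoc_value {
	int id;     /* input: which association to query */
	int value;  /* output: the looked-up setting */
};

/* Toy handler that uses optval as both an input and an output buffer. */
static int toy_getsockopt_inout(void *optval, int *optlen)
{
	struct toy_assoc_value v;

	assert(optval != NULL && optlen != NULL);
	if (*optlen < (int)sizeof(v))
		return -EINVAL;

	memcpy(&v, optval, sizeof(v));  /* stands in for copy_from_user() */
	v.value = v.id * 10;            /* pretend lookup keyed by the input id */
	memcpy(optval, &v, sizeof(v));  /* stands in for copy_to_user() */
	*optlen = (int)sizeof(v);
	return 0;
}
```

The handler has to read optval before it can write the result, so a
conversion needs either a readable source or a separate input argument.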

I also came across one implementation that returned -ERANGE because *optlen was
too short and put the required length into *optlen, which means the returned
*optlen is larger than the optval buffer given from userspace.
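
A userspace toy of that -ERANGE behaviour, as a sketch only (the name
toy_getsockopt_erange and the size TOY_NEEDED are invented, this is not
the implementation I found):

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define TOY_NEEDED 16  /* hypothetical size of the option payload */

/* Toy handler: on a short buffer it reports the required size via *optlen. */
static int toy_getsockopt_erange(char *optval, int *optlen)
{
	assert(optval != NULL && optlen != NULL);
	if (*optlen < TOY_NEEDED) {
		*optlen = TOY_NEEDED;  /* now larger than the caller's buffer */
		return -ERANGE;
	}
	memset(optval, 0xab, TOY_NEEDED);  /* dummy payload */
	*optlen = TOY_NEEDED;
	return 0;
}
```

Any layer that copies *optlen back to the caller must not assume the
value is bounded by the size of the optval buffer, and must also copy it
back on this error path.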

Because of all these strange things I tried to make a minimal change
in order to get rid of the io_uring limitation: I only converted
optlen and left optval as is, in order to have a patchset with a
low risk of causing regressions.

But as an alternative, we could introduce a prototype like this:

         int (*getsockopt_iter)(struct socket *sock, int level, int optname,
                                struct iov_iter *optval_iter);

It would return a non-negative value, which can be placed into *optlen,
or a negative value as an error, in which case *optlen is left unchanged.
optval_iter will get direction ITER_DEST, so it can only be written to.

Implementations could then opt in to the new interface and
allow do_sock_getsockopt() to also work for the io_uring case,
while all others would still get -EOPNOTSUPP.
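
A rough userspace model of that opt-in dispatch, just to make the
contract concrete. Everything named toy_* is hypothetical: struct
toy_iter is only a stand-in for a write-only iov_iter, and the real
do_sock_getsockopt() would look different.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a write-only iov_iter (direction ITER_DEST). */
struct toy_iter {
	char *buf;
	size_t space;  /* bytes still writable */
};

/* Hypothetical ops table: a protocol may provide either hook, or both. */
struct toy_ops {
	int (*getsockopt)(int optname, char *optval, int *optlen);  /* legacy */
	int (*getsockopt_iter)(int optname, struct toy_iter *it);   /* opt-in */
};

/* Returns the number of bytes written, or -EINVAL if the data does not fit. */
static int toy_iter_write(struct toy_iter *it, const void *src, size_t len)
{
	if (len > it->space)
		return -EINVAL;
	memcpy(it->buf, src, len);
	it->buf += len;
	it->space -= len;
	return (int)len;
}

/* Example opt-in handler: writes one int and returns its length. */
static int toy_proto_getsockopt_iter(int optname, struct toy_iter *it)
{
	int value = optname + 1;  /* made-up option value */

	return toy_iter_write(it, &value, sizeof(value));
}

/* Dispatcher for a caller that only has kernel memory (the io_uring case). */
static int toy_do_getsockopt_kernel(const struct toy_ops *ops, int optname,
				    char *optval, int *optlen)
{
	struct toy_iter it;
	int ret;

	assert(ops != NULL && optlen != NULL);
	if (*optlen < 0)
		return -EINVAL;
	if (!ops->getsockopt_iter)
		return -EOPNOTSUPP;  /* legacy-only protocols stay unsupported */

	it.buf = optval;
	it.space = (size_t)*optlen;
	ret = ops->getsockopt_iter(optname, &it);
	if (ret < 0)
		return ret;          /* *optlen is left unchanged on error */
	*optlen = ret;               /* non-negative return becomes the length */
	return 0;
}
```

A protocol that only fills in the legacy hook keeps working for the
normal syscall path in this model, and the kernel-memory caller gets
-EOPNOTSUPP until the maintainer converts and tests it.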

So what should be the way to go?

metze
