Message-ID: <Z+wH1oYOr1dlKeyN@gmail.com>
Date: Tue, 1 Apr 2025 08:35:50 -0700
From: Breno Leitao <leitao@...ian.org>
To: Stefan Metzmacher <metze@...ba.org>
Cc: Stanislav Fomichev <stfomichev@...il.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jens Axboe <axboe@...nel.dk>,
	Pavel Begunkov <asml.silence@...il.com>,
	Jakub Kicinski <kuba@...nel.org>, Christoph Hellwig <hch@....de>,
	Karsten Keil <isdn@...ux-pingi.de>,
	Ayush Sawal <ayush.sawal@...lsio.com>,
	Andrew Lunn <andrew+netdev@...n.ch>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>, Paolo Abeni <pabeni@...hat.com>,
	Simon Horman <horms@...nel.org>,
	Kuniyuki Iwashima <kuniyu@...zon.com>,
	Willem de Bruijn <willemb@...gle.com>,
	David Ahern <dsahern@...nel.org>,
	Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
	Xin Long <lucien.xin@...il.com>,
	Neal Cardwell <ncardwell@...gle.com>,
	Joerg Reuter <jreuter@...na.de>,
	Marcel Holtmann <marcel@...tmann.org>,
	Johan Hedberg <johan.hedberg@...il.com>,
	Luiz Augusto von Dentz <luiz.dentz@...il.com>,
	Oliver Hartkopp <socketcan@...tkopp.net>,
	Marc Kleine-Budde <mkl@...gutronix.de>,
	Robin van der Gracht <robin@...tonic.nl>,
	Oleksij Rempel <o.rempel@...gutronix.de>, kernel@...gutronix.de,
	Alexander Aring <alex.aring@...il.com>,
	Stefan Schmidt <stefan@...enfreihafen.org>,
	Miquel Raynal <miquel.raynal@...tlin.com>,
	Alexandra Winter <wintera@...ux.ibm.com>,
	Thorsten Winkler <twinkler@...ux.ibm.com>,
	James Chapman <jchapman@...alix.com>,
	Jeremy Kerr <jk@...econstruct.com.au>,
	Matt Johnston <matt@...econstruct.com.au>,
	Matthieu Baerts <matttbe@...nel.org>,
	Mat Martineau <martineau@...nel.org>,
	Geliang Tang <geliang@...nel.org>,
	Krzysztof Kozlowski <krzk@...nel.org>,
	Remi Denis-Courmont <courmisch@...il.com>,
	Allison Henderson <allison.henderson@...cle.com>,
	David Howells <dhowells@...hat.com>,
	Marc Dionne <marc.dionne@...istor.com>,
	Wenjia Zhang <wenjia@...ux.ibm.com>,
	Jan Karcher <jaka@...ux.ibm.com>,
	"D. Wythe" <alibuda@...ux.alibaba.com>,
	Tony Lu <tonylu@...ux.alibaba.com>,
	Wen Gu <guwen@...ux.alibaba.com>, Jon Maloy <jmaloy@...hat.com>,
	Boris Pismenny <borisp@...dia.com>,
	John Fastabend <john.fastabend@...il.com>,
	Stefano Garzarella <sgarzare@...hat.com>,
	Martin Schiller <ms@....tdt.de>,
	Björn Töpel <bjorn@...nel.org>,
	Magnus Karlsson <magnus.karlsson@...el.com>,
	Maciej Fijalkowski <maciej.fijalkowski@...el.com>,
	Jonathan Lemon <jonathan.lemon@...il.com>,
	Alexei Starovoitov <ast@...nel.org>,
	Daniel Borkmann <daniel@...earbox.net>,
	Jesper Dangaard Brouer <hawk@...nel.org>, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-sctp@...r.kernel.org,
	linux-hams@...r.kernel.org, linux-bluetooth@...r.kernel.org,
	linux-can@...r.kernel.org, dccp@...r.kernel.org,
	linux-wpan@...r.kernel.org, linux-s390@...r.kernel.org,
	mptcp@...ts.linux.dev, linux-rdma@...r.kernel.org,
	rds-devel@....oracle.com, linux-afs@...ts.infradead.org,
	tipc-discussion@...ts.sourceforge.net,
	virtualization@...ts.linux.dev, linux-x25@...r.kernel.org,
	bpf@...r.kernel.org, isdn4linux@...tserv.isdn4linux.de,
	io-uring@...r.kernel.org
Subject: Re: [RFC PATCH 0/4] net/io_uring: pass a kernel pointer via optlen_t
 to proto[_ops].getsockopt()

On Tue, Apr 01, 2025 at 03:48:58PM +0200, Stefan Metzmacher wrote:
> Am 01.04.25 um 15:37 schrieb Stefan Metzmacher:
> > Am 01.04.25 um 10:19 schrieb Stefan Metzmacher:
> > > Am 31.03.25 um 23:04 schrieb Stanislav Fomichev:
> > > > On 03/31, Stefan Metzmacher wrote:
> > > > > The motivation for this is to remove the SOL_SOCKET limitation
> > > > > from io_uring_cmd_getsockopt().
> > > > > 
> > > > > The reason for this limitation is that io_uring_cmd_getsockopt()
> > > > > passes a kernel pointer as optlen to do_sock_getsockopt()
> > > > > and can't reach the ops->getsockopt() path.
> > > > > 
> > > > > The first idea would be to change the optval and optlen arguments
> > > > > to the protocol specific hooks also to sockptr_t, as that
> > > > > is already used for setsockopt() and also by do_sock_getsockopt()
> > > > > sk_getsockopt() and BPF_CGROUP_RUN_PROG_GETSOCKOPT().
> > > > > 
> > > > > But as Linus doesn't like 'sockptr_t', I used a different approach.
> > > > > 
> > > > > @Linus, would that optlen_t approach fit better for you?
> > > > 
> > > > [..]
> > > > 
> > > > > Instead of passing the optlen as user or kernel pointer,
> > > > > we only ever pass a kernel pointer and do the
> > > > > translation from/to userspace in do_sock_getsockopt().
> > > > 
> > > > At this point why not just fully embrace iov_iter? You have the size
> > > > now + the user (or kernel) pointer. Might as well do
> > > > s/sockptr_t/iov_iter/ conversion?
> > > 
> > > I think that would only be possible if we introduce
> > > proto[_ops].getsockopt_iter() and then convert the implementations
> > > step by step. Doing it all in one go has a lot of potential to break
> > > the uapi. I could try to convert things like socket, ip and tcp myself, but
> > > the rest needs to be converted by the maintainer of the specific protocol,
> > > as it needs to be tested. As there are crazy things happening in the existing
> > > implementations, e.g. some getsockopt() implementations use optval as in and out
> > > buffer.
> > > 
> > > I first tried to convert both optval and optlen of getsockopt to sockptr_t,
> > > and that showed that touching the optval part starts to get complex very soon,
> > > see https://git.samba.org/?p=metze/linux/wip.git;a=commitdiff;h=141912166473bf8843ec6ace76dc9c6945adafd1
> > > (note that it didn't convert everything; I gave up after hitting
> > > sctp_getsockopt_peer_addrs and sctp_getsockopt_local_addrs.
> > > sctp_getsockopt_context, sctp_getsockopt_maxseg, sctp_getsockopt_associnfo and maybe
> > > more are the ones also doing both copy_from_user and copy_to_user on optval)
> > > 
> > > I also came across one implementation that returned -ERANGE because *optlen was
> > > too short and put the required length into *optlen, which means the returned
> > > *optlen is larger than the optval buffer given from userspace.
> > > 
> > > Because of all these strange things I tried to make a minimal change
> > > in order to get rid of the io_uring limitation, and only converted
> > > optlen while leaving optval as is, so that the patchset has a low
> > > risk of causing regressions.
> > > 
> > > But as an alternative, we could introduce a prototype like this:
> > > 
> > >          int (*getsockopt_iter)(struct socket *sock, int level, int optname,
> > >                                 struct iov_iter *optval_iter);
> > > 
> > > It would return a non-negative value, which can be placed into *optlen,
> > > or a negative value as an error, in which case *optlen is left unchanged.
> > > optval_iter will get direction ITER_DEST, so it can only be written to.
> > > 
> > > Implementations could then opt in to the new interface and
> > > allow do_sock_getsockopt() to work for the io_uring case as well,
> > > while all others would still get -EOPNOTSUPP.
> > > 
> > > So what should be the way to go?
> > 
> > Ok, I've added the infrastructure for getsockopt_iter, see below,
> > but the first part I wanted to convert was
> > tcp_ao_copy_mkts_to_user() and that also reads from userspace before
> > writing.
> > 
> > So we could go with the optlen_t approach, or we need
> > logic for ITER_BOTH or pass two iov_iters one with ITER_SRC and one
> > with ITER_DEST...
> > 
> > So who wants to decide?
> 
> I just noticed that it's even possible in some cases
> to pass in a short buffer to optval, but have a longer value in optlen;
> hci_sock_getsockopt() with SOL_BLUETOOTH completely ignores optlen.
> 
> This makes it really hard to believe that trying to use iov_iter for this
> is a good idea :-(

That was my finding as well a while ago, when I was planning to get the
__user pointers converted to iov_iter. There are some weird ways of
using optlen and optval, which make them non-trivial to convert to
iov_iter.
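
To make the difficulty concrete, here is a minimal user-space sketch (not
kernel code) of two of the patterns mentioned in this thread: a handler that
reads optval before writing to it, and a handler that returns -ERANGE while
storing a required length larger than the caller's buffer into *optlen. All
names (toy_getsockopt_inout, toy_getsockopt_erange, toy_demo) are hypothetical,
the option semantics are made up, and plain memcpy() merely stands in for
copy_from_user()/copy_to_user().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical handler: reads a selector from optval, then overwrites
 * optval with the answer. A write-only (ITER_DEST) iterator could not
 * express the initial read. */
int toy_getsockopt_inout(void *optval, int *optlen)
{
	int selector, result;

	if (*optlen < (int)sizeof(int))
		return -EINVAL;
	memcpy(&selector, optval, sizeof(selector)); /* models copy_from_user() */
	result = selector * 2;                       /* made-up semantics */
	memcpy(optval, &result, sizeof(result));     /* models copy_to_user() */
	*optlen = sizeof(result);
	return 0;
}

/* Hypothetical handler: on a short buffer it stores the required size
 * in *optlen and fails, so the returned *optlen exceeds the buffer. */
int toy_getsockopt_erange(void *optval, int *optlen)
{
	static const char value[] = "0123456789abcdef"; /* 17 bytes with NUL */

	if (*optlen < (int)sizeof(value)) {
		*optlen = sizeof(value);
		return -ERANGE;
	}
	memcpy(optval, value, sizeof(value));
	*optlen = sizeof(value);
	return 0;
}

/* Exercises both handlers and checks the behaviour described above. */
int toy_demo(void)
{
	int val = 21, len = sizeof(val);
	char small[4];
	int small_len = sizeof(small);
	int ret;

	ret = toy_getsockopt_inout(&val, &len);
	assert(ret == 0 && val == 42 && len == (int)sizeof(int));
	printf("inout:  ret=%d val=%d len=%d\n", ret, val, len);

	ret = toy_getsockopt_erange(small, &small_len);
	assert(ret == -ERANGE && small_len > (int)sizeof(small));
	printf("erange: ret=%d len=%d\n", ret, small_len);
	return 0;
}
```

Calling toy_demo() from a main() runs both cases. The point of the sketch is
only that a single "destination buffer plus length" abstraction does not
describe either handler: the first needs to read from optval, and the second
reports a length that is not the number of bytes written.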
