Message-ID: <lc2v5porgyzx6neimlyrpeg3d5l7trnorbs7xubqgcrlp7bbi7@yxs25wx67tm7>
Date: Wed, 3 May 2023 14:09:38 +0200
From: Stefano Garzarella <sgarzare@...hat.com>
To: Bobby Eshleman <bobbyeshleman@...il.com>
Cc: Vishnu Dasa <vdasa@...are.com>, Wei Liu <wei.liu@...nel.org>,
	Bobby Eshleman <bobby.eshleman@...edance.com>, kvm@...r.kernel.org,
	"Michael S. Tsirkin" <mst@...hat.com>,
	VMware PV-Drivers Reviewers <pv-drivers@...are.com>,
	Dexuan Cui <decui@...rosoft.com>,
	Haiyang Zhang <haiyangz@...rosoft.com>, linux-kernel@...r.kernel.org,
	virtualization@...ts.linux-foundation.org,
	Bryan Tan <bryantan@...are.com>, Eric Dumazet <edumazet@...gle.com>,
	linux-hyperv@...r.kernel.org, Stefan Hajnoczi <stefanha@...hat.com>,
	netdev@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>,
	Paolo Abeni <pabeni@...hat.com>,
	"David S. Miller" <davem@...emloft.net>
Subject: Re: [PATCH RFC net-next v2 3/4] vsock: Add lockless sendmsg() support

On Sat, Apr 15, 2023 at 05:29:12PM +0000, Bobby Eshleman wrote:
>On Fri, Apr 28, 2023 at 12:29:50PM +0200, Stefano Garzarella wrote:
>> On Sat, Apr 15, 2023 at 10:30:55AM +0000, Bobby Eshleman wrote:
>> > On Wed, Apr 19, 2023 at 11:30:53AM +0200, Stefano Garzarella wrote:
>> > > On Fri, Apr 14, 2023 at 12:25:59AM +0000, Bobby Eshleman wrote:
>> > > > Because the dgram sendmsg() path for AF_VSOCK acquires the socket lock
>> > > > it does not scale when many senders share a socket.
>> > > >
>> > > > Prior to this patch the socket lock is used to protect the local_addr,
>> > > > remote_addr, transport, and buffer size variables. What follows are the
>> > > > new protection schemes for the various protected fields that ensure a
>> > > > race-free multi-sender sendmsg() path for vsock dgrams.
>> > > >
>> > > > - local_addr
>> > > >    local_addr changes as a result of binding a socket. The write path
>> > > >    for local_addr is bind() and various vsock_auto_bind() call sites.
>> > > >    After a socket has been bound via vsock_auto_bind() or bind(), subsequent
>> > > >    calls to bind()/vsock_auto_bind() do not write to local_addr again. bind()
>> > > >    rejects the user request and vsock_auto_bind() early exits.
>> > > >    Therefore, the local addr can not change while a parallel thread is
>> > > >    in sendmsg() and lock-free reads of local addr in sendmsg() are safe.
>> > > >    Change: only acquire lock for auto-binding as-needed in sendmsg().
>> > > >
>> > > > - vsk->transport
>> > > >    Updated upon socket creation and it doesn't change again until the
>> > >
>> > > This is true only for dgram, right?
>> > >
>> >
>> > Yes.
>> >
>> > > How do we decide which transport to assign for dgram?
>> > >
>> >
>> > The transport is assigned in proto->create() [vsock_create()]. It is
>> > assigned there *only* for dgrams, whereas for streams/seqpackets it is
>> > assigned in connect(). vsock_create() sets transport to
>> > 'transport_dgram' if sock->type == SOCK_DGRAM.
>> >
>> > vsock_sk_destruct() then eventually sets vsk->transport to NULL.
>> >
>> > Neither destruct nor create can occur in parallel with sendmsg().
>> > create() hasn't yet returned the sockfd for the user to call upon it
>> > sendmsg(), and AFAICT destruct is only called after the final socket
>> > reference is released, which only happens after the socket no longer
>> > exists in the fd lookup table and so sendmsg() will fail before it ever
>> > has the chance to race.
>>
>> This is okay if a socket can be assigned to a single transport, but with
>> dgrams I'm not sure we can do that.
>>
>> Since it is not connected, a socket can send or receive packets from
>> different transports, so maybe we can't assign it to a specific one,
>> but do a lookup for every packets to understand which transport to use.
>>
>
>Yes this is true, this lookup needs to be implemented... currently
>sendmsg() doesn't do this at all. It grabs the remote_addr when passed
>in from sendto(), but then just uses the same old transport from vsk.
>You are right that sendto() should be a per-packet lookup, not a
>vsk->transport read. Perhaps I should add that as another patch in this
>series, and make it precede this one?

Yep, I think so, we need to implement it before adding a new transport
that can support dgram.

>
>For the send() / sendto(NULL) case where vsk->transport is being read, I
>do believe this is still race-free, but...
>
>If we later support dynamic transports for datagram, such that
>connect(VMADDR_LOCAL_CID) sets vsk->transport to transport_loopback,
>connect(VMADDR_CID_HOST) sets vsk->transport to something like
>transport_datagram_g2h, etc..., then vsk->transport will need to be
>bundled into the RCU-protected pointer too, since it may change when
>remote_addr changes.

Yep, I think so. Although in vsock_dgram_connect we call lock_sock(), so
maybe that could be enough to protect us.

In general I think we should use vsk->transport if vsock_dgram_connect()
is called, or we need to do per-packet lookup.

Another thing I would change, is the dgram_dequeue() callback.
I would remove it, and move in the core the code in
vmci_transport_dgram_dequeue() since it seems pretty generic.

This should work well if every transport uses sk_receive_skb() to
enqueue sk_buffs in the socket buffer.

Thanks,
Stefano
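
For readers following the thread, the selection rule discussed above (use
the transport chosen at connect() time when no destination is given,
otherwise do a per-packet lookup) might be sketched roughly as below. This
is a user-space illustration only, not the patch and not kernel code: the
types, the functions lookup_transport(), dgram_connect() and
pick_transport(), and the SKETCH_CID_* macros are all hypothetical names
invented here, and the CID-to-transport mapping is a simplified assumption.

```c
#include <stddef.h>

/* Hypothetical CID constants for this sketch (assumed values). */
#define SKETCH_CID_LOCAL 1u
#define SKETCH_CID_HOST  2u

/* Stand-in for a transport; the real struct is far richer. */
struct transport { const char *name; };

static const struct transport transport_loopback = { "loopback" };
static const struct transport transport_g2h      = { "g2h" };
static const struct transport transport_h2g      = { "h2g" };

/* Stand-in for the per-socket state read on the send path. */
struct dgram_sock {
	/* Set by dgram_connect(); NULL while the socket is unconnected. */
	const struct transport *connected_transport;
};

/* Per-packet lookup: map a destination CID to a transport.
 * Simplified assumption: local CID -> loopback, CIDs up to the
 * host CID -> guest-to-host, everything else -> host-to-guest. */
static const struct transport *lookup_transport(unsigned int cid)
{
	if (cid == SKETCH_CID_LOCAL)
		return &transport_loopback;
	if (cid <= SKETCH_CID_HOST)
		return &transport_g2h;
	return &transport_h2g;
}

/* Models connect(): remember the transport for later send() calls.
 * In the kernel this would run under lock_sock() or be published
 * via RCU; no synchronization is modelled here. */
static void dgram_connect(struct dgram_sock *sk, unsigned int cid)
{
	sk->connected_transport = lookup_transport(cid);
}

/* Models the send path. dst_cid != NULL corresponds to sendto(addr):
 * do a per-packet lookup. dst_cid == NULL corresponds to send() or
 * sendto(NULL): use the connected transport, which is NULL (an error
 * for the caller to handle) if connect() was never called. */
static const struct transport *
pick_transport(const struct dgram_sock *sk, const unsigned int *dst_cid)
{
	if (dst_cid)
		return lookup_transport(*dst_cid);
	return sk->connected_transport;
}
```

With this shape, an unconnected socket calling pick_transport(&sk, &cid)
gets a transport chosen per packet, while a socket that went through
dgram_connect() can pass NULL and reuse the stored pointer. Whether the
stored pointer is protected by the socket lock or by RCU is the open
question in the thread and is deliberately left out of the sketch.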