Message-ID: <d78bbd0c-5a56-4a5c-be84-567d98aa281e@rbox.co>
Date: Thu, 20 Mar 2025 23:16:36 +0100
From: Michal Luczaj <mhal@...x.co>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: Stefano Garzarella <sgarzare@...hat.com>,
 "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
 Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
 Simon Horman <horms@...nel.org>, "Michael S. Tsirkin" <mst@...hat.com>,
 Bobby Eshleman <bobby.eshleman@...edance.com>,
 Andrii Nakryiko <andrii@...nel.org>, Eduard Zingerman <eddyz87@...il.com>,
 Mykola Lysenko <mykolal@...com>, Alexei Starovoitov <ast@...nel.org>,
 Daniel Borkmann <daniel@...earbox.net>,
 Martin KaFai Lau <martin.lau@...ux.dev>, Song Liu <song@...nel.org>,
 Yonghong Song <yonghong.song@...ux.dev>,
 John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
 Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
 Jiri Olsa <jolsa@...nel.org>, Shuah Khan <shuah@...nel.org>,
 netdev@...r.kernel.org, bpf@...r.kernel.org, virtualization@...ts.linux.dev,
 linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH net v4 3/3] vsock/bpf: Fix bpf recvmsg() racing transport
 reassignment

On 3/20/25 21:54, Cong Wang wrote:
> On Thu, Mar 20, 2025 at 01:05:27PM +0100, Michal Luczaj wrote:
>> On 3/19/25 23:18, Cong Wang wrote:
>>> On Mon, Mar 17, 2025 at 10:52:25AM +0100, Michal Luczaj wrote:
>>>> Signal delivery during connect() may lead to a disconnect of an already
>>>> established socket. That involves removing socket from any sockmap and
>>>> resetting state to SS_UNCONNECTED. While it correctly restores socket's
>>>> proto, a call to vsock_bpf_recvmsg() might have been already under way in
>>>> another thread. If the connect()ing thread reassigns the vsock transport to
>>>> NULL, the recvmsg()ing thread may trigger a WARN_ON_ONCE.
>>>>
>>
>>    *THREAD 1*                      *THREAD 2*
>>
>>>> connect
>>>>   / state = SS_CONNECTED /
>>>>                                 sock_map_update_elem
>>>>                                 vsock_bpf_recvmsg
>>>>                                   psock = sk_psock_get()
>>>>   lock sk
>>>>   if signal_pending
>>>>     unhash
>>>>       sock_map_remove_links
>>>
>>> So vsock's ->recvmsg() should be restored after this, right? Then how is
>>> vsock_bpf_recvmsg() called afterward?
>>
>> I'm not sure I understand the question, so I've added a header above: those
>> are 2 parallel flows of execution. vsock_bpf_recvmsg() wasn't called
>> afterwards. It was called before sock_map_remove_links(). Note that at the
>> time of sock_map_remove_links() (in T1), vsock_bpf_recvmsg() is still
>> executing (in T2).
> 
> I thought the above vsock_bpf_recvmsg() on the right side completed
> before sock_map_remove_links(), sorry for the confusion.

No problem, I see why you might have. Perhaps deeper indentation would make
things clearer.

>>>>     state = SS_UNCONNECTED
>>>>   release sk
>>>>
>>>> connect
>>>>   transport = NULL
>>>>                                   lock sk
>>>>                                   WARN_ON_ONCE(!vsk->transport)
>>>>
>>>
>>> And I am wondering why we need to WARN here since we can handle this error
>>> case correctly?
>>
>> The WARN and transport check are here for defensive measures, and to state
>> a contract.
>>
>> But I think I get your point. If we accept for a fact of life that BPF code
>> should be able to handle transport disappearing - then WARN can be removed
>> (while keeping the check) and this patch can be dropped.
> 
> I am thinking whether we have more elegant way to handle this case,
> WARN looks not pretty.

Since the case should never happen, I like to think of WARN as a deliberate
eyesore :)
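
For readers following the timeline, the interleaving can be replayed as a
small single-threaded toy model. This is only a sketch: ToyVsock and
run_interleaving() are hypothetical names made up for illustration, the
steps are replayed in the order the timeline lists them, and nothing here
is kernel code or a reproducer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyVsock:
    """Toy stand-in for a vsock socket (hypothetical, not struct vsock_sock)."""
    state: str = "SS_CONNECTED"
    transport: Optional[str] = "some-transport"
    in_sockmap: bool = True
    psock_refs: int = 0


def run_interleaving(sk: ToyVsock) -> List[str]:
    """Replay the timeline's steps in order and log what each thread does."""
    log = []

    # T2: vsock_bpf_recvmsg() has started and holds a psock reference,
    # but has not taken the sk lock yet.
    sk.psock_refs += 1
    log.append("T2: psock = sk_psock_get()")

    # T1: connect() sees a pending signal; under the sk lock the socket is
    # removed from the sockmap and reset.
    sk.in_sockmap = False
    sk.state = "SS_UNCONNECTED"
    log.append("T1: unhash, state = SS_UNCONNECTED")

    # T1: a second connect() reassigns the transport to NULL.
    sk.transport = None
    log.append("T1: transport = NULL")

    # T2: only now takes the sk lock and checks the transport.
    if sk.transport is None:
        log.append("T2: WARN_ON_ONCE(!vsk->transport)")
    else:
        log.append("T2: read proceeds")
    sk.psock_refs -= 1
    return log


if __name__ == "__main__":
    for step in run_interleaving(ToyVsock()):
        print(step)
```

The point of the model is that T2 obtained its psock while the socket was
still in the sockmap, so restoring the proto in T1 does not stop the call
that is already in flight.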

>> My aim, instead, was to keep things consistent. By which I mean sticking to
>> the conditions expressed in vsock_bpf_update_proto() as invariants; so that
>> vsock with a psock is guaranteed to have transport assigned.
> 
> Other than the WARN, I am also concerned about locking vsock_bpf_recvmsg()
> because for example UDP is (almost) lockless, so enforcing the sock lock
> for all vsock types looks not flexible and may hurt performance.
>
> Maybe it is time to let vsock_bpf_rebuild_protos() build different hooks
> for different struct proto (as we did for TCP/UDP)?

By UDP, do you mean vsock SOCK_DGRAM? No need to worry. VMCI is the only
transport that features VSOCK_TRANSPORT_F_DGRAM, but it does not
implement the read_skb() callback, making it unsupported by BPF/sockmap.
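
The eligibility rule described in this thread (a transport must be assigned
and must provide read_skb()) can be sketched as a toy predicate. Everything
below is illustrative: ToyTransport, sockmap_eligible() and the two example
transports are hypothetical names, and the model says nothing about the
error codes the kernel actually returns.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


@dataclass(frozen=True)
class ToyTransport:
    """Toy stand-in for a vsock transport (hypothetical, not kernel code)."""
    name: str
    features: FrozenSet[str] = frozenset()
    read_skb: Optional[Callable[[], None]] = None


def sockmap_eligible(transport: Optional[ToyTransport]) -> bool:
    """A socket qualifies only with a transport that implements read_skb()."""
    if transport is None:
        return False
    return transport.read_skb is not None


# Assumption for illustration: a DGRAM-capable transport without read_skb(),
# matching what the thread says about VMCI.
dgram_only = ToyTransport("vmci-like", frozenset({"VSOCK_TRANSPORT_F_DGRAM"}))

# Assumption for illustration: some transport that does provide read_skb().
with_read_skb = ToyTransport("stream-like", read_skb=lambda: None)

if __name__ == "__main__":
    for t in (None, dgram_only, with_read_skb):
        label = t.name if t else "no transport"
        print(label, "->", sockmap_eligible(t))
```

Under this model a SOCK_DGRAM vsock never reaches the BPF recvmsg path at
all, which is why the locking concern does not apply to it.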

