Message-ID: <kiz4tjwsvauyupixpccqug5wt7tq7g3mld5yy5drpg5zxkmiap@3z625aedysx7>
Date: Fri, 11 Apr 2025 15:21:40 +0200
From: Stefano Garzarella <sgarzare@...hat.com>
To: Michal Luczaj <mhal@...x.co>
Cc: "David S. Miller" <davem@...emloft.net>, 
	Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, netdev@...r.kernel.org
Subject: Re: [PATCH net 2/2] vsock/test: Add test for SO_LINGER null ptr deref

On Fri, Apr 04, 2025 at 12:06:36AM +0200, Michal Luczaj wrote:
>On 4/1/25 12:32, Stefano Garzarella wrote:
>> On Tue, Mar 25, 2025 at 02:22:45PM +0100, Michal Luczaj wrote:
>>> On 3/20/25 12:31, Stefano Garzarella wrote:
>>>> On Fri, Mar 14, 2025 at 04:25:16PM +0100, Michal Luczaj wrote:
>>>>> On 3/10/25 16:24, Stefano Garzarella wrote:
>>>>>> On Fri, Mar 07, 2025 at 10:49:52AM +0100, Michal Luczaj wrote:
>>>>>>> ...
>>>>>>> I've tried modifying the loop to make close()/shutdown() linger until
>>>>>>> unsent_bytes() == 0. No idea if this is acceptable:
>>>>>>
>>>>>> Yes, that's a good idea, I had something similar in mind, but reusing
>>>>>> unsent_bytes() sounds great to me.
>>>>>>
>>>>>> The only problem I see is that in the driver in the guest, the packets
>>>>>> are put in the virtqueue and the variable is decremented only when the
>>>>>> host sends us an interrupt to say that it has copied the packets and
>>>>>> then the guest can free the buffer. Is this okay to consider this as
>>>>>> sending?
>>>>>>
>>>>>> I think so, though it's honestly not clear to me if instead by sending
>>>>>> we should consider when the driver copies the bytes into the virtqueue,
>>>>>> but that doesn't mean they were really sent. We should compare it to
>>>>>> what the network devices or AF_UNIX do.
>>>>>
>>>>> I had a look at AF_UNIX. SO_LINGER is not supported. Which makes sense;
>>>>> when you send a packet, it directly lands in receiver's queue. As for
>>>>> SIOCOUTQ handling: `return sk_wmem_alloc_get(sk)`. So I guess it's more of
>>>>> an "unread bytes"?
>>>>
>>>> Yes, I see, actually for AF_UNIX it is simple.
>>>> It's hard for us to tell when the user on the other peer actually read
>>>> the data, we could use the credit mechanism, but that sometimes isn't
>>>> sent unless explicitly requested, so I'd say unsent_bytes() is fine.
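
FWIW, from userspace this is the counter SIOCOUTQ reports, so a test
could poll it on a connected socket s (quick sketch, assuming the ioctl
is wired up to unsent_bytes()):

	#include <sys/ioctl.h>
	#include <linux/sockios.h>

	int unsent;
	/* bytes queued but not yet acknowledged by the device,
	 * not bytes unread by the peer */
	if (ioctl(s, SIOCOUTQ, &unsent) < 0)
		perror("ioctl SIOCOUTQ");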
>>>
>>> One more option: keep the semantics (in a state of not-what-`man 7 socket`-
>>> says) and, for completeness, add the lingering to shutdown()?
>>
>> Sorry, I'm getting lost!
>> That's because we had a different behavior between close() and
>> shutdown() right?
>>
>> If it's the case, I would say let's fix at least that for now.
>
>Yeah, okay, let's keep things simple. I'll post the patch once net-next opens.
>
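
Just to make the asymmetry concrete, a userspace sketch (s is a
connected socket with queued data):

	struct linger lin = { .l_onoff = 1, .l_linger = 5 };
	if (setsockopt(s, SOL_SOCKET, SO_LINGER, &lin, sizeof(lin)))
		perror("setsockopt SO_LINGER");

	shutdown(s, SHUT_WR);	/* returns immediately today */
	close(s);		/* can linger up to l_linger seconds */
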
>>>>>>> ...
>>>>>>> This works, but I find it difficult to test without artificially slowing
>>>>>>> the kernel down. It's a race against workers as they quite eagerly do
>>>>>>> virtio_transport_consume_skb_sent(), which decrements vvs->bytes_unsent.
>>>>>>> I've tried reducing SO_VM_SOCKETS_BUFFER_SIZE as you've suggested, but
>>>>>>> send() would just block until peer had available space.
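
FWIW, the knob mentioned above takes a 64-bit value at the AF_VSOCK
level, e.g. (sketch):

	unsigned long long size = 4096;
	if (setsockopt(s, AF_VSOCK, SO_VM_SOCKETS_BUFFER_SIZE,
		       &size, sizeof(size)))
		perror("setsockopt SO_VM_SOCKETS_BUFFER_SIZE");
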
>>>>>>
>>>>>> Did you test with loopback or virtio-vsock with a VM?
>>>>>
>>>>> Both, but I may be missing something. Do you see a way to stop (or not
>>>>> schedule) the worker from processing the queue (and decrementing
>>>>> bytes_unsent)?
>>>>
>>>> Without touching the driver (which I don't want to do) I can't think of
>>>> anything, so I'd say it's okay.
>>>
>>> Turns out there's a way to purge the loopback queue before worker processes
>>> it (I had no success with g2h). If you win that race, bytes_unsent stays
>>> elevated until kingdom come. Then you can close() the socket and watch as
>>> it lingers.
>>>
>>> connect(s)
>>>  lock_sock
>>>  while (sk_state != TCP_ESTABLISHED)
>>>    release_sock
>>>    schedule_timeout
>>>
>>> // virtio_transport_recv_connecting
>>> //   sk_state = TCP_ESTABLISHED
>>>
>>>                                       send(s, 'x')
>>>                                         lock_sock
>>>                                         virtio_transport_send_pkt_info
>>>                                           virtio_transport_get_credit
>>>                                    (!)      vvs->bytes_unsent += ret
>>>                                           vsock_loopback_send_pkt
>>>                                             virtio_vsock_skb_queue_tail
>>>                                         release_sock
>>>                                       kill()
>>>    lock_sock
>>>    if signal_pending
>>>      vsock_loopback_cancel_pkt
>>>        virtio_transport_purge_skbs (!)
>>>
>>> That said, I may be missing a bigger picture, but is it worth supporting
>>> this "signal disconnects TCP_ESTABLISHED" behaviour in the first place?
>>
>> Can you elaborate a bit?
>
>There isn't much to it. I just wondered if connect() -- that has already
>established a connection -- could ignore the signal (or pretend it came too
>late), to avoid carrying out this kind of disconnect.

Okay, I see now!

Yeah, I think after `schedule_timeout()`, if `sk->sk_state ==
TCP_ESTABLISHED`, we should just exit the while() and return a
successful connection IMHO, as I did in the fix for the closing-socket
path.
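
Something along these lines (untested sketch of the signal branch in
vsock_connect()'s wait loop, just to show the idea):

	if (signal_pending(current)) {
		/* Handshake already completed: report success
		 * instead of tearing the connection down. */
		if (sk->sk_state == TCP_ESTABLISHED)
			break;
		err = sock_intr_errno(timeout);
		...
	}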

Maybe we should also check what other families, such as AF_UNIX and
AF_INET, do here.


>
>>> Removing it would make the race above (and the whole [1] series) moot.
>>> Plus, it appears to be broken: when I hit this condition and I try to
>>> re-connect to the same listener, I get ETIMEDOUT for loopback and
>>> ECONNRESET for g2h virtio; see [2].
>>
>> Could this be related to the fix I sent some days ago?
>> https://lore.kernel.org/netdev/20250328141528.420719-1-sgarzare@redhat.com/
>
>I've tried that. I've also taken a hint from your other mail and
>attempted flushing the listener queue, but to no avail. Crude code
>below. Is there something wrong with it?

I can't see anything wrong, but I'm starting to get confused :-(
We're talking about too many things in the same thread. Which issues do
you want to highlight?

Thanks,
Stefano

>
>#include <stdio.h>
>#include <errno.h>
>#include <stdlib.h>
>#include <unistd.h>
>#include <signal.h>
>#include <pthread.h>
>#include <sys/time.h>
>#include <sys/socket.h>
>#include <linux/vm_sockets.h>
>
>static void die(const char *msg)
>{
>	perror(msg);
>	exit(-1);
>}
>
>static void barrier(pthread_barrier_t *barr)
>{
>	errno = pthread_barrier_wait(barr);
>	if (errno && errno != PTHREAD_BARRIER_SERIAL_THREAD)
>		die("pthread_barrier_wait");
>}
>
>static void flush_accept(int s)
>{
>	int p = accept(s, NULL, NULL);
>	if (p < 0) {
>		if (errno != EAGAIN)
>			perror("accept");
>		return;
>	}
>
>	printf("accept: drained\n");
>	close(p);
>}
>
>static void handler(int signum)
>{
>	/* nop */
>}
>
>static void set_accept_timeout(int s)
>{
>	struct timeval tv = { .tv_sec = 1 };
>	if (setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)))
>		die("setsockopt SO_RCVTIMEO");
>}
>
>static void set_connect_timeout(int s)
>{
>	struct timeval tv = { .tv_sec = 1 };
>	if (setsockopt(s, AF_VSOCK, SO_VM_SOCKETS_CONNECT_TIMEOUT, &tv,
>		       sizeof(tv)))
>		die("setsockopt SO_VM_SOCKETS_CONNECT_TIMEOUT");
>}
>
>/* Repeatedly fire SIGUSR1 at the main thread; the two barriers pair
> * each signal with one connect() attempt in main(). */
>static void *killer(void *arg)
>{
>	pthread_barrier_t *barr = (pthread_barrier_t *)arg;
>	pid_t pid = getpid();
>
>	if ((errno = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS,
>					   NULL)))
>		die("pthread_setcanceltype");
>
>	for (;;) {
>		barrier(barr);
>		if (kill(pid, SIGUSR1))
>			die("kill");
>		barrier(barr);
>	}
>
>	return NULL;
>}
>
>int main(void)
>{
>	struct sockaddr_vm addr = {
>		.svm_family = AF_VSOCK,
>		.svm_cid = VMADDR_CID_LOCAL,
>		.svm_port = 1234
>	};
>	socklen_t alen = sizeof(addr);
>	pthread_barrier_t barr;
>	pthread_t tid;
>	int s, c;
>
>	if ((errno = pthread_barrier_init(&barr, NULL, 2)))
>		die("pthread_barrier_init");
>
>	if (signal(SIGUSR1, handler) == SIG_ERR)
>		die("signal");
>
>	s = socket(AF_VSOCK, SOCK_STREAM, 0);
>	if (s < 0)
>		die("socket s");
>	set_accept_timeout(s);
>
>	if (bind(s, (struct sockaddr *)&addr, alen))
>		die("bind");
>
>	if (listen(s, 64))
>		die("listen");
>
>	if ((errno = pthread_create(&tid, NULL, killer, &barr)))
>		die("pthread_create");
>
>	for (;;) {
>		c = socket(AF_VSOCK, SOCK_STREAM, 0);
>		if (c < 0)
>			die("socket c");
>
>		/* Window for SIGUSR1 to land while connect() sleeps in
>		 * its wait loop, possibly after the handshake is done. */
>		barrier(&barr);
>		if (connect(c, (struct sockaddr *)&addr, sizeof(addr)) &&
>		    errno == EINTR) {
>			printf("connect: EINTR\n");
>			break;
>		}
>		barrier(&barr);
>
>		close(c);
>		flush_accept(s);
>	}
>
>	if ((errno = pthread_cancel(tid)))
>		die("pthread_cancel");
>
>	if ((errno = pthread_join(tid, NULL)))
>		die("pthread_join");
>
>	flush_accept(s);
>	set_connect_timeout(c);
>	if (connect(c, (struct sockaddr *)&addr, sizeof(addr)))
>		die("re-connect");
>
>	printf("okay?\n");
>
>	return 0;
>}
>

