Message-ID: <7566fe52-23b7-46cc-95ef-63cbbd3071a1@rbox.co>
Date: Fri, 4 Apr 2025 00:06:36 +0200
From: Michal Luczaj <mhal@...x.co>
To: Stefano Garzarella <sgarzare@...hat.com>
Cc: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
netdev@...r.kernel.org
Subject: Re: [PATCH net 2/2] vsock/test: Add test for SO_LINGER null ptr deref
On 4/1/25 12:32, Stefano Garzarella wrote:
> On Tue, Mar 25, 2025 at 02:22:45PM +0100, Michal Luczaj wrote:
>> On 3/20/25 12:31, Stefano Garzarella wrote:
>>> On Fri, Mar 14, 2025 at 04:25:16PM +0100, Michal Luczaj wrote:
>>>> On 3/10/25 16:24, Stefano Garzarella wrote:
>>>>> On Fri, Mar 07, 2025 at 10:49:52AM +0100, Michal Luczaj wrote:
>>>>>> ...
>>>>>> I've tried modifying the loop to make close()/shutdown() linger until
>>>>>> unsent_bytes() == 0. No idea if this is acceptable:
>>>>>
>>>>> Yes, that's a good idea, I had something similar in mind, but reusing
>>>>> unsent_bytes() sounds great to me.
>>>>>
>>>>> The only problem I see is that in the driver in the guest, the packets
>>>>> are put in the virtqueue and the variable is decremented only when the
>>>>> host sends us an interrupt to say that it has copied the packets and
>>>>> then the guest can free the buffer. Is it okay to consider this as
>>>>> sending?
>>>>>
>>>>> I think so, though it's honestly not clear to me whether we should
>>>>> instead consider the bytes "sent" once the driver copies them into the
>>>>> virtqueue, even though that doesn't mean they were really transmitted.
>>>>> We should compare it to what the network devices or AF_UNIX do.
>>>>
>>>> I had a look at AF_UNIX. SO_LINGER is not supported. Which makes sense;
>>>> when you send a packet, it lands directly in the receiver's queue. As for
>>>> SIOCOUTQ handling: `return sk_wmem_alloc_get(sk)`. So I guess it's more of
>>>> an "unread bytes" count?
>>>
>>> Yes, I see, actually for AF_UNIX it is simple.
>>> It's hard for us to tell when the user on the other peer actually read
>>> the data; we could use the credit mechanism, but that sometimes isn't
>>> sent unless explicitly requested, so I'd say unsent_bytes() is fine.
>>
>> One more option: keep the current semantics (even though they're not what
>> `man 7 socket` says) and, for completeness, add the lingering to shutdown()?
>
> Sorry, I'm getting lost!
> That's because we had different behavior between close() and
> shutdown(), right?
>
> If it's the case, I would say let's fix at least that for now.
Yeah, okay, let's keep things simple. I'll post the patch once net-next opens.
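
FWIW, what I have in mind is roughly the change below: make the lingering in
virtio_transport_wait_close() wait until the transport reports no unsent
bytes, instead of (only) SOCK_DONE. Just a sketch, untested as pasted here;
the wait condition is the only interesting part:

static void virtio_transport_wait_close(struct sock *sk, long timeout)
{
	if (timeout) {
		struct vsock_sock *vsk = vsock_sk(sk);
		DEFINE_WAIT_FUNC(wait, woken_wake_function);

		add_wait_queue(sk_sleep(sk), &wait);

		do {
			/* Consider the data "sent" once bytes_unsent has
			 * drained to zero, per the discussion above.
			 */
			if (sk_wait_event(sk, &timeout,
					  virtio_transport_unsent_bytes(vsk) == 0,
					  &wait))
				break;
		} while (!signal_pending(current) && timeout);

		remove_wait_queue(sk_sleep(sk), &wait);
	}
}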
>>>>>> ...
>>>>>> This works, but I find it difficult to test without artificially slowing
>>>>>> the kernel down. It's a race against workers as they quite eagerly do
>>>>>> virtio_transport_consume_skb_sent(), which decrements vvs->bytes_unsent.
>>>>>> I've tried reducing SO_VM_SOCKETS_BUFFER_SIZE as you've suggested, but
>>>>>> send() would just block until peer had available space.
>>>>>
>>>>> Did you test with loopback or virtio-vsock with a VM?
>>>>
>>>> Both, but I may be missing something. Do you see a way to stop (or not
>>>> schedule) the worker from processing the queue (and decrementing bytes_unsent)?
>>>
>>> Without touching the driver (which I don't want to do) I can't think of
>>> anything, so I'd say it's okay.
>>
>> Turns out there's a way to purge the loopback queue before the worker
>> processes it (I had no success with g2h). If you win that race,
>> bytes_unsent stays elevated until kingdom come. Then you can close() the
>> socket and watch as it lingers.
>>
>> connect(s)
>>   lock_sock
>>   while (sk_state != TCP_ESTABLISHED)
>>     release_sock
>>     schedule_timeout
>>
>>       // virtio_transport_recv_connecting
>>       //   sk_state = TCP_ESTABLISHED
>>
>> send(s, 'x')
>>   lock_sock
>>   virtio_transport_send_pkt_info
>>     virtio_transport_get_credit
>> (!)   vvs->bytes_unsent += ret
>>     vsock_loopback_send_pkt
>>       virtio_vsock_skb_queue_tail
>>   release_sock
>> kill()
>>   lock_sock
>>   if signal_pending
>>     vsock_loopback_cancel_pkt
>>       virtio_transport_purge_skbs (!)
>>
>> That said, I may be missing the bigger picture, but is it worth supporting
>> this "signal disconnects TCP_ESTABLISHED" behaviour in the first place?
>
> Can you elaborate a bit?
There isn't much to it. I just wondered if connect() -- which has already
established a connection -- could ignore the signal (or pretend it came too
late), to avoid carrying out this kind of disconnect.
>> Removing it would make the race above (and the whole [1] series) moot.
>> Plus, it appears to be broken: when I hit this condition and I try to
>> re-connect to the same listener, I get ETIMEDOUT for loopback and
>> ECONNRESET for g2h virtio; see [2].
>
> Could this be related to the fix I sent some days ago?
> https://lore.kernel.org/netdev/20250328141528.420719-1-sgarzare@redhat.com/
I've tried that. I've also taken a hint from your other mail and attempted
flushing the listener queue, but to no avail. Crude code below. Is there
something wrong with it?
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

static void die(const char *msg)
{
	perror(msg);
	exit(-1);
}

static void barrier(pthread_barrier_t *barr)
{
	errno = pthread_barrier_wait(barr);
	if (errno && errno != PTHREAD_BARRIER_SERIAL_THREAD)
		die("pthread_barrier_wait");
}

static void flush_accept(int s)
{
	int p = accept(s, NULL, NULL);

	if (p < 0) {
		if (errno != EAGAIN)
			perror("accept");
		return;
	}

	printf("accept: drained\n");
	close(p);
}

static void handler(int signum)
{
	/* nop */
}

static void set_accept_timeout(int s)
{
	struct timeval tv = { .tv_sec = 1 };

	if (setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)))
		die("setsockopt SO_RCVTIMEO");
}

static void set_connect_timeout(int s)
{
	struct timeval tv = { .tv_sec = 1 };

	if (setsockopt(s, AF_VSOCK, SO_VM_SOCKETS_CONNECT_TIMEOUT, &tv,
		       sizeof(tv)))
		die("setsockopt SO_VM_SOCKETS_CONNECT_TIMEOUT");
}

static void *killer(void *arg)
{
	pthread_barrier_t *barr = (pthread_barrier_t *)arg;
	pid_t pid = getpid();

	if ((errno = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS,
					   NULL)))
		die("pthread_setcanceltype");

	for (;;) {
		barrier(barr);
		if (kill(pid, SIGUSR1))
			die("kill");
		barrier(barr);
	}

	return NULL;
}

int main(void)
{
	struct sockaddr_vm addr = {
		.svm_family = AF_VSOCK,
		.svm_cid = VMADDR_CID_LOCAL,
		.svm_port = 1234
	};
	socklen_t alen = sizeof(addr);
	pthread_barrier_t barr;
	pthread_t tid;
	int s, c;

	if ((errno = pthread_barrier_init(&barr, NULL, 2)))
		die("pthread_barrier_init");

	if (signal(SIGUSR1, handler) == SIG_ERR)
		die("signal");

	s = socket(AF_VSOCK, SOCK_STREAM, 0);
	if (s < 0)
		die("socket s");

	set_accept_timeout(s);

	if (bind(s, (struct sockaddr *)&addr, alen))
		die("bind");

	if (listen(s, 64))
		die("listen");

	if ((errno = pthread_create(&tid, NULL, killer, &barr)))
		die("pthread_create");

	for (;;) {
		c = socket(AF_VSOCK, SOCK_STREAM, 0);
		if (c < 0)
			die("socket c");

		barrier(&barr);
		if (connect(c, (struct sockaddr *)&addr, sizeof(addr)) &&
		    errno == EINTR) {
			printf("connect: EINTR\n");
			break;
		}
		barrier(&barr);

		close(c);
		flush_accept(s);
	}

	if ((errno = pthread_cancel(tid)))
		die("pthread_cancel");
	if ((errno = pthread_join(tid, NULL)))
		die("pthread_join");

	flush_accept(s);

	set_connect_timeout(c);
	if (connect(c, (struct sockaddr *)&addr, sizeof(addr)))
		die("re-connect");

	printf("okay?\n");

	return 0;
}
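
(Builds with `gcc -pthread`; the VMADDR_CID_LOCAL address needs the
vsock_loopback transport loaded.)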