Message-ID: <aa00af3b-2bb1-4c09-8222-edeec0520ae1@rbox.co>
Date: Tue, 25 Mar 2025 14:22:45 +0100
From: Michal Luczaj <mhal@...x.co>
To: Stefano Garzarella <sgarzare@...hat.com>
Cc: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
netdev@...r.kernel.org
Subject: Re: [PATCH net 2/2] vsock/test: Add test for SO_LINGER null ptr deref
On 3/20/25 12:31, Stefano Garzarella wrote:
> On Fri, Mar 14, 2025 at 04:25:16PM +0100, Michal Luczaj wrote:
>> On 3/10/25 16:24, Stefano Garzarella wrote:
>>> On Fri, Mar 07, 2025 at 10:49:52AM +0100, Michal Luczaj wrote:
>>>> ...
>>>> I've tried modifying the loop to make close()/shutdown() linger until
>>>> unsent_bytes() == 0. No idea if this is acceptable:
>>>
>>> Yes, that's a good idea, I had something similar in mind, but reusing
>>> unsent_bytes() sounds great to me.
>>>
>>> The only problem I see is that in the driver in the guest, the packets
>>> are put in the virtqueue and the variable is decremented only when the
>>> host sends us an interrupt to say that it has copied the packets and
>>> then the guest can free the buffer. Is this okay to consider this as
>>> sending?
>>>
>>> I think so, though it's honestly not clear to me whether "sending"
>>> should instead mean the moment the driver copies the bytes into the
>>> virtqueue, which doesn't mean they were really sent. We should compare
>>> it to what the network devices or AF_UNIX do.
>>
>> I had a look at AF_UNIX. SO_LINGER is not supported, which makes sense:
>> when you send a packet, it lands directly in the receiver's queue. As for
>> SIOCOUTQ handling: `return sk_wmem_alloc_get(sk)`. So I guess it's more
>> of an "unread bytes" counter?
>
> Yes, I see, actually for AF_UNIX it is simple.
> It's hard for us to tell when the user on the other peer actually read
> the data. We could use the credit mechanism, but that sometimes isn't
> sent unless explicitly requested, so I'd say unsent_bytes() is fine.
One more option: keep the current semantics (even though they are not what
`man 7 socket` says) and, for completeness, add the lingering to shutdown()
as well?
>>>> ...
>>>> This works, but I find it difficult to test without artificially slowing
>>>> the kernel down. It's a race against workers as they quite eagerly do
>>>> virtio_transport_consume_skb_sent(), which decrements vvs->bytes_unsent.
>>>> I've tried reducing SO_VM_SOCKETS_BUFFER_SIZE as you've suggested, but
>>>> send() would just block until peer had available space.
>>>
>>> Did you test with loopback or virtio-vsock with a VM?
>>
>> Both, but I may be missing something. Do you see a way to stop (or avoid
>> scheduling) the worker, so it doesn't process the queue (and decrement
>> bytes_unsent)?
>
> Without touching the driver (which I don't want to do) I can't think of
> anything, so I'd say it's okay.
Turns out there's a way to purge the loopback queue before worker processes
it (I had no success with g2h). If you win that race, bytes_unsent stays
elevated until kingdom come. Then you can close() the socket and watch as
it lingers.
connect(s)
  lock_sock
  while (sk_state != TCP_ESTABLISHED)
    release_sock
    schedule_timeout
                              // virtio_transport_recv_connecting
                              //   sk_state = TCP_ESTABLISHED
send(s, 'x')
  lock_sock
  virtio_transport_send_pkt_info
    virtio_transport_get_credit
(!)   vvs->bytes_unsent += ret
    vsock_loopback_send_pkt
      virtio_vsock_skb_queue_tail
  release_sock
kill()
  lock_sock
  if signal_pending
    vsock_loopback_cancel_pkt
      virtio_transport_purge_skbs (!)
That said, I may be missing the bigger picture, but is it worth supporting
this "signal disconnects TCP_ESTABLISHED" behaviour in the first place?
Removing it would make the race above (and the whole [1] series) moot.
Plus, it appears to be broken: when I hit this condition and try to
re-connect to the same listener, I get ETIMEDOUT for loopback and
ECONNRESET for g2h virtio; see [2].
[1]: https://lore.kernel.org/netdev/20250317-vsock-trans-signal-race-v4-0-fc8837f3f1d4@rbox.co/
[2]: Inspired by Luigi's code, which I mauled terribly:
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index d0f6d253ac72..aa4a321ddd9c 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -23,6 +23,7 @@
 #include <sys/ioctl.h>
 #include <linux/sockios.h>
 #include <linux/time64.h>
+#include <pthread.h>
 
 #include "vsock_test_zerocopy.h"
 #include "timeout.h"
@@ -1824,6 +1825,104 @@ static void test_stream_linger_server(const struct test_opts *opts)
 	close(fd);
 }
 
+static void handler(int signum)
+{
+	/* nop */
+}
+
+static void *killer(void *arg)
+{
+	pid_t pid = getpid();
+
+	if ((errno = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL))) {
+		perror("pthread_setcanceltype");
+		exit(EXIT_FAILURE);
+	}
+
+	for (;;) {
+		if (kill(pid, SIGUSR1)) {
+			perror("kill");
+			exit(EXIT_FAILURE);
+		}
+	}
+
+	return NULL;
+}
+
+static void client(const struct test_opts *opts)
+{
+	struct sockaddr_vm addr = {
+		.svm_family = AF_VSOCK,
+		.svm_cid = opts->peer_cid,
+		.svm_port = opts->peer_port,
+	};
+	sighandler_t old_handler;
+	bool reconnect = false;
+	pthread_t tid;
+	time_t tout;
+	int c;
+
+	old_handler = signal(SIGUSR1, handler);
+	if (old_handler == SIG_ERR) {
+		perror("signal");
+		exit(EXIT_FAILURE);
+	}
+
+	if ((errno = pthread_create(&tid, NULL, killer, NULL))) {
+		perror("pthread_create");
+		exit(EXIT_FAILURE);
+	}
+
+	tout = current_nsec() + 2 * NSEC_PER_SEC;
+	do {
+		c = socket(AF_VSOCK, SOCK_STREAM, 0);
+		if (c < 0) {
+			perror("socket");
+			exit(EXIT_FAILURE);
+		}
+
+		if (connect(c, (struct sockaddr *)&addr, sizeof(addr)) &&
+		    errno == EINTR) {
+			reconnect = true;
+			break;
+		}
+
+		close(c);
+	} while (current_nsec() < tout);
+
+	if ((errno = pthread_cancel(tid))) {
+		perror("pthread_cancel");
+		exit(EXIT_FAILURE);
+	}
+
+	if ((errno = pthread_join(tid, NULL))) {
+		perror("pthread_join");
+		exit(EXIT_FAILURE);
+	}
+
+	if (signal(SIGUSR1, old_handler) == SIG_ERR) {
+		perror("signal");
+		exit(EXIT_FAILURE);
+	}
+
+	if (reconnect) {
+		if (connect(c, (struct sockaddr *)&addr, sizeof(addr))) {
+			perror("re-connect() after EINTR");
+			exit(EXIT_FAILURE);
+		}
+
+		close(c);
+	}
+
+	control_writeln("DONE");
+}
+
+static void server(const struct test_opts *opts)
+{
+	int s = vsock_stream_listen(VMADDR_CID_ANY, opts->peer_port);
+
+	control_expectln("DONE");
+	close(s);
+}
+
 static struct test_case test_cases[] = {
 	{
 		.name = "SOCK_STREAM connection reset",
@@ -1984,6 +2083,11 @@ static struct test_case test_cases[] = {
 		.run_client = test_stream_linger_client,
 		.run_server = test_stream_linger_server,
 	},
+	{
+		.name = "SOCK_STREAM connect -> EINTR -> connect",
+		.run_client = client,
+		.run_server = server,
+	},
 	{},
 };