linux-kernel - Re: [PATCH net-next v3] vsock/test: Add test for null ptr deref when transport changes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <pjwrj4pnl3jheypjbkcopjv7uilcdiog3pxb3m57zyeq47sc22@7sl6otxuims2>
Date: Mon, 16 Jun 2025 16:23:40 +0200
From: Luigi Leonardi <leonardi@...hat.com>
To: Stefano Garzarella <sgarzare@...hat.com>
Cc: Michal Luczaj <mhal@...x.co>, virtualization@...ts.linux.dev, 
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org, Hyunwoo Kim <v4bel@...ori.io>
Subject: Re: [PATCH net-next v3] vsock/test: Add test for null ptr deref when
 transport changes

Hi Stefano,

On Wed, Jun 11, 2025 at 04:53:11PM +0200, Stefano Garzarella wrote:
>On Wed, Jun 11, 2025 at 04:07:25PM +0200, Luigi Leonardi wrote:
>>Add a new test to ensure that when the transport changes a null pointer
>>dereference does not occur. The bug was reported upstream [1] and fixed
>>with commit 2cb7c756f605 ("vsock/virtio: discard packets if the
>>transport changes").
>>
>>KASAN: null-ptr-deref in range [0x0000000000000060-0x0000000000000067]
>>CPU: 2 UID: 0 PID: 463 Comm: kworker/2:3 Not tainted
>>Workqueue: vsock-loopback vsock_loopback_work
>>RIP: 0010:vsock_stream_has_data+0x44/0x70
>>Call Trace:
>>virtio_transport_do_close+0x68/0x1a0
>>virtio_transport_recv_pkt+0x1045/0x2ae4
>>vsock_loopback_work+0x27d/0x3f0
>>process_one_work+0x846/0x1420
>>worker_thread+0x5b3/0xf80
>>kthread+0x35a/0x700
>>ret_from_fork+0x2d/0x70
>>ret_from_fork_asm+0x1a/0x30
>>
>>Note that this test may not fail in a kernel without the fix, but it may
>>hang on the client side if it triggers a kernel oops.
>>
>>This works by creating a socket, trying to connect to a server, and then
>>executing a second connect operation on the same socket but to a
>>different CID (0). This triggers a transport change. If the connect
>>operation is interrupted by a signal, this could cause a null-ptr-deref.
>>
>>Since this bug is non-deterministic, we need to try several times. It
>>is reasonable to assume that the bug will show up within the timeout
>>period.
>>
>>If there is a G2H transport loaded in the system, the bug is not
>>triggered and this test will always pass.
>
>Should we re-use what Michal is doing in https://lore.kernel.org/virtualization/20250528-vsock-test-inc-cov-v2-0-8f655b40d57c@rbox.co/
>to print a warning?

Yes, good idea! IMHO we can skip the test if a G2H transport is loaded.  
I'll wait for Michal's patch to be merged before sending v4.
>
>>
>>[1]https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
>>
>>Suggested-by: Hyunwoo Kim <v4bel@...ori.io>
>>
>>#include "vsock_test_zerocopy.h"
>>#include "timeout.h"
>>@@ -1811,6 +1813,168 @@ static void test_stream_connect_retry_server(const struct test_opts *opts)
>>	close(fd);
>>+		/* Set CID to 0 cause a transport change. */
>>+		sa.svm_cid = 0;
>>+		/* This connect must fail. No-one listening on CID 0
>>+		 * This connect can also be interrupted, ignore this error.
>>+		 */
>>+		ret = connect(s, (struct sockaddr *)&sa, sizeof(sa));
>>+		if (ret != -1 && errno != EINTR) {
>
>Should this condition be `ret != -1 || errno != EINTR` ?
>
Right, good catch.
>
>>+			fprintf(stderr,
>>+				"connect: expected a failure because of unused CID: %d\n", errno);
>>+			exit(EXIT_FAILURE);
>>+		}
>>+
>>+		close(s);
>>+
>>+		control_writeulong(CONTROL_CONTINUE);
>>+
>>+	} while (current_nsec() < tout);
>>+
>>+	control_writeulong(CONTROL_DONE);
>>+
>>+	ret = pthread_cancel(thread_id);
>>+	if (ret) {
>>+		fprintf(stderr, "pthread_cancel: %d\n", ret);
>>+		exit(EXIT_FAILURE);
>>+	}
>>+
>>+	/* Wait for the thread to terminate */
>>+	ret = pthread_join(thread_id, NULL);
>>+	if (ret) {
>>+		fprintf(stderr, "pthread_join: %d\n", ret);
>>+		exit(EXIT_FAILURE);
>>+	}
>>+
>>+	/* Restore the old handler */
>>+	if (signal(SIGUSR1, old_handler) == SIG_ERR) {
>>+		perror("signal");
>>+		exit(EXIT_FAILURE);
>>+	}
>>+}
>>+
>>+static void test_stream_transport_change_server(const struct test_opts *opts)
>>+{
>>+	int ret, s;
>>+
>>+	s = vsock_stream_listen(VMADDR_CID_ANY, opts->peer_port);
>>+
>>+	/* Set the socket to be nonblocking because connects that have been interrupted
>>+	 * (EINTR) can fill the receiver's accept queue anyway, leading to connect failure.
>>+	 * As of today (6.15) in such situation there is no way to understand, from the
>>+	 * client side, if the connection has been queued in the server or not.
>>+	 */
>>+	ret = fcntl(s, F_SETFL, fcntl(s, F_GETFL, 0) | O_NONBLOCK);
>>+	if (ret < 0) {
>
>nit: If you need to resend, I'd remove `ret` and check fcntl directly:
>	if (fcntl(...) < 0) {
Ok!
>
>>+		perror("fcntl");
>>+		exit(EXIT_FAILURE);
>>+	}
>>+	control_writeln("LISTENING");
>>+
>>+	while (control_readulong() == CONTROL_CONTINUE) {
>>+		struct sockaddr_vm sa_client;
>>+		socklen_t socklen_client = sizeof(sa_client);
>>+
>>+		/* Must accept the connection, otherwise the `listen`
>>+		 * queue will fill up and new connections will fail.
>>+		 * There can be more than one queued connection,
>>+		 * clear them all.
>>+		 */
>>+		while (true) {
>>+			int client = accept(s, (struct sockaddr *)&sa_client, &socklen_client);
>>+
>>+			if (client < 0 && errno != EAGAIN) {
>>+				perror("accept");
>>+				exit(EXIT_FAILURE);
>>+			} else if (client > 0) {
>
>0 in theory is a valid fd, so here we should check `client >= 0`.
Right!
>
>>+				close(client);
>>+			}
>>+
>>+			if (errno == EAGAIN)
>>+				break;
>
>I think you can refactor in this way:
>			if (client < 0) {
>				if (errno == EAGAIN)
>					break;
>
>				perror("accept");
>				exit(EXIT_FAILURE);
>			}
>
>			close(client);
>
Will do.
>Thanks,
>Stefano
>

Thanks for the review.
Luigi