Message-ID: <zpc6pbabs5m5snrsfubtl3wp4eb64w4qwqosywp7tsmrfnba3j@ybkgg2cnhqec>
Date: Wed, 11 Jun 2025 16:53:11 +0200
From: Stefano Garzarella <sgarzare@...hat.com>
To: Luigi Leonardi <leonardi@...hat.com>
Cc: Michal Luczaj <mhal@...x.co>, virtualization@...ts.linux.dev,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org, Hyunwoo Kim <v4bel@...ori.io>
Subject: Re: [PATCH net-next v3] vsock/test: Add test for null ptr deref when transport changes

On Wed, Jun 11, 2025 at 04:07:25PM +0200, Luigi Leonardi wrote:
>Add a new test to ensure that when the transport changes a null pointer
>dereference does not occur. The bug was reported upstream [1] and fixed
>with commit 2cb7c756f605 ("vsock/virtio: discard packets if the
>transport changes").
>
>KASAN: null-ptr-deref in range [0x0000000000000060-0x0000000000000067]
>CPU: 2 UID: 0 PID: 463 Comm: kworker/2:3 Not tainted
>Workqueue: vsock-loopback vsock_loopback_work
>RIP: 0010:vsock_stream_has_data+0x44/0x70
>Call Trace:
> virtio_transport_do_close+0x68/0x1a0
> virtio_transport_recv_pkt+0x1045/0x2ae4
> vsock_loopback_work+0x27d/0x3f0
> process_one_work+0x846/0x1420
> worker_thread+0x5b3/0xf80
> kthread+0x35a/0x700
> ret_from_fork+0x2d/0x70
> ret_from_fork_asm+0x1a/0x30
>
>Note that this test may not fail in a kernel without the fix, but it may
>hang on the client side if it triggers a kernel oops.
>
>This works by creating a socket, trying to connect to a server, and then
>executing a second connect operation on the same socket but to a
>different CID (0). This triggers a transport change. If the connect
>operation is interrupted by a signal, this could cause a null-ptr-deref.
>
>Since this bug is non-deterministic, we need to try several times. It
>is reasonable to assume that the bug will show up within the timeout
>period.
>
>If there is a G2H transport loaded in the system, the bug is not
>triggered and this test will always pass.

Should we re-use what Michal is doing in
https://lore.kernel.org/virtualization/20250528-vsock-test-inc-cov-v2-0-8f655b40d57c@rbox.co/
to print a warning?

>
>[1]https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
>
>Suggested-by: Hyunwoo Kim <v4bel@...ori.io>
>Suggested-by: Michal Luczaj <mhal@...x.co>
>Signed-off-by: Luigi Leonardi <leonardi@...hat.com>
>---
>This series introduces a new test that checks for a null pointer
>dereference that may happen when there is a transport change[1]. This
>bug was fixed in [2].
>
>Note that this test *cannot* fail, it hangs if it triggers a kernel
>oops. The intended use-case is to run it and then check if there is any
>oops in the dmesg.
>
>This test is based on Hyunwoo Kim's[3] and Michal's python
>reproducers[4].
>
>[1]https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
>[2]https://lore.kernel.org/netdev/20250110083511.30419-1-sgarzare@redhat.com/
>[3]https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/#t
>[4]https://lore.kernel.org/netdev/2b3062e3-bdaa-4c94-a3c0-2930595b9670@rbox.co/
>---
>Sorry, this took waaay longer than expected.
>
>Changes in v3:
>Addressed Stefano's and Michal's comments:
> - Added the splat text to the commit message.
> - Introduced commit hash that fixes the bug.
> - Not using perror anymore on pthread_* functions.
> - Listener is just created once.
>
>- Link to v2:
>https://lore.kernel.org/r/20250314-test_vsock-v2-1-3c0a1d878a6d@redhat.com
>
>Changes in v2:
>- Addressed Stefano's comments:
> - Timeout is now using current_nsec()
> - Check for return values
> - Style issues
>- Added Hyunwoo Kim to Suggested-by
>- Link to v1: https://lore.kernel.org/r/20250306-test_vsock-v1-0-0320b5accf92@redhat.com
>---
> tools/testing/vsock/Makefile | 1 +
> tools/testing/vsock/vsock_test.c | 169 +++++++++++++++++++++++++++++++++++++++
> 2 files changed, 170 insertions(+)
>
>diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
>index 6e0b4e95e230500f99bb9c74350701a037ecd198..88211fd132d23ecdfd56ab0815580a237889e7f2 100644
>--- a/tools/testing/vsock/Makefile
>+++ b/tools/testing/vsock/Makefile
>@@ -5,6 +5,7 @@ vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o msg_ze
> vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
> vsock_perf: vsock_perf.o msg_zerocopy_common.o
>
>+vsock_test: LDLIBS = -lpthread
> vsock_uring_test: LDLIBS = -luring
> vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o msg_zerocopy_common.o
>
>diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
>index f669baaa0dca3bebc678d00eafa80857d1f0fdd6..1aed483e7e622d3623be07fcd7fe4295fcfce230 100644
>--- a/tools/testing/vsock/vsock_test.c
>+++ b/tools/testing/vsock/vsock_test.c
>@@ -22,6 +22,8 @@
> #include <signal.h>
> #include <sys/ioctl.h>
> #include <linux/time64.h>
>+#include <pthread.h>
>+#include <fcntl.h>
>
> #include "vsock_test_zerocopy.h"
> #include "timeout.h"
>@@ -1811,6 +1813,168 @@ static void test_stream_connect_retry_server(const struct test_opts *opts)
> close(fd);
> }
>
>+#define TRANSPORT_CHANGE_TIMEOUT 2 /* seconds */
>+
>+static void *test_stream_transport_change_thread(void *vargp)
>+{
>+ pid_t *pid = (pid_t *)vargp;
>+ int ret;
>+
>+ /* We want this thread to terminate as soon as possible */
>+ ret = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
>+ if (ret) {
>+ fprintf(stderr, "pthread_setcanceltype: %d\n", ret);
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ while (true) {
>+ if (kill(*pid, SIGUSR1) < 0) {
>+ perror("kill");
>+ exit(EXIT_FAILURE);
>+ }
>+ }
>+ return NULL;
>+}
>+
>+static void test_transport_change_signal_handler(int signal)
>+{
>+ /* We need a custom handler for SIGUSR1 as the default one terminates the process. */
>+}
>+
>+static void test_stream_transport_change_client(const struct test_opts *opts)
>+{
>+ __sighandler_t old_handler;
>+ pid_t pid = getpid();
>+ pthread_t thread_id;
>+ time_t tout;
>+ int ret;
>+
>+ old_handler = signal(SIGUSR1, test_transport_change_signal_handler);
>+ if (old_handler == SIG_ERR) {
>+ perror("signal");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ ret = pthread_create(&thread_id, NULL, test_stream_transport_change_thread, &pid);
>+ if (ret) {
>+ fprintf(stderr, "pthread_create: %d\n", ret);
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ control_expectln("LISTENING");
>+
>+ tout = current_nsec() + TRANSPORT_CHANGE_TIMEOUT * NSEC_PER_SEC;
>+ do {
>+ struct sockaddr_vm sa = {
>+ .svm_family = AF_VSOCK,
>+ .svm_cid = opts->peer_cid,
>+ .svm_port = opts->peer_port,
>+ };
>+ int s;
>+
>+ s = socket(AF_VSOCK, SOCK_STREAM, 0);
>+ if (s < 0) {
>+ perror("socket");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ ret = connect(s, (struct sockaddr *)&sa, sizeof(sa));
>+ /* The connect can fail due to signals coming from the thread.
>+ * or because the receiver connection queue is full.
>+ * Ignoring also the latter case because there is no way
>+ * of synchronizing client's connect and server's accept when
>+ * connect(s) are constantly being interrupted by signals.
>+ */
>+ if (ret == -1 && (errno != EINTR && errno != ECONNRESET)) {
>+ perror("connect");
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* Set CID to 0 cause a transport change. */
>+ sa.svm_cid = 0;
>+ /* This connect must fail. No-one listening on CID 0
>+ * This connect can also be interrupted, ignore this error.
>+ */
>+ ret = connect(s, (struct sockaddr *)&sa, sizeof(sa));
>+ if (ret != -1 && errno != EINTR) {

Should this condition be `ret != -1 || errno != EINTR`?

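To make the difference between the two conditions concrete, here is a
minimal standalone sketch that evaluates both predicates for a given
(ret, errno) pair. The helper names check_and() and check_or() are made
up for illustration and are not part of the patch. The thread does not
say which errno a connect() to CID 0 returns, so any non-EINTR value
passed in is only a placeholder.

```c
#include <errno.h>
#include <stdbool.h>

/* Condition as posted in v3: treat the result as an error only if
 * connect() succeeded AND errno is not EINTR.
 */
static bool check_and(int ret, int err)
{
	return ret != -1 && err != EINTR;
}

/* Condition asked about in the review: treat the result as an error if
 * connect() succeeded OR errno is not EINTR.
 */
static bool check_or(int ret, int err)
{
	return ret != -1 || err != EINTR;
}
```

The two forms agree when connect() succeeds with a clean errno and when
it fails with EINTR. They differ in two cases: `&&` lets a successful
connect() through if errno still holds EINTR from an earlier call, and
`||` also flags a failure with any errno other than EINTR. Whether the
latter is wanted depends on which errors the second connect() is
expected to return.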
>+ fprintf(stderr,
>+ "connect: expected a failure because of unused CID: %d\n", errno);
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ close(s);
>+
>+ control_writeulong(CONTROL_CONTINUE);
>+
>+ } while (current_nsec() < tout);
>+
>+ control_writeulong(CONTROL_DONE);
>+
>+ ret = pthread_cancel(thread_id);
>+ if (ret) {
>+ fprintf(stderr, "pthread_cancel: %d\n", ret);
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* Wait for the thread to terminate */
>+ ret = pthread_join(thread_id, NULL);
>+ if (ret) {
>+ fprintf(stderr, "pthread_join: %d\n", ret);
>+ exit(EXIT_FAILURE);
>+ }
>+
>+ /* Restore the old handler */
>+ if (signal(SIGUSR1, old_handler) == SIG_ERR) {
>+ perror("signal");
>+ exit(EXIT_FAILURE);
>+ }
>+}
>+
>+static void test_stream_transport_change_server(const struct test_opts *opts)
>+{
>+ int ret, s;
>+
>+ s = vsock_stream_listen(VMADDR_CID_ANY, opts->peer_port);
>+
>+ /* Set the socket to be nonblocking because connects that have been interrupted
>+ * (EINTR) can fill the receiver's accept queue anyway, leading to connect failure.
>+ * As of today (6.15) in such situation there is no way to understand, from the
>+ * client side, if the connection has been queued in the server or not.
>+ */
>+ ret = fcntl(s, F_SETFL, fcntl(s, F_GETFL, 0) | O_NONBLOCK);
>+ if (ret < 0) {

nit: If you need to resend, I'd remove `ret` and check fcntl directly:

	if (fcntl(...) < 0) {

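For reference, a sketch of what checking fcntl() directly could look
like when pulled into a small helper. The name set_nonblock() is
hypothetical and does not exist in the vsock test utilities. The helper
also checks the F_GETFL result separately, which goes one step beyond
the nit above.

```c
#include <fcntl.h>

/* Hypothetical helper: add O_NONBLOCK to the file status flags of fd.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int set_nonblock(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;

	/* No intermediate `ret`: test the fcntl() result in place. */
	if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
		return -1;

	return 0;
}
```

A caller in the test would then only need
`if (set_nonblock(s) < 0) { perror("fcntl"); exit(EXIT_FAILURE); }`.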
>+ perror("fcntl");
>+ exit(EXIT_FAILURE);
>+ }
>+ control_writeln("LISTENING");
>+
>+ while (control_readulong() == CONTROL_CONTINUE) {
>+ struct sockaddr_vm sa_client;
>+ socklen_t socklen_client = sizeof(sa_client);
>+
>+ /* Must accept the connection, otherwise the `listen`
>+ * queue will fill up and new connections will fail.
>+ * There can be more than one queued connection,
>+ * clear them all.
>+ */
>+ while (true) {
>+ int client = accept(s, (struct sockaddr *)&sa_client, &socklen_client);
>+
>+ if (client < 0 && errno != EAGAIN) {
>+ perror("accept");
>+ exit(EXIT_FAILURE);
>+ } else if (client > 0) {

In theory 0 is a valid fd, so the check here should be `client >= 0`.

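A small sketch of why descriptor 0 can actually be returned: calls that
allocate a descriptor pick the lowest-numbered free one, so once stdin
is closed the next descriptor handed out is 0. The function name below
is made up for the demonstration, and it uses open() on /dev/null
instead of accept() so that it runs without any socket setup.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: free descriptor 0, then allocate a new one.
 * open() returns the lowest-numbered unused descriptor, so the
 * result is 0 and a `fd > 0` check would skip closing it.
 */
static int lowest_free_fd_demo(void)
{
	close(STDIN_FILENO);

	return open("/dev/null", O_RDONLY);
}
```

With `client > 0`, a socket returned as fd 0 would never be closed and
would leak for the rest of the test run.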
>+ close(client);
>+ }
>+
>+ if (errno == EAGAIN)
>+ break;

I think you can refactor in this way:

	if (client < 0) {
		if (errno == EAGAIN)
			break;
		perror("accept");
		exit(EXIT_FAILURE);
	}
	close(client);

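As a rough illustration of that loop shape in isolation, here is a
standalone sketch that drains a non-blocking listener until accept()
reports EAGAIN. It is not the test code: it uses AF_UNIX with an
abstract address (Linux only) instead of AF_VSOCK so that it runs
without a vsock transport, it returns -1 instead of exiting, and the
names drain_accept_queue(), drain_demo() and DEMO_MAX_CLIENTS are made
up for the example.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define DEMO_MAX_CLIENTS 8

/* Accept and close every pending connection on a non-blocking listener.
 * Returns the number drained, or -1 on an accept() error other than EAGAIN.
 */
static int drain_accept_queue(int listener)
{
	int drained = 0;

	while (1) {
		int client = accept(listener, NULL, NULL);

		if (client < 0) {
			if (errno == EAGAIN)
				break;
			return -1;
		}
		close(client);
		drained++;
	}

	return drained;
}

/* Hypothetical demo: queue n connections, then drain them all. */
static int drain_demo(int n)
{
	struct sockaddr_un sa = { .sun_family = AF_UNIX };
	int clients[DEMO_MAX_CLIENTS];
	int listener, drained = -1, opened = 0, i;
	socklen_t len;

	if (n < 0 || n > DEMO_MAX_CLIENTS)
		return -1;

	/* sun_path[0] stays '\0', which selects the abstract namespace. */
	snprintf(sa.sun_path + 1, sizeof(sa.sun_path) - 1,
		 "drain-demo-%d", (int)getpid());
	len = offsetof(struct sockaddr_un, sun_path) + 1 +
	      strlen(sa.sun_path + 1);

	listener = socket(AF_UNIX, SOCK_STREAM, 0);
	if (listener < 0)
		return -1;

	if (bind(listener, (struct sockaddr *)&sa, len) < 0 ||
	    listen(listener, DEMO_MAX_CLIENTS) < 0 ||
	    fcntl(listener, F_SETFL,
		  fcntl(listener, F_GETFL, 0) | O_NONBLOCK) < 0)
		goto out;

	for (i = 0; i < n; i++) {
		clients[i] = socket(AF_UNIX, SOCK_STREAM, 0);
		if (clients[i] < 0)
			goto out;
		opened++;
		if (connect(clients[i], (struct sockaddr *)&sa, len) < 0)
			goto out;
	}

	drained = drain_accept_queue(listener);
out:
	for (i = 0; i < opened; i++)
		close(clients[i]);
	close(listener);
	return drained;
}
```

Handling the error path first leaves a single unconditional close() at
the end, which also covers the fd 0 case without a separate
`client >= 0` branch.
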
Thanks,
Stefano
>+ }
>+ }
>+
>+ close(s);
>+}
>+
> static void test_stream_linger_client(const struct test_opts *opts)
> {
> int fd;
>@@ -2051,6 +2215,11 @@ static struct test_case test_cases[] = {
> .run_client = test_stream_nolinger_client,
> .run_server = test_stream_nolinger_server,
> },
>+ {
>+ .name = "SOCK_STREAM transport change null-ptr-deref",
>+ .run_client = test_stream_transport_change_client,
>+ .run_server = test_stream_transport_change_server,
>+ },
> {},
> };
>
>
>---
>base-commit: 5abc7438f1e9d62e91ad775cc83c9594c48d2282
>change-id: 20250306-test_vsock-3e77a9c7a245
>
>Best regards,
>--
>Luigi Leonardi <leonardi@...hat.com>
>