Message-ID: <c2ln3v2anshvy64xflck7agirk55wkelnfznvkf6wq4bvezzdi@kod4yt5b6qhl>
Date: Mon, 30 Jun 2025 11:24:48 +0200
From: Luigi Leonardi <leonardi@...hat.com>
To: Stefano Garzarella <sgarzare@...hat.com>
Cc: Michal Luczaj <mhal@...x.co>, virtualization@...ts.linux.dev,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org, Hyunwoo Kim <v4bel@...ori.io>
Subject: Re: [PATCH net-next v4] vsock/test: Add test for null ptr deref when
 transport changes

Hi Stefano,

On Wed, Jun 25, 2025 at 10:26:26AM +0200, Stefano Garzarella wrote:
>On Tue, Jun 24, 2025 at 05:40:15PM +0200, Luigi Leonardi wrote:
>>Add a new test to ensure that when the transport changes a null pointer
>>dereference does not occur. The bug was reported upstream [1] and fixed
>>with commit 2cb7c756f605 ("vsock/virtio: discard packets if the
>>transport changes").
>>
>>KASAN: null-ptr-deref in range [0x0000000000000060-0x0000000000000067]
>>CPU: 2 UID: 0 PID: 463 Comm: kworker/2:3 Not tainted
>>Workqueue: vsock-loopback vsock_loopback_work
>>RIP: 0010:vsock_stream_has_data+0x44/0x70
>>Call Trace:
>>virtio_transport_do_close+0x68/0x1a0
>>virtio_transport_recv_pkt+0x1045/0x2ae4
>>vsock_loopback_work+0x27d/0x3f0
>>process_one_work+0x846/0x1420
>>worker_thread+0x5b3/0xf80
>>kthread+0x35a/0x700
>>ret_from_fork+0x2d/0x70
>>ret_from_fork_asm+0x1a/0x30
>>
>>Note that this test may not fail in a kernel without the fix, but it may
>>hang on the client side if it triggers a kernel oops.
>>
>>This works by creating a socket, trying to connect to a server, and then
>>executing a second connect operation on the same socket but to a
>>different CID (0). This triggers a transport change. If the connect
>>operation is interrupted by a signal, this could cause a null-ptr-deref.
>>
>>Since this bug is non-deterministic, we need to try several times. It
>>is reasonable to assume that the bug will show up within the timeout
>>period.
>>
>>If there is a G2H transport loaded in the system, the bug is not
>>triggered and this test will always pass.
>
>Can you add the reason?
Will do.
>
>>
>>[1]https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
>>
>>Suggested-by: Hyunwoo Kim <v4bel@...ori.io>
>>Suggested-by: Michal Luczaj <mhal@...x.co>
>>Signed-off-by: Luigi Leonardi <leonardi@...hat.com>
>>---
>>This series introduces a new test that checks for a null pointer
>>dereference that may happen when there is a transport change[1]. This
>>bug was fixed in [2].
>>
>>Note that this test *cannot* fail, it hangs if it triggers a kernel
>>oops. The intended use-case is to run it and then check if there is any
>>oops in the dmesg.
>>
>>This test is based on Hyunwoo Kim's[3] and Michal's python
>>reproducers[4].
>>
>>[1]https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
>>[2]https://lore.kernel.org/netdev/20250110083511.30419-1-sgarzare@redhat.com/
>>[3]https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/#t
>>[4]https://lore.kernel.org/netdev/2b3062e3-bdaa-4c94-a3c0-2930595b9670@rbox.co/
>>---
>>Changes in v4:
>>- Addressed Stefano's comments:
>> - Minor style changes
>> - Use `get_transports()` to print a warning when a G2H transport is
>> loaded
>> - Removed check on second connect: Because the first connect is
>> interrupted, the socket is in an unspecified state (see man connect)
>> . This can cause strange and unexpected behaviors (connect returning
>> success on a non-existing CID).
>>
>>- Link to v3:
>>https://lore.kernel.org/r/20250611-test_vsock-v3-1-8414a2d4df62@redhat.com
>>
>>Sorry, this took waaay longer than expected.
>>
>>Changes in v3:
>>Addressed Stefano's and Michal's comments:
>> - Added the splat text to the commit message.
>> - Introduced commit hash that fixes the bug.
>> - Not using perror anymore on pthread_* functions.
>> - Listener is just created once.
>>
>>- Link to v2:
>>https://lore.kernel.org/r/20250314-test_vsock-v2-1-3c0a1d878a6d@redhat.com
>>
>>Changes in v2:
>>- Addressed Stefano's comments:
>> - Timeout is now using current_nsec()
>> - Check for return values
>> - Style issues
>>- Added Hyunwoo Kim to Suggested-by
>>- Link to v1:
>>https://lore.kernel.org/r/20250306-test_vsock-v1-0-0320b5accf92@redhat.com
>>---
>>tools/testing/vsock/Makefile | 1 +
>>tools/testing/vsock/vsock_test.c | 178 +++++++++++++++++++++++++++++++++++++++
>>2 files changed, 179 insertions(+)
>>
>>diff --git a/tools/testing/vsock/Makefile b/tools/testing/vsock/Makefile
>>index 6e0b4e95e230500f99bb9c74350701a037ecd198..88211fd132d23ecdfd56ab0815580a237889e7f2 100644
>>--- a/tools/testing/vsock/Makefile
>>+++ b/tools/testing/vsock/Makefile
>>@@ -5,6 +5,7 @@ vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o msg_ze
>>vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
>>vsock_perf: vsock_perf.o msg_zerocopy_common.o
>>
>>+vsock_test: LDLIBS = -lpthread
>>vsock_uring_test: LDLIBS = -luring
>>vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o msg_zerocopy_common.o
>>
>>diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
>>index eb6f54378667ac7ed324f4823e988ec9846e41a3..095705c7b53b7ad38ab3b8bc3cbe54a9eeb76d5c 100644
>>--- a/tools/testing/vsock/vsock_test.c
>>+++ b/tools/testing/vsock/vsock_test.c
>>@@ -22,6 +22,8 @@
>>#include <signal.h>
>>#include <sys/ioctl.h>
>>#include <linux/time64.h>
>>+#include <pthread.h>
>>+#include <fcntl.h>
>>
>>#include "vsock_test_zerocopy.h"
>>#include "timeout.h"
>>@@ -1867,6 +1869,177 @@ static void test_stream_connect_retry_server(const struct test_opts *opts)
>> close(fd);
>>}
>>
>>+#define TRANSPORT_CHANGE_TIMEOUT 2 /* seconds */
>>+
>>+static void *test_stream_transport_change_thread(void *vargp)
>>+{
>>+ pid_t *pid = (pid_t *)vargp;
>>+ int ret;
>>+
>>+ /* We want this thread to terminate as soon as possible */
>>+ ret = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL);
>>+ if (ret) {
>>+ fprintf(stderr, "pthread_setcanceltype: %d\n", ret);
>>+ exit(EXIT_FAILURE);
>>+ }
>>+
>>+ while (true) {
>>+ if (kill(*pid, SIGUSR1) < 0) {
>>+ perror("kill");
>>+ exit(EXIT_FAILURE);
>>+ }
>>+ }
>>+ return NULL;
>>+}
>>+
>>+static void test_transport_change_signal_handler(int signal)
>>+{
>>+ /* We need a custom handler for SIGUSR1 as the default one terminates the process. */
>>+}
>>+
>>+static void test_stream_transport_change_client(const struct test_opts *opts)
>>+{
>>+ __sighandler_t old_handler;
>>+ pid_t pid = getpid();
>>+ pthread_t thread_id;
>>+ time_t tout;
>>+ int ret, tr;
>>+
>>+ tr = get_transports();
>
>nit: add a blank line here
>
>>+ /* Print a warning if there is a G2H transport loaded.
>>+ * This is on a best effort basis because VMCI can be either G2H and H2G, and there is
>>+ * no easy way to understand it.
>>+ * The bug is present in the loopback transport. However, it does not interfere
>
>nit: s/is/was
>
>>+ * if it is loaded.
>
>I don't understand this, if the bug was present in the loopback transport, our goal is to stress it, so have it loaded is great, why it should interfere?
What I meant is that, when exercising the H2G path to trigger the
issue, having the loopback transport loaded does not prevent the bug,
whereas a loaded G2H transport does.
>
>>+ * The bug we are testing only appears when G2H transports are not loaded.
>
>Please add the reason also here.
>
>>+ */
>>+
>
>nit: remove the blank line here
>
>>+ tr &= ~TRANSPORT_LOOPBACK;
>>+ if (tr != 0 && tr != TRANSPORT_VHOST)
>
>Sorry, this is really hard to understand IMO, let's do a step back.
>Your goal is to check if there is a G2H transport loaded, right?
Correct.
>
>I think we have 2 options:
>1. similar to yours, just masking the other 2 transports
>
> if (tr & ~(TRANSPORT_LOOPBACK | TRANSPORT_VHOST))
>
> IMO this is much clearer to understand, and should have the same
> effect.
>
>2. (my preference) define in util.h some macros that we can reuse:
> #define TRANSPORTS_G2H (TRANSPORT_VIRTIO | TRANSPORT_VMCI | \
>                         TRANSPORT_HYPERV)
> #define TRANSPORTS_H2G (TRANSPORT_VHOST | TRANSPORT_VMCI)
> #define TRANSPORTS_LOCAL (TRANSPORT_LOOPBACK)
>
> and here you can just do:
> if (tr & TRANSPORTS_G2H)
>
I'll add these defines in a separate commit, thanks for the hint.
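For reference, a minimal sketch of how the check could look with those
macros. The bit values below are placeholders chosen only for
illustration (the real TRANSPORT_* flags live in util.h), and
has_g2h_transport() is a hypothetical helper, not part of the patch:

```c
#include <stdbool.h>

/* Placeholder bit values, for illustration only: the real TRANSPORT_*
 * flags are defined in tools/testing/vsock/util.h.
 */
#define TRANSPORT_LOOPBACK	(1 << 0)
#define TRANSPORT_VIRTIO	(1 << 1)
#define TRANSPORT_VHOST		(1 << 2)
#define TRANSPORT_VMCI		(1 << 3)
#define TRANSPORT_HYPERV	(1 << 4)

/* Groupings as suggested in the review. VMCI is listed as both G2H and
 * H2G because it can act as either.
 */
#define TRANSPORTS_G2H		(TRANSPORT_VIRTIO | TRANSPORT_VMCI | \
				 TRANSPORT_HYPERV)
#define TRANSPORTS_H2G		(TRANSPORT_VHOST | TRANSPORT_VMCI)
#define TRANSPORTS_LOCAL	(TRANSPORT_LOOPBACK)

/* Hypothetical helper: true if any possibly-G2H transport bit is set
 * in the given transport mask.
 */
static bool has_g2h_transport(int tr)
{
	return (tr & TRANSPORTS_G2H) != 0;
}
```

In the test this would reduce to `if (tr & TRANSPORTS_G2H)` on the value
returned by get_transports(). Loopback and vhost alone would not trigger
the warning, while VMCI would, since there is no easy way to tell which
role it has (so the check stays best effort).
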
>>+ fprintf(stderr, "G2H Transport detected. This test will not fail.\n");
>>+
>>+ old_handler = signal(SIGUSR1, test_transport_change_signal_handler);
>>+ if (old_handler == SIG_ERR) {
>>+ perror("signal");
>>+ exit(EXIT_FAILURE);
>>+ }
>>+
>>+ ret = pthread_create(&thread_id, NULL, test_stream_transport_change_thread, &pid);
>>+ if (ret) {
>>+ fprintf(stderr, "pthread_create: %d\n", ret);
>>+ exit(EXIT_FAILURE);
>>+ }
>>+
>>+ control_expectln("LISTENING");
>>+
>>+ tout = current_nsec() + TRANSPORT_CHANGE_TIMEOUT * NSEC_PER_SEC;
>>+ do {
>>+ struct sockaddr_vm sa = {
>>+ .svm_family = AF_VSOCK,
>>+ .svm_cid = opts->peer_cid,
>>+ .svm_port = opts->peer_port,
>>+ };
>>+ int s;
>>+
>>+ s = socket(AF_VSOCK, SOCK_STREAM, 0);
>>+ if (s < 0) {
>>+ perror("socket");
>>+ exit(EXIT_FAILURE);
>>+ }
>>+
>>+ ret = connect(s, (struct sockaddr *)&sa, sizeof(sa));
>>+ /* The connect can fail due to signals coming from the thread.
>
>. should be ,
>
>>+ * or because the receiver connection queue is full.
>>+ * Ignoring also the latter case because there is no way
>>+ * of synchronizing client's connect and server's accept when
>>+ * connect(s) are constantly being interrupted by signals.
>>+ */
>>+ if (ret == -1 && (errno != EINTR && errno != ECONNRESET)) {
>>+ perror("connect");
>>+ exit(EXIT_FAILURE);
>>+ }
>>+
>>+ /* Set CID to 0 cause a transport change. */
>>+ sa.svm_cid = 0;
>
>nit: add a blank line
>
>>+ /* Here we ignore the connect return value because we cannot
>>+ * safely assume that it will *always* fail.
>>+ * This is because the previous connect was interrupted
>>+ * during the connection process. The socket state, as stated
>>+ * in `man connect`, is unspecified and can result in strange
>>+ * behaviors.
>
>Let's focus on the behaviour and try to be more concise. Something like
>this:
>
> /* Ignore return value since it can fail or not.
> * If the previous connect is interrupted while the
> * connection request is already sent, the second
> * connect() will wait for the response.
>
>>+ */
>>+ connect(s, (struct sockaddr *)&sa, sizeof(sa));
>>+
>>+ close(s);
>>+
>>+ control_writeulong(CONTROL_CONTINUE);
>>+
>>+ } while (current_nsec() < tout);
>>+
>>+ control_writeulong(CONTROL_DONE);
>>+
>>+ ret = pthread_cancel(thread_id);
>>+ if (ret) {
>>+ fprintf(stderr, "pthread_cancel: %d\n", ret);
>>+ exit(EXIT_FAILURE);
>>+ }
>>+
>>+ /* Wait for the thread to terminate */
>
>useless comment
>
>>+ ret = pthread_join(thread_id, NULL);
>>+ if (ret) {
>>+ fprintf(stderr, "pthread_join: %d\n", ret);
>>+ exit(EXIT_FAILURE);
>>+ }
>>+
>>+ /* Restore the old handler */
>
>ditto
>
>>+ if (signal(SIGUSR1, old_handler) == SIG_ERR) {
>>+ perror("signal");
>>+ exit(EXIT_FAILURE);
>>+ }
>>+}
>>+
>>+static void test_stream_transport_change_server(const struct test_opts *opts)
>>+{
>>+ int s = vsock_stream_listen(VMADDR_CID_ANY, opts->peer_port);
>>+
>>+ /* Set the socket to be nonblocking because connects that have been interrupted
>>+ * (EINTR) can fill the receiver's accept queue anyway, leading to connect failure.
>>+ * As of today (6.15) in such situation there is no way to understand, from the
>>+ * client side, if the connection has been queued in the server or not.
>>+ */
>>+ if (fcntl(s, F_SETFL, fcntl(s, F_GETFL, 0) | O_NONBLOCK) < 0) {
>>+ perror("fcntl");
>>+ exit(EXIT_FAILURE);
>>+ }
>>+ control_writeln("LISTENING");
>>+
>>+ while (control_readulong() == CONTROL_CONTINUE) {
>>+ struct sockaddr_vm sa_client;
>>+ socklen_t socklen_client = sizeof(sa_client);
>>+
>>+ /* Must accept the connection, otherwise the `listen`
>>+ * queue will fill up and new connections will fail.
>>+ * There can be more than one queued connection,
>>+ * clear them all.
>>+ */
>>+ while (true) {
>>+ int client = accept(s, (struct sockaddr *)&sa_client, &socklen_client);
>
>We don't use the client address, so IMO we can just pass NULL.
>
>Thanks,
>Stefano
>
>>+
>>+ if (client < 0) {
>>+ if (errno == EAGAIN)
>>+ break;
>>+
>>+ perror("accept");
>>+ exit(EXIT_FAILURE);
>>+ }
>>+
>>+ close(client);
>>+ }
>>+ }
>>+
>>+ close(s);
>>+}
>>+
>>static void test_stream_linger_client(const struct test_opts *opts)
>>{
>> int fd;
>>@@ -2106,6 +2279,11 @@ static struct test_case test_cases[] = {
>> .run_client = test_stream_nolinger_client,
>> .run_server = test_stream_nolinger_server,
>> },
>>+ {
>>+ .name = "SOCK_STREAM transport change null-ptr-deref",
>>+ .run_client = test_stream_transport_change_client,
>>+ .run_server = test_stream_transport_change_server,
>>+ },
>> {},
>>};
>>
>>
>>---
>>base-commit: 68d019aa14d97f8d57b0f8d203fd3b44db2ba0c7
>>change-id: 20250306-test_vsock-3e77a9c7a245
>>
>>Best regards,
>>--
>>Luigi Leonardi <leonardi@...hat.com>
>>
>
Thanks for the review and all the comments!
Luigi