[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <pjwrj4pnl3jheypjbkcopjv7uilcdiog3pxb3m57zyeq47sc22@7sl6otxuims2>
Date: Mon, 16 Jun 2025 16:23:40 +0200
From: Luigi Leonardi <leonardi@...hat.com>
To: Stefano Garzarella <sgarzare@...hat.com>
Cc: Michal Luczaj <mhal@...x.co>, virtualization@...ts.linux.dev,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org, Hyunwoo Kim <v4bel@...ori.io>
Subject: Re: [PATCH net-next v3] vsock/test: Add test for null ptr deref when
transport changes
Hi Stefano,
On Wed, Jun 11, 2025 at 04:53:11PM +0200, Stefano Garzarella wrote:
>On Wed, Jun 11, 2025 at 04:07:25PM +0200, Luigi Leonardi wrote:
>>Add a new test to ensure that when the transport changes a null pointer
>>dereference does not occur. The bug was reported upstream [1] and fixed
>>with commit 2cb7c756f605 ("vsock/virtio: discard packets if the
>>transport changes").
>>
>>KASAN: null-ptr-deref in range [0x0000000000000060-0x0000000000000067]
>>CPU: 2 UID: 0 PID: 463 Comm: kworker/2:3 Not tainted
>>Workqueue: vsock-loopback vsock_loopback_work
>>RIP: 0010:vsock_stream_has_data+0x44/0x70
>>Call Trace:
>>virtio_transport_do_close+0x68/0x1a0
>>virtio_transport_recv_pkt+0x1045/0x2ae4
>>vsock_loopback_work+0x27d/0x3f0
>>process_one_work+0x846/0x1420
>>worker_thread+0x5b3/0xf80
>>kthread+0x35a/0x700
>>ret_from_fork+0x2d/0x70
>>ret_from_fork_asm+0x1a/0x30
>>
>>Note that this test may not fail in a kernel without the fix, but it may
>>hang on the client side if it triggers a kernel oops.
>>
>>This works by creating a socket, trying to connect to a server, and then
>>executing a second connect operation on the same socket but to a
>>different CID (0). This triggers a transport change. If the connect
>>operation is interrupted by a signal, this could cause a null-ptr-deref.
>>
>>Since this bug is non-deterministic, we need to try several times. It
>>is reasonable to assume that the bug will show up within the timeout
>>period.
>>
>>If there is a G2H transport loaded in the system, the bug is not
>>triggered and this test will always pass.
>
>Should we re-use what Michal is doing in https://lore.kernel.org/virtualization/20250528-vsock-test-inc-cov-v2-0-8f655b40d57c@rbox.co/
>to print a warning?
Yes, good idea! IMHO we can skip the test if a G2H transport is loaded.
I'll wait for Michal's patch to be merged before sending v4.
>
>>
>>[1]https://lore.kernel.org/netdev/Z2LvdTTQR7dBmPb5@v4bel-B760M-AORUS-ELITE-AX/
>>
>>Suggested-by: Hyunwoo Kim <v4bel@...ori.io>
>>
>>#include "vsock_test_zerocopy.h"
>>#include "timeout.h"
>>@@ -1811,6 +1813,168 @@ static void test_stream_connect_retry_server(const struct test_opts *opts)
>> close(fd);
>>+ /* Set CID to 0 cause a transport change. */
>>+ sa.svm_cid = 0;
>>+ /* This connect must fail. No-one listening on CID 0
>>+ * This connect can also be interrupted, ignore this error.
>>+ */
>>+ ret = connect(s, (struct sockaddr *)&sa, sizeof(sa));
>>+ if (ret != -1 && errno != EINTR) {
>
>Should this condition be `ret != -1 || errno != EINTR` ?
>
Right, good catch.
>
>>+ fprintf(stderr,
>>+ "connect: expected a failure because of unused CID: %d\n", errno);
>>+ exit(EXIT_FAILURE);
>>+ }
>>+
>>+ close(s);
>>+
>>+ control_writeulong(CONTROL_CONTINUE);
>>+
>>+ } while (current_nsec() < tout);
>>+
>>+ control_writeulong(CONTROL_DONE);
>>+
>>+ ret = pthread_cancel(thread_id);
>>+ if (ret) {
>>+ fprintf(stderr, "pthread_cancel: %d\n", ret);
>>+ exit(EXIT_FAILURE);
>>+ }
>>+
>>+ /* Wait for the thread to terminate */
>>+ ret = pthread_join(thread_id, NULL);
>>+ if (ret) {
>>+ fprintf(stderr, "pthread_join: %d\n", ret);
>>+ exit(EXIT_FAILURE);
>>+ }
>>+
>>+ /* Restore the old handler */
>>+ if (signal(SIGUSR1, old_handler) == SIG_ERR) {
>>+ perror("signal");
>>+ exit(EXIT_FAILURE);
>>+ }
>>+}
>>+
>>+static void test_stream_transport_change_server(const struct test_opts *opts)
>>+{
>>+ int ret, s;
>>+
>>+ s = vsock_stream_listen(VMADDR_CID_ANY, opts->peer_port);
>>+
>>+ /* Set the socket to be nonblocking because connects that have been interrupted
>>+ * (EINTR) can fill the receiver's accept queue anyway, leading to connect failure.
>>+ * As of today (6.15) in such situation there is no way to understand, from the
>>+ * client side, if the connection has been queued in the server or not.
>>+ */
>>+ ret = fcntl(s, F_SETFL, fcntl(s, F_GETFL, 0) | O_NONBLOCK);
>>+ if (ret < 0) {
>
>nit: If you need to resend, I'd remove `ret` and check fcntl directly:
> if (fcntl(...) < 0) {
Ok!
>
>>+ perror("fcntl");
>>+ exit(EXIT_FAILURE);
>>+ }
>>+ control_writeln("LISTENING");
>>+
>>+ while (control_readulong() == CONTROL_CONTINUE) {
>>+ struct sockaddr_vm sa_client;
>>+ socklen_t socklen_client = sizeof(sa_client);
>>+
>>+ /* Must accept the connection, otherwise the `listen`
>>+ * queue will fill up and new connections will fail.
>>+ * There can be more than one queued connection,
>>+ * clear them all.
>>+ */
>>+ while (true) {
>>+ int client = accept(s, (struct sockaddr *)&sa_client, &socklen_client);
>>+
>>+ if (client < 0 && errno != EAGAIN) {
>>+ perror("accept");
>>+ exit(EXIT_FAILURE);
>>+ } else if (client > 0) {
>
>0 in theory is a valid fd, so here we should check `client >= 0`.
Right!
>
>>+ close(client);
>>+ }
>>+
>>+ if (errno == EAGAIN)
>>+ break;
>
>I think you can refactor in this way:
> if (client < 0) {
> if (errno == EAGAIN)
> break;
>
> perror("accept");
> exit(EXIT_FAILURE);
> }
>
> close(client);
>
Will do.
>Thanks,
>Stefano
>
Thanks for the review.
Luigi
Powered by blists - more mailing lists