Message-ID: <aa00af3b-2bb1-4c09-8222-edeec0520ae1@rbox.co>
Date: Tue, 25 Mar 2025 14:22:45 +0100
From: Michal Luczaj <mhal@...x.co>
To: Stefano Garzarella <sgarzare@...hat.com>
Cc: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
netdev@...r.kernel.org
Subject: Re: [PATCH net 2/2] vsock/test: Add test for SO_LINGER null ptr deref
On 3/20/25 12:31, Stefano Garzarella wrote:
> On Fri, Mar 14, 2025 at 04:25:16PM +0100, Michal Luczaj wrote:
>> On 3/10/25 16:24, Stefano Garzarella wrote:
>>> On Fri, Mar 07, 2025 at 10:49:52AM +0100, Michal Luczaj wrote:
>>>> ...
>>>> I've tried modifying the loop to make close()/shutdown() linger until
>>>> unsent_bytes() == 0. No idea if this is acceptable:
>>>
>>> Yes, that's a good idea, I had something similar in mind, but reusing
>>> unsent_bytes() sounds great to me.
>>>
>>> The only problem I see is that in the driver in the guest, the packets
>>> are put in the virtqueue and the variable is decremented only when the
>>> host sends us an interrupt to say that it has copied the packets and
>>> then the guest can free the buffer. Is this okay to consider this as
>>> sending?
>>>
>>> I think so, though it's honestly not clear to me whether "sending"
>>> should instead mean the moment the driver copies the bytes into the
>>> virtqueue, which doesn't mean they were really sent. We should compare
>>> it to what the network devices or AF_UNIX do.
>>
>> I had a look at AF_UNIX. SO_LINGER is not supported, which makes sense:
>> when you send a packet, it lands directly in the receiver's queue. As for
>> SIOCOUTQ handling: `return sk_wmem_alloc_get(sk)`. So I guess it's more
>> of an "unread bytes" counter?
>
> Yes, I see, actually for AF_UNIX it is simple.
> It's hard for us to tell when the user on the other peer actually read
> the data. We could use the credit mechanism, but that sometimes isn't
> sent unless explicitly requested, so I'd say unsent_bytes() is fine.
One more option: keep the current semantics (even though they are not what
`man 7 socket` says) and, for completeness, add the lingering to shutdown()
as well?
>>>> ...
>>>> This works, but I find it difficult to test without artificially slowing
>>>> the kernel down. It's a race against workers as they quite eagerly do
>>>> virtio_transport_consume_skb_sent(), which decrements vvs->bytes_unsent.
>>>> I've tried reducing SO_VM_SOCKETS_BUFFER_SIZE as you've suggested, but
>>>> send() would just block until peer had available space.
>>>
>>> Did you test with loopback or virtio-vsock with a VM?
>>
>> Both, but I may be missing something. Do you see a way to stop (or avoid
>> scheduling) the worker, so it doesn't process the queue (and decrement
>> bytes_unsent)?
>
> Without touching the driver (which I don't want to do) I can't think of
> anything, so I'd say it's okay.
Turns out there's a way to purge the loopback queue before worker processes
it (I had no success with g2h). If you win that race, bytes_unsent stays
elevated until kingdom come. Then you can close() the socket and watch as
it lingers.
connect(s)
  lock_sock
  while (sk_state != TCP_ESTABLISHED)
    release_sock
    schedule_timeout
                              // virtio_transport_recv_connecting
                              //   sk_state = TCP_ESTABLISHED
send(s, 'x')
  lock_sock
  virtio_transport_send_pkt_info
    virtio_transport_get_credit
(!)   vvs->bytes_unsent += ret
    vsock_loopback_send_pkt
      virtio_vsock_skb_queue_tail
  release_sock
kill()
  lock_sock
  if signal_pending
    vsock_loopback_cancel_pkt
      virtio_transport_purge_skbs (!)
That said, I may be missing the bigger picture, but is it worth supporting
this "signal disconnects TCP_ESTABLISHED" behaviour in the first place?
Removing it would make the race above (and the whole [1] series) moot.
Plus, it appears to be broken: when I hit this condition and try to
re-connect to the same listener, I get ETIMEDOUT for loopback and
ECONNRESET for g2h virtio; see [2].
[1]: https://lore.kernel.org/netdev/20250317-vsock-trans-signal-race-v4-0-fc8837f3f1d4@rbox.co/
[2]: Inspired by Luigi's code, which I mauled terribly:
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index d0f6d253ac72..aa4a321ddd9c 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -23,6 +23,7 @@
 #include <sys/ioctl.h>
 #include <linux/sockios.h>
 #include <linux/time64.h>
+#include <pthread.h>
 
 #include "vsock_test_zerocopy.h"
 #include "timeout.h"
@@ -1824,6 +1825,104 @@ static void test_stream_linger_server(const struct test_opts *opts)
 	close(fd);
 }
 
+static void handler(int signum)
+{
+	/* nop */
+}
+
+static void *killer(void *arg)
+{
+	pid_t pid = getpid();
+
+	if ((errno = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, NULL))) {
+		perror("pthread_setcanceltype");
+		exit(EXIT_FAILURE);
+	}
+
+	for (;;) {
+		if (kill(pid, SIGUSR1)) {
+			perror("kill");
+			exit(EXIT_FAILURE);
+		}
+	}
+
+	return NULL;
+}
+
+static void client(const struct test_opts *opts)
+{
+	struct sockaddr_vm addr = {
+		.svm_family = AF_VSOCK,
+		.svm_cid = opts->peer_cid,
+		.svm_port = opts->peer_port,
+	};
+	sighandler_t old_handler;
+	bool reconnect = false;
+	pthread_t tid;
+	time_t tout;
+	int c;
+
+	old_handler = signal(SIGUSR1, handler);
+	if (old_handler == SIG_ERR) {
+		perror("signal");
+		exit(EXIT_FAILURE);
+	}
+
+	if ((errno = pthread_create(&tid, NULL, killer, NULL))) {
+		perror("pthread_create");
+		exit(EXIT_FAILURE);
+	}
+
+	tout = current_nsec() + 2 * NSEC_PER_SEC;
+	do {
+		c = socket(AF_VSOCK, SOCK_STREAM, 0);
+		if (c < 0) {
+			perror("socket");
+			exit(EXIT_FAILURE);
+		}
+
+		if (connect(c, (struct sockaddr *)&addr, sizeof(addr)) &&
+		    errno == EINTR) {
+			reconnect = true;
+			break;
+		}
+
+		close(c);
+	} while (current_nsec() < tout);
+
+	if ((errno = pthread_cancel(tid))) {
+		perror("pthread_cancel");
+		exit(EXIT_FAILURE);
+	}
+
+	if ((errno = pthread_join(tid, NULL))) {
+		perror("pthread_join");
+		exit(EXIT_FAILURE);
+	}
+
+	if (signal(SIGUSR1, old_handler) == SIG_ERR) {
+		perror("signal");
+		exit(EXIT_FAILURE);
+	}
+
+	if (reconnect) {
+		if (connect(c, (struct sockaddr *)&addr, sizeof(addr))) {
+			perror("re-connect() after EINTR");
+			exit(EXIT_FAILURE);
+		}
+
+		close(c);
+	}
+
+	control_writeln("DONE");
+}
+
+static void server(const struct test_opts *opts)
+{
+	int s = vsock_stream_listen(VMADDR_CID_ANY, opts->peer_port);
+
+	control_expectln("DONE");
+	close(s);
+}
+
 static struct test_case test_cases[] = {
 	{
 		.name = "SOCK_STREAM connection reset",
@@ -1984,6 +2083,11 @@ static struct test_case test_cases[] = {
 		.run_client = test_stream_linger_client,
 		.run_server = test_stream_linger_server,
 	},
+	{
+		.name = "SOCK_STREAM connect -> EINTR -> connect",
+		.run_client = client,
+		.run_server = server,
+	},
 	{},
 };