Message-ID: <7566fe52-23b7-46cc-95ef-63cbbd3071a1@rbox.co>
Date: Fri, 4 Apr 2025 00:06:36 +0200
From: Michal Luczaj <mhal@...x.co>
To: Stefano Garzarella <sgarzare@...hat.com>
Cc: "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>,
netdev@...r.kernel.org
Subject: Re: [PATCH net 2/2] vsock/test: Add test for SO_LINGER null ptr deref
On 4/1/25 12:32, Stefano Garzarella wrote:
> On Tue, Mar 25, 2025 at 02:22:45PM +0100, Michal Luczaj wrote:
>> On 3/20/25 12:31, Stefano Garzarella wrote:
>>> On Fri, Mar 14, 2025 at 04:25:16PM +0100, Michal Luczaj wrote:
>>>> On 3/10/25 16:24, Stefano Garzarella wrote:
>>>>> On Fri, Mar 07, 2025 at 10:49:52AM +0100, Michal Luczaj wrote:
>>>>>> ...
>>>>>> I've tried modifying the loop to make close()/shutdown() linger until
>>>>>> unsent_bytes() == 0. No idea if this is acceptable:
>>>>>
>>>>> Yes, that's a good idea, I had something similar in mind, but reusing
>>>>> unsent_bytes() sounds great to me.
>>>>>
>>>>> The only problem I see is that in the driver in the guest, the packets
>>>>> are put in the virtqueue and the variable is decremented only when the
>>>>> host sends us an interrupt to say that it has copied the packets and
>>>>> then the guest can free the buffer. Is it okay to consider this as
>>>>> sending?
>>>>>
>>>>> I think so, though it's honestly not clear to me whether we should
>>>>> instead consider the bytes "sent" once the driver copies them into the
>>>>> virtqueue, even though that doesn't mean they were really transmitted.
>>>>> We should compare it to what the network devices or AF_UNIX do.
>>>>
>>>> I had a look at AF_UNIX. SO_LINGER is not supported. Which makes sense;
>>>> when you send a packet, it lands directly in the receiver's queue. As for
>>>> SIOCOUTQ handling: `return sk_wmem_alloc_get(sk)`. So I guess it's more of
>>>> an "unread bytes" count?
>>>
>>> Yes, I see, actually for AF_UNIX it is simple.
>>> It's hard for us to tell when the user on the other peer actually read
>>> the data; we could use the credit mechanism, but that sometimes isn't
>>> sent unless explicitly requested, so I'd say unsent_bytes() is fine.
>>
>> One more option: keep the current semantics (even though they're not what
>> `man 7 socket` says) and, for completeness, add the lingering to shutdown()?
>
> Sorry, I'm getting lost!
> That's because we had different behavior between close() and
> shutdown(), right?
>
> If it's the case, I would say let's fix at least that for now.
Yeah, okay, let's keep things simple. I'll post the patch once net-next opens.
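
FWIW, what I have in mind is roughly the change below: make the lingering in
virtio_transport_wait_close() wait until the transport reports no unsent
bytes, instead of (only) SOCK_DONE. Just a sketch, untested as pasted here;
the wait condition is the only interesting part:

static void virtio_transport_wait_close(struct sock *sk, long timeout)
{
	if (timeout) {
		struct vsock_sock *vsk = vsock_sk(sk);
		DEFINE_WAIT_FUNC(wait, woken_wake_function);

		add_wait_queue(sk_sleep(sk), &wait);

		do {
			/* Consider the data "sent" once bytes_unsent has
			 * drained to zero, per the discussion above.
			 */
			if (sk_wait_event(sk, &timeout,
					  virtio_transport_unsent_bytes(vsk) == 0,
					  &wait))
				break;
		} while (!signal_pending(current) && timeout);

		remove_wait_queue(sk_sleep(sk), &wait);
	}
}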
>>>>>> ...
>>>>>> This works, but I find it difficult to test without artificially slowing
>>>>>> the kernel down. It's a race against workers as they quite eagerly do
>>>>>> virtio_transport_consume_skb_sent(), which decrements vvs->bytes_unsent.
>>>>>> I've tried reducing SO_VM_SOCKETS_BUFFER_SIZE as you've suggested, but
>>>>>> send() would just block until peer had available space.
>>>>>
>>>>> Did you test with loopback or virtio-vsock with a VM?
>>>>
>>>> Both, but I may be missing something. Do you see a way to stop (or not
>>>> schedule) the worker from processing the queue (and decrementing bytes_unsent)?
>>>
>>> Without touching the driver (which I don't want to do) I can't think of
>>> anything, so I'd say it's okay.
>>
>> Turns out there's a way to purge the loopback queue before the worker
>> processes it (I had no success with g2h). If you win that race,
>> bytes_unsent stays elevated until kingdom come. Then you can close() the
>> socket and watch as it lingers.
>>
>> connect(s)
>>   lock_sock
>>   while (sk_state != TCP_ESTABLISHED)
>>     release_sock
>>     schedule_timeout
>>
>>       // virtio_transport_recv_connecting
>>       //   sk_state = TCP_ESTABLISHED
>>
>> send(s, 'x')
>>   lock_sock
>>   virtio_transport_send_pkt_info
>>     virtio_transport_get_credit
>> (!)   vvs->bytes_unsent += ret
>>     vsock_loopback_send_pkt
>>       virtio_vsock_skb_queue_tail
>>   release_sock
>> kill()
>>   lock_sock
>>   if signal_pending
>>     vsock_loopback_cancel_pkt
>>       virtio_transport_purge_skbs (!)
>>
>> That said, I may be missing the bigger picture, but is it worth supporting
>> this "signal disconnects TCP_ESTABLISHED" behaviour in the first place?
>
> Can you elaborate a bit?
There isn't much to it. I just wondered if connect() -- which has already
established a connection -- could ignore the signal (or pretend it came too
late), to avoid carrying out this kind of disconnect.
>> Removing it would make the race above (and the whole [1] series) moot.
>> Plus, it appears to be broken: when I hit this condition and I try to
>> re-connect to the same listener, I get ETIMEDOUT for loopback and
>> ECONNRESET for g2h virtio; see [2].
>
> Could this be related to the fix I sent some days ago?
> https://lore.kernel.org/netdev/20250328141528.420719-1-sgarzare@redhat.com/
I've tried that. I've also taken a hint from your other mail and attempted
flushing the listener queue, but to no avail. Crude code below. Is there
something wrong with it?
#include <stdio.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <pthread.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

static void die(const char *msg)
{
	perror(msg);
	exit(-1);
}

static void barrier(pthread_barrier_t *barr)
{
	errno = pthread_barrier_wait(barr);
	if (errno && errno != PTHREAD_BARRIER_SERIAL_THREAD)
		die("pthread_barrier_wait");
}

static void flush_accept(int s)
{
	int p = accept(s, NULL, NULL);

	if (p < 0) {
		if (errno != EAGAIN)
			perror("accept");
		return;
	}

	printf("accept: drained\n");
	close(p);
}

static void handler(int signum)
{
	/* nop */
}

static void set_accept_timeout(int s)
{
	struct timeval tv = { .tv_sec = 1 };

	if (setsockopt(s, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)))
		die("setsockopt SO_RCVTIMEO");
}

static void set_connect_timeout(int s)
{
	struct timeval tv = { .tv_sec = 1 };

	if (setsockopt(s, AF_VSOCK, SO_VM_SOCKETS_CONNECT_TIMEOUT, &tv,
		       sizeof(tv)))
		die("setsockopt SO_VM_SOCKETS_CONNECT_TIMEOUT");
}

static void *killer(void *arg)
{
	pthread_barrier_t *barr = (pthread_barrier_t *)arg;
	pid_t pid = getpid();

	if ((errno = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS,
					   NULL)))
		die("pthread_setcanceltype");

	for (;;) {
		barrier(barr);
		if (kill(pid, SIGUSR1))
			die("kill");
		barrier(barr);
	}

	return NULL;
}

int main(void)
{
	struct sockaddr_vm addr = {
		.svm_family = AF_VSOCK,
		.svm_cid = VMADDR_CID_LOCAL,
		.svm_port = 1234
	};
	socklen_t alen = sizeof(addr);
	pthread_barrier_t barr;
	pthread_t tid;
	int s, c;

	if ((errno = pthread_barrier_init(&barr, NULL, 2)))
		die("pthread_barrier_init");

	if (signal(SIGUSR1, handler) == SIG_ERR)
		die("signal");

	s = socket(AF_VSOCK, SOCK_STREAM, 0);
	if (s < 0)
		die("socket s");

	set_accept_timeout(s);

	if (bind(s, (struct sockaddr *)&addr, alen))
		die("bind");

	if (listen(s, 64))
		die("listen");

	if ((errno = pthread_create(&tid, NULL, killer, &barr)))
		die("pthread_create");

	for (;;) {
		c = socket(AF_VSOCK, SOCK_STREAM, 0);
		if (c < 0)
			die("socket c");

		barrier(&barr);
		if (connect(c, (struct sockaddr *)&addr, sizeof(addr)) &&
		    errno == EINTR) {
			printf("connect: EINTR\n");
			break;
		}
		barrier(&barr);

		close(c);
		flush_accept(s);
	}

	if ((errno = pthread_cancel(tid)))
		die("pthread_cancel");
	if ((errno = pthread_join(tid, NULL)))
		die("pthread_join");

	flush_accept(s);

	set_connect_timeout(c);
	if (connect(c, (struct sockaddr *)&addr, sizeof(addr)))
		die("re-connect");

	printf("okay?\n");

	return 0;
}
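
(Builds with `gcc -pthread`; the VMADDR_CID_LOCAL address needs the
vsock_loopback transport loaded.)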