Message-ID: <20220325172036.3f8f619e@gandalf.local.home>
Date: Fri, 25 Mar 2022 17:20:36 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: LKML <linux-kernel@...r.kernel.org>
Cc: Amit Shah <amit@...nel.org>, Arnd Bergmann <arnd@...db.de>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
virtualization@...ts.linux-foundation.org,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: [PATCH] virtio: Workaround fix for hard hang on guest using fifos
This is more of a workaround than the proper fix, but I'm doing some work
that uses fifos for guests, and this bug causes a hang that is quite
annoying.
I currently have this patch applied so that I can continue my work.
I was working on analyzing data transfers between host and guests via
virtio serial ports (FIFOs on the host, a dev node on the guest), vsockets
and TCP packets. I wrote a program to test each by passing a 1GB file
through it and timing the transfer, using the splice system call to help
move things along. In doing so, I found that the pipe between my splice
calls originally moved "page_size" bytes per call, which is not as
efficient as asking for the pipe's actual size. So I changed the code to
use pipe_size, and while debugging it, the guest locked up hard.
I'm attaching the "agent-fifo" that runs on the guest and the
"client-fifo" that runs on the host (the names may seem backwards, but
they make sense when you add in how I test vsockets and network packets).
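For reference, the core of the transfer loop looks roughly like the sketch
below (hypothetical helper name, just to show the page_size vs pipe_size
change; the attached agent-fifo.c and client-fifo.c are what I actually
ran and differ in the details):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/*
 * Copy everything from in_fd to out_fd through a pipe with splice(),
 * moving up to the pipe's capacity per call instead of one page at
 * a time.
 */
static int copy_via_splice(int in_fd, int out_fd)
{
	int pfd[2];
	ssize_t rd, wr;
	long chunk;
	int ret = 0;

	if (pipe(pfd) < 0)
		return -1;

	/* A pipe is usually 64K, where page_size is typically only 4K */
	chunk = fcntl(pfd[0], F_GETPIPE_SZ);
	if (chunk < 0)
		chunk = getpagesize();

	for (;;) {
		rd = splice(in_fd, NULL, pfd[1], NULL, chunk, SPLICE_F_MOVE);
		if (rd < 0)
			ret = -1;
		if (rd <= 0)
			break;
		while (rd > 0) {
			wr = splice(pfd[0], NULL, out_fd, NULL, rd,
				    SPLICE_F_MOVE);
			if (wr <= 0) {
				ret = -1;
				goto out;
			}
			rd -= wr;
		}
	}
 out:
	close(pfd[0]);
	close(pfd[1]);
	return ret;
}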
Here's what I did:
<host> # ./client-fifo /var/lib/virt/Guest/trace-pipe-cpu0.out /test/bigfile
where trace-pipe-cpu0.out is the receiving side of the guest's virtio
pipe, and /test/bigfile is created when data starts coming in from the
guest pipe.
<guest> # dd if=/dev/urandom of=bigfile bs=1024 count=1048576
<guest> # ./agent-fifo /dev/virtio-ports/trace-pipe-cpu0 bigfile
With the update to change the size passed to splice from page_size to
pipe_size, this never finished (it would copy around a meg or so and then
stop). When I killed the agent-fifo task on the guest, the guest hung
hard.
Debugging this, I found that the guest is stuck in the loop in
drivers/char/virtio_console.c: __send_control_msg():
	if (virtqueue_add_outbuf(vq, sg, 1, &portdev->cpkt, GFP_ATOMIC) == 0) {
		virtqueue_kick(vq);
		while (!virtqueue_get_buf(vq, &len)
		       && !virtqueue_is_broken(vq))
			cpu_relax();
	}
It never exits that loop. My workaround (this patch) is to put in a
timeout and break out if it spins there for more than 5 seconds. This
makes the problem go away.
Below are my changes, but this is a band-aid, not the cure.
Workaround-fix-by: Steven Rostedt (Google) <rostedt@...dmis.org>
---
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index e3c430539a17..65f259f3f8cb 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -551,6 +551,7 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id,
 	struct scatterlist sg[1];
 	struct virtqueue *vq;
 	unsigned int len;
+	u64 end;
 
 	if (!use_multiport(portdev))
 		return 0;
@@ -567,9 +568,15 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id,
 
 	if (virtqueue_add_outbuf(vq, sg, 1, &portdev->cpkt, GFP_ATOMIC) == 0) {
 		virtqueue_kick(vq);
+		end = jiffies + 5 * HZ;
 		while (!virtqueue_get_buf(vq, &len)
-		       && !virtqueue_is_broken(vq))
+		       && !virtqueue_is_broken(vq)) {
+			if (unlikely(end < jiffies)) {
+				dev_warn(&portdev->vdev->dev, "send_control_msg timed out!\n");
+				break;
+			}
 			cpu_relax();
+		}
 	}
 
 	spin_unlock(&portdev->c_ovq_lock);
View attachment "agent-fifo.c" of type "text/x-c++src" (2697 bytes)
View attachment "client-fifo.c" of type "text/x-c++src" (2401 bytes)