Message-ID: <20220325172036.3f8f619e@gandalf.local.home>
Date: Fri, 25 Mar 2022 17:20:36 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: LKML <linux-kernel@...r.kernel.org>
Cc: Amit Shah <amit@...nel.org>, Arnd Bergmann <arnd@...db.de>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
virtualization@...ts.linux-foundation.org,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: [PATCH] virtio: Workaround fix for hard hang on guest using fifos
This is more of a workaround than the proper fix, but I'm doing some work
that uses fifos for guests, and this bug causes a hang that is quite
annoying.
I currently have this patch applied so that I can continue my work.
I was working on analyzing data transfers between host and guests via
virtio serial ports (FIFOs on the host, a dev node on the guest), vsockets
and TCP packets. I wrote a program to test each by passing a 1GB file
through it and timing the transfer, using the splice system call to help
move things along. In doing so, I found that the pipe between my splice
calls originally moved "page_size" bytes per call, which is not as
efficient as asking for the pipe's actual size. So I changed the code to
use pipe_size, and while debugging it, the guest locked up hard.
I'm attaching the "agent-fifo" that runs on the guest and the
"client-fifo" that runs on the host (the names may seem backwards, but
they make sense when you add in how I test vsockets and network packets).
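For reference, the core of the transfer loop looks roughly like the sketch
below (hypothetical helper name, just to show the page_size vs pipe_size
change; the attached agent-fifo.c and client-fifo.c are what I actually
ran and differ in the details):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/*
 * Copy everything from in_fd to out_fd through a pipe with splice(),
 * moving up to the pipe's capacity per call instead of one page at
 * a time.
 */
static int copy_via_splice(int in_fd, int out_fd)
{
	int pfd[2];
	ssize_t rd, wr;
	long chunk;
	int ret = 0;

	if (pipe(pfd) < 0)
		return -1;

	/* A pipe is usually 64K, where page_size is typically only 4K */
	chunk = fcntl(pfd[0], F_GETPIPE_SZ);
	if (chunk < 0)
		chunk = getpagesize();

	for (;;) {
		rd = splice(in_fd, NULL, pfd[1], NULL, chunk, SPLICE_F_MOVE);
		if (rd < 0)
			ret = -1;
		if (rd <= 0)
			break;
		while (rd > 0) {
			wr = splice(pfd[0], NULL, out_fd, NULL, rd,
				    SPLICE_F_MOVE);
			if (wr <= 0) {
				ret = -1;
				goto out;
			}
			rd -= wr;
		}
	}
 out:
	close(pfd[0]);
	close(pfd[1]);
	return ret;
}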
Here's what I did:
<host> # ./client-fifo /var/lib/virt/Guest/trace-pipe-cpu0.out /test/bigfile
where trace-pipe-cpu0.out is the receiving side of the guest's virtio
pipe, and /test/bigfile is created when data starts coming in from the
guest pipe.
<guest> # dd if=/dev/urandom of=bigfile bs=1024 count=1048576
<guest> # ./agent-fifo /dev/virtio-ports/trace-pipe-cpu0 bigfile
With the update to change the size passed to splice from page_size to
pipe_size, this never finished (it would copy around a meg or so and then
stop). When I killed the agent-fifo task on the guest, the guest hung
hard.
Debugging this, I found that the guest is stuck in the loop in
drivers/char/virtio_console.c: __send_control_msg():
	if (virtqueue_add_outbuf(vq, sg, 1, &portdev->cpkt, GFP_ATOMIC) == 0) {
		virtqueue_kick(vq);
		while (!virtqueue_get_buf(vq, &len)
		       && !virtqueue_is_broken(vq))
			cpu_relax();
	}
It never exits that loop. My workaround (this patch) is to put in a
timeout and break out if it spins there for more than 5 seconds. This
makes the problem go away.
Below are my changes, but this is a band-aid, not the cure.
Workaround-fix-by: Steven Rostedt (Google) <rostedt@...dmis.org>
---
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index e3c430539a17..65f259f3f8cb 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -551,6 +551,7 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id,
 	struct scatterlist sg[1];
 	struct virtqueue *vq;
 	unsigned int len;
+	u64 end;
 
 	if (!use_multiport(portdev))
 		return 0;
@@ -567,9 +568,15 @@ static ssize_t __send_control_msg(struct ports_device *portdev, u32 port_id,
 
 	if (virtqueue_add_outbuf(vq, sg, 1, &portdev->cpkt, GFP_ATOMIC) == 0) {
 		virtqueue_kick(vq);
+		end = jiffies + 5 * HZ;
 		while (!virtqueue_get_buf(vq, &len)
-		       && !virtqueue_is_broken(vq))
+		       && !virtqueue_is_broken(vq)) {
+			if (unlikely(end < jiffies)) {
+				dev_warn(&portdev->vdev->dev, "send_control_msg timed out!\n");
+				break;
+			}
 			cpu_relax();
+		}
 	}
 
 	spin_unlock(&portdev->c_ovq_lock);
View attachment "agent-fifo.c" of type "text/x-c++src" (2697 bytes)
View attachment "client-fifo.c" of type "text/x-c++src" (2401 bytes)