netdev - [PATCH AUTOSEL 6.16-5.4] vhost: fail early when __vhost_add


Message-Id: <20250805130945.471732-20-sashal@kernel.org>
Date: Tue,  5 Aug 2025 09:08:55 -0400
From: Sasha Levin <sashal@...nel.org>
To: patches@...ts.linux.dev,
	stable@...r.kernel.org
Cc: Jason Wang <jasowang@...hat.com>,
	Eugenio Pérez <eperezma@...hat.com>,
	"Michael S . Tsirkin" <mst@...hat.com>,
	Lei Yang <leiyang@...hat.com>,
	Sasha Levin <sashal@...nel.org>,
	kvm@...r.kernel.org,
	virtualization@...ts.linux.dev,
	netdev@...r.kernel.org
Subject: [PATCH AUTOSEL 6.16-5.4] vhost: fail early when __vhost_add_used() fails

From: Jason Wang <jasowang@...hat.com>

[ Upstream commit b4ba1207d45adaafa2982c035898b36af2d3e518 ]

This patch fails vhost_add_used_n() early when __vhost_add_used()
fails to make sure used idx is not updated with stale used ring
information.

Reported-by: Eugenio Pérez <eperezma@...hat.com>
Signed-off-by: Jason Wang <jasowang@...hat.com>
Message-Id: <20250714084755.11921-2-jasowang@...hat.com>
Signed-off-by: Michael S. Tsirkin <mst@...hat.com>
Tested-by: Lei Yang <leiyang@...hat.com>
Signed-off-by: Sasha Levin <sashal@...nel.org>
---

LLM Generated explanations, may be completely bogus:

**Backport Status: YES**

This commit should be backported to stable kernel trees. Here's my
extensive analysis:

## Bug Analysis

The commit fixes a critical error handling bug in `vhost_add_used_n()`
where the function continues execution even after `__vhost_add_used_n()`
fails on line 2776. Looking at the code:

1. **The Bug**: In the original code at drivers/vhost/vhost.c:2776, when
   the second call to `__vhost_add_used_n()` fails and returns an error,
   the function doesn't immediately return. Instead, it continues to:
   - Execute memory barrier (`smp_wmb()`) at line 2779
   - Update the used index via `vhost_put_used_idx()` at line 2780
   - Potentially log the used index update at lines 2784-2791

2. **Impact**: This means the vhost driver updates the used ring index
   even when the actual used ring entries weren't successfully written.
   This creates a **data corruption scenario** where:
   - The guest sees an updated used index
   - But the corresponding used ring entries contain stale/invalid data
   - This can lead to guest crashes, data corruption, or unpredictable
     behavior

3. **The Fix**: The patch adds a simple but crucial check at lines
   2778-2779 (after applying):
   ```c
   if (r < 0)
   	return r;
   ```
   This ensures the function returns immediately upon failure, preventing
   the index from being updated with invalid ring state.
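
The control-flow difference described above can be sketched in a small
userspace model. This is only an illustration under stated assumptions, not
the kernel code: `struct model_vq`, `model_write_entries()`,
`model_publish_idx()`, `add_used_unchecked()` and `add_used_checked()` are
hypothetical names, and the `fail_writes` flag stands in for a failing write
to guest memory.

```c
#include <errno.h>
#include <stdbool.h>

#define RING_SIZE 8 /* hypothetical ring size, power of two */

/* Hypothetical stand-in for a virtqueue's used ring; not the kernel struct. */
struct model_vq {
	unsigned int entries[RING_SIZE]; /* used ring slots */
	unsigned int last_used_idx;      /* host-private index */
	unsigned int published_idx;      /* index the guest can see */
	unsigned int publish_calls;      /* how often the index was published */
	bool fail_writes;                /* simulate a failing guest-memory write */
};

/* Models __vhost_add_used_n(): write entries, advance the private index. */
static int model_write_entries(struct model_vq *vq, const unsigned int *heads,
			       unsigned int count)
{
	if (vq->fail_writes)
		return -EFAULT; /* nothing written, private index unchanged */
	for (unsigned int i = 0; i < count; i++)
		vq->entries[(vq->last_used_idx + i) % RING_SIZE] = heads[i];
	vq->last_used_idx += count;
	return 0;
}

/* Models vhost_put_used_idx(): make the private index guest-visible. */
static void model_publish_idx(struct model_vq *vq)
{
	vq->published_idx = vq->last_used_idx;
	vq->publish_calls++;
}

/* Pre-patch control flow: the publish step runs even if the write failed. */
static int add_used_unchecked(struct model_vq *vq, const unsigned int *heads,
			      unsigned int count)
{
	int r = model_write_entries(vq, heads, count);

	model_publish_idx(vq);
	return r;
}

/* Post-patch control flow: return before touching the published index. */
static int add_used_checked(struct model_vq *vq, const unsigned int *heads,
			    unsigned int count)
{
	int r = model_write_entries(vq, heads, count);

	if (r < 0)
		return r;
	model_publish_idx(vq);
	return r;
}
```

With `fail_writes` set, both variants return `-EFAULT`, but only
`add_used_unchecked()` still runs the publish step; `add_used_checked()`
leaves `publish_calls` untouched, which is the behavior the patch aims for.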

## Stable Backport Criteria Assessment

1. **Bug Fix**: ✓ This fixes a real bug that can cause data corruption
   in vhost operations
2. **Small and Contained**: ✓ The fix adds only 3 lines (an `if` check,
   its `return`, and a blank line) - extremely minimal
3. **No Side Effects**: ✓ The change only adds proper error handling, no
   behavioral changes for success cases
4. **No Architectural Changes**: ✓ Simple error check addition, no
   design changes
5. **Critical Subsystem**: ✓ vhost is used for virtualization (virtio
   devices), affecting VMs and containers
6. **Clear Bug Impact**: ✓ Data corruption in guest-host communication
   is a serious issue
7. **Follows Stable Rules**: ✓ Important bugfix with minimal regression
   risk

## Additional Evidence

- The bug was reported by Eugenio Pérez from Red Hat (see the
  "Reported-by" tag in the commit message)
- The fix has been tested (as indicated by "Tested-by: Lei Yang")
- The function `__vhost_add_used_n()` can fail with -EFAULT when
  `vhost_put_used()` fails (lines 2738-2740)
- The first call to `__vhost_add_used_n()` already has proper error
  handling (lines 2770-2772), making this an inconsistency bug
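
The two-call structure mentioned in the last bullet can be modeled roughly as
follows. This is a hedged sketch that assumes the batch is split at the point
where the ring wraps, with one write call per chunk; `MODEL_RING_NUM`,
`model_write_chunk()`, `model_add_used_n()` and the `model_fail_on_call` knob
are hypothetical and exist only for illustration.

```c
#include <errno.h>

#define MODEL_RING_NUM 4u /* hypothetical ring size, power of two */

static unsigned int model_last_used; /* hypothetical host-private index */
static int model_fail_on_call;       /* 1-based call number that fails, 0 = none */
static int model_calls;              /* number of chunk writes attempted */

/* Stand-in for one __vhost_add_used_n() call on a contiguous chunk. */
static int model_write_chunk(unsigned int count)
{
	if (++model_calls == model_fail_on_call)
		return -EFAULT;
	model_last_used += count;
	return 0;
}

/* Split a batch at the ring's wrap point and check both chunk writes. */
static int model_add_used_n(unsigned int count)
{
	unsigned int start = model_last_used & (MODEL_RING_NUM - 1);
	unsigned int n = MODEL_RING_NUM - start; /* slots left before wrap */
	int r;

	if (n < count) {
		r = model_write_chunk(n); /* first chunk: up to end of ring */
		if (r < 0)
			return r;
		count -= n;
	}
	r = model_write_chunk(count);     /* second chunk: the remainder */
	if (r < 0)                        /* the check the patch adds */
		return r;
	return 0;
}
```

In this model both chunk writes are checked the same way, so a failure in
either one stops the function before any later step could run.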

This is a textbook example of a stable backport candidate: a small,
obvious fix for a real bug that can cause data corruption in a critical
kernel subsystem.

 drivers/vhost/vhost.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 3a5ebb973dba..d1d3912f4804 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2775,6 +2775,9 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 	}
 	r = __vhost_add_used_n(vq, heads, count);

+	if (r < 0)
+		return r;
+
 	/* Make sure buffer is written before we update index. */
 	smp_wmb();
 	if (vhost_put_used_idx(vq)) {
-- 
2.39.5