Message-ID: <20260113033113.149842-1-CFSworks@gmail.com>
Date: Mon, 12 Jan 2026 19:31:13 -0800
From: Sam Edwards <cfsworks@...il.com>
To: Xiubo Li <xiubli@...hat.com>,
	Ilya Dryomov <idryomov@...il.com>,
	Jeff Layton <jlayton@...nel.org>
Cc: ceph-devel@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	Sam Edwards <CFSworks@...il.com>
Subject: [RFC PATCH] libceph: Handle sparse-read replies lacking data length

When the OSD replies to a sparse-read request for which no extents
matched (because the object is empty, the read covers a region backed
by no extents, etc.), it is expected to reply with two 32-bit zeroes:
the first indicating that there are no extents, the second that the
total number of bytes read is zero.
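
For illustration only, the expected zero-extent payload can be sketched
in userspace C. This is not kernel code: get_le32() and
is_expected_empty_reply() are hypothetical names, and the sketch assumes
nothing beyond two little-endian 32-bit zeroes making up the whole
payload.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian 32-bit value from a buffer. */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical check: returns 1 if buf holds the expected zero-extent
 * sparse-read payload (extent count 0 followed by data length 0),
 * and 0 otherwise.
 */
static int is_expected_empty_reply(const uint8_t *buf, size_t len)
{
	if (len != 2 * sizeof(uint32_t))
		return 0;
	return get_le32(buf) == 0 && get_le32(buf + 4) == 0;
}
```

A 4-byte payload holding only the extent count fails this check, which
is the case the rest of this message deals with.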

In certain circumstances (e.g. on Ceph 19.2.3, when the requested object
is in an EC pool), the OSD sends back only one 32-bit zero. The
sparse-read state machine then reads whatever follows (such as the data
CRC in the footer) as the data length and gets stuck in a retry loop
like:

  libceph:  [0] got 0 extents
  libceph: data len 142248331 != extent len 0
  libceph: osd0 (1)...:6801 socket error on read
  libceph: data len 142248331 != extent len 0
  libceph: osd0 (1)...:6801 socket error on read

This is probably a bug in the OSD, but even so, the kernel must handle
it to avoid misinterpreting replies and entering a retry loop.

Detect this condition when the extent count is zero by checking the
`payload_len` field of the op reply. If it is only big enough for the
extent count, conclude that the data length is omitted and skip to the
next op (which is what the state machine would have done immediately
upon reading and validating the data length, if it were present).
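
The decision described above can be sketched as a standalone function.
This is a simplified userspace model, not the patch itself: the enum and
after_zero_extents() are hypothetical names, and payload_len stands in
for the per-op payload length taken from the decoded reply.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcomes once an extent count of zero has been read. */
enum sparse_next {
	SPARSE_READ_DATA_LEN,	/* a data length field follows; read it */
	SPARSE_SKIP_TO_NEXT_OP,	/* data length omitted; this op is done */
};

/*
 * If the op's payload is only big enough for the 32-bit extent count,
 * conclude that the data length was omitted and move on to the next op.
 * Otherwise continue as before and read the data length.
 */
static enum sparse_next after_zero_extents(uint32_t payload_len)
{
	if (payload_len == sizeof(uint32_t))
		return SPARSE_SKIP_TO_NEXT_OP;
	return SPARSE_READ_DATA_LEN;
}
```

With a well-formed 8-byte zero-extent payload the function returns
SPARSE_READ_DATA_LEN, so the existing path is unchanged; only the
4-byte case takes the new branch.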

---

Hi list,

RFC: This patch is submitted for comment only. I've tested it for about
2 weeks now and am satisfied that it prevents the hang, but the current
approach decodes the entire op reply body while still in the
data-gathering step, which is suboptimal; feedback on cleaner
alternatives is welcome!

I have neither searched for nor opened a report with Ceph proper; I'd
like a second pair of eyes to confirm that this is indeed an OSD bug
before I proceed with that.

Reproducer (Ceph 19.2.3, CephFS with an EC pool already created):
  mount -o sparseread ... /mnt/cephfs
  cd /mnt/cephfs
  mkdir ec/
  setfattr -n ceph.dir.layout.pool -v 'cephfs-data-ecpool' ec/
  echo 'Hello world' > ec/sparsely-packed
  truncate -s 1048576 ec/sparsely-packed
  # Read from a hole-backed region via sparse read
  dd if=ec/sparsely-packed bs=16 skip=10000 count=1 iflag=direct | xxd
  # The read hangs and triggers the retry loop described in the patch

Hope this works,
Sam

PS: I would also like to write a pair of patches to our messenger v1/v2
clients to check explicitly that sparse reads consume exactly the number
of bytes in the data section, as there have already been bugs (including
CVE-2023-52636) where the sparse-read machinery gets out of sync with
the incoming TCP stream. Has this already been proposed?
---
 net/ceph/osd_client.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
index 1a7be2f615dc..e9e898a2415f 100644
--- a/net/ceph/osd_client.c
+++ b/net/ceph/osd_client.c
@@ -5840,7 +5840,25 @@ static int osd_sparse_read(struct ceph_connection *con,
 			sr->sr_state = CEPH_SPARSE_READ_DATA_LEN;
 			break;
 		}
-		/* No extents? Read data len */
+
+		/*
+		 * No extents? Read data len (which we expect is 0) if present.
+		 *
+		 * Sometimes the OSD will omit this for zero-extent replies
+		 * (e.g. in Ceph 19.2.3 when the object is in an EC pool) which
+		 * is likely a bug in the OSD, but nonetheless we must handle
+		 * it to avoid misinterpreting the reply.
+		 */
+		struct MOSDOpReply m;
+		ret = decode_MOSDOpReply(con->in_msg, &m);
+		if (ret)
+			return ret;
+		if (m.outdata_len[o->o_sparse_op_idx] == sizeof(sr->sr_count)) {
+			dout("[%d] missing data length\n", o->o_osd);
+			sr->sr_state = CEPH_SPARSE_READ_HDR;
+			goto next_op;
+		}
+
 		fallthrough;
 	case CEPH_SPARSE_READ_DATA_LEN:
 		convert_extent_map(sr);
-- 
2.51.2