Message-ID: <CAOi1vP94Eruq7k10fnpA7G+LjEHdxxFvL4jnTeLMqfoxnjrTkw@mail.gmail.com>
Date: Tue, 13 Jan 2026 18:26:59 +0100
From: Ilya Dryomov <idryomov@...il.com>
To: Sam Edwards <cfsworks@...il.com>
Cc: Xiubo Li <xiubli@...hat.com>, Jeff Layton <jlayton@...nel.org>, ceph-devel@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] libceph: Handle sparse-read replies lacking data length

On Tue, Jan 13, 2026 at 4:31 AM Sam Edwards <cfsworks@...il.com> wrote:
>
> When the OSD replies to a sparse-read request for which no extents
> matched (because the object is empty, the read requested a region
> backed by no extents, ...), it is expected to send back two 32-bit
> zeroes: one indicating that there are no extents, the other that the
> total number of bytes read is zero.
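>
> As a rough illustration only (the helper name here is hypothetical,
> not libceph's actual decoder), the two fields expected in the
> no-extent case could be read like this:
>
>   #include <linux/types.h>
>   #include <linux/errno.h>
>   #include <linux/unaligned.h>
>
>   /* Parse the two little-endian u32 fields a well-behaved OSD sends
>    * when no extents matched: the extent count followed by the total
>    * data length.  Returns -EBADMSG if fewer than 8 bytes are present,
>    * which is exactly the buggy case described above.
>    */
>   static int parse_empty_sparse_reply(const u8 *buf, size_t len,
>                                       u32 *extent_count, u32 *data_len)
>   {
>           if (len < 2 * sizeof(u32))
>                   return -EBADMSG;  /* only the extent count arrived */
>           *extent_count = get_unaligned_le32(buf);      /* expected: 0 extents */
>           *data_len = get_unaligned_le32(buf + 4);      /* expected: 0 data bytes */
>           return 0;
>   }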
>
> In certain circumstances (e.g. on Ceph 19.2.3, when the requested object
> is in an EC pool), the OSD sends back only one 32-bit zero. The
> sparse-read state machine will end up reading something else (such as
> the data CRC in the footer) and get stuck in a retry loop like:
>
>   libceph:  [0] got 0 extents
>   libceph: data len 142248331 != extent len 0
>   libceph: osd0 (1)...:6801 socket error on read
>   libceph: data len 142248331 != extent len 0
>   libceph: osd0 (1)...:6801 socket error on read
>
> This is probably a bug in the OSD, but even so, the kernel must handle
> it to avoid misinterpreting replies and entering a retry loop.

Hi Sam,

Yes, this is definitely a bug in the OSD (and I also see another
related bug in the userspace client code above the OSD...).  The
triggering condition is a sparse read beyond the end of an existing
object on an EC pool.  19.2.3 isn't the problem -- main branch is
affected as well.

If this were one of the common paths, I'd support adding some sort of
workaround to "handle" this in the kernel client.  However, sparse
reads are pretty useless on EC pools because they just get converted
into regular thick reads.  Sparse reads offer potential benefits only
on replicated pools, but the kernel client doesn't use them by default
there either.  The sparseread mount option that is necessary for the
reproducer to work isn't documented and was added purely for testing
purposes.

>
> Detect this condition when the extent count is zero by checking the
> `payload_len` field of the op reply. If it is only big enough for the
> extent count, conclude that the data length is omitted and skip to the
> next op (which is what the state machine would have done immediately
> upon reading and validating the data length, if it were present).
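>
> A minimal sketch of that check, with purely illustrative names (the
> actual libceph state machine and its fields differ):
>
>   #include <linux/types.h>
>
>   /* Illustrative types only -- not libceph's actual definitions. */
>   enum sr_state { SR_STATE_DATA_LEN, SR_STATE_NEXT_OP };
>
>   struct hypothetical_sparse_read {
>           enum sr_state state;
>           u32 data_len;
>   };
>
>   /* If the op carried only the 4-byte extent count, assume the buggy
>    * OSD omitted the data length and treat it as zero; otherwise fall
>    * through to the normal path that reads and validates the field.
>    */
>   static void handle_zero_extents(struct hypothetical_sparse_read *sr,
>                                   u32 payload_len)
>   {
>           if (payload_len <= sizeof(u32)) {
>                   sr->data_len = 0;              /* data length omitted by the OSD */
>                   sr->state = SR_STATE_NEXT_OP;  /* skip straight to the next op */
>           } else {
>                   sr->state = SR_STATE_DATA_LEN; /* read the data length as usual */
>           }
>   }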
>
> ---
>
> Hi list,
>
> RFC: This patch is submitted for comment only. I've tested it for about
> 2 weeks now and am satisfied that it prevents the hang, but the current
> approach decodes the entire op reply body while still in the
> data-gathering step, which is suboptimal; feedback on cleaner
> alternatives is welcome!
>
> I have not searched for nor opened a report with Ceph proper; I'd like a
> second pair of eyes to confirm that this is indeed an OSD bug before I
> proceed with that.

Let me know if you want me to file a Ceph tracker ticket on your
behalf.  I have a draft patch for the bug in the OSD and would link it
in the PR, crediting you as a reporter.

>
> Reproducer (Ceph 19.2.3, CephFS with an EC pool already created):
>   mount -o sparseread ... /mnt/cephfs
>   cd /mnt/cephfs
>   mkdir ec/
>   setfattr -n ceph.dir.layout.pool -v 'cephfs-data-ecpool' ec/
>   echo 'Hello world' > ec/sparsely-packed
>   truncate -s 1048576 ec/sparsely-packed
>   # Read from a hole-backed region via sparse read
>   dd if=ec/sparsely-packed bs=16 skip=10000 count=1 iflag=direct | xxd
>   # The read hangs and triggers the retry loop described in the patch
>
> Hope this works,
> Sam
>
> PS: I would also like to write a pair of patches to our messenger v1/v2
> clients to check explicitly that sparse reads consume exactly the number
> of bytes in the data section, as I see there have already been previous
> bugs (including CVE-2023-52636) where the sparse-read machinery gets out
> of sync with the incoming TCP stream. Has this already been proposed?
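>
> A sketch of the sort of check I have in mind (names illustrative, not
> the messenger's actual ones): once the sparse read finishes, compare
> the bytes it consumed against the length of the message's data section
> and fail the read on any mismatch.
>
>   #include <linux/types.h>
>   #include <linux/errno.h>
>   #include <linux/printk.h>
>
>   static int check_sparse_read_consumed(u64 consumed, u64 data_section_len)
>   {
>           if (consumed != data_section_len) {
>                   pr_warn("sparse read consumed %llu of %llu data bytes\n",
>                           (unsigned long long)consumed,
>                           (unsigned long long)data_section_len);
>                   return -EBADMSG;       /* out of sync; reset the session */
>           }
>           return 0;
>   }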

Not that I'm aware of.  An additional safety net would be welcome as
long as it doesn't end up too invasive, of course.

Thanks,

                Ilya
