linux-kernel - Re: [RFC PATCH] libceph: Handle sparse-read replies lacking data length

Message-ID: <CAOi1vP_y1ZWKUFG92PKry=5xdTdi4704aKAHkV+OtkFnM5zR=g@mail.gmail.com>
Date: Wed, 14 Jan 2026 11:23:02 +0100
From: Ilya Dryomov <idryomov@...il.com>
To: Sam Edwards <cfsworks@...il.com>
Cc: Xiubo Li <xiubli@...hat.com>, Jeff Layton <jlayton@...nel.org>, ceph-devel@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] libceph: Handle sparse-read replies lacking data length

On Wed, Jan 14, 2026 at 2:28 AM Sam Edwards <cfsworks@...il.com> wrote:
>
> On Tue, Jan 13, 2026 at 12:15 PM Ilya Dryomov <idryomov@...il.com> wrote:
> >
> > On Tue, Jan 13, 2026 at 8:04 PM Sam Edwards <cfsworks@...il.com> wrote:
> > >
> > > On Tue, Jan 13, 2026 at 9:27 AM Ilya Dryomov <idryomov@...il.com> wrote:
> > > >
> > > > On Tue, Jan 13, 2026 at 4:31 AM Sam Edwards <cfsworks@...il.com> wrote:
> > > > >
> > > > > When the OSD replies to a sparse-read request, but no extents matched
> > > > > the read (because the object is empty, the read requested a region
> > > > > backed by no extents, ...) it is expected to reply with two 32-bit
> > > > > zeroes: one indicating that there are no extents, the other that the
> > > > > total bytes read is zero.
> > > > >
> > > > > In certain circumstances (e.g. on Ceph 19.2.3, when the requested object
> > > > > is in an EC pool), the OSD sends back only one 32-bit zero. The
> > > > > sparse-read state machine will end up reading something else (such as
> > > > > the data CRC in the footer) and get stuck in a retry loop like:
> > > > >
> > > > >   libceph:  [0] got 0 extents
> > > > >   libceph: data len 142248331 != extent len 0
> > > > >   libceph: osd0 (1)...:6801 socket error on read
> > > > >   libceph: data len 142248331 != extent len 0
> > > > >   libceph: osd0 (1)...:6801 socket error on read
> > > > >
> > > > > This is probably a bug in the OSD, but even so, the kernel must handle
> > > > > it to avoid misinterpreting replies and entering a retry loop.
> > > >
> > > > Hi Sam,
> > > >
> > >
> > > Hey Ilya,
> > >
> > > > Yes, this is definitely a bug in the OSD (and I also see another
> > > > related bug in the userspace client code above the OSD...).  The
> > > > triggering condition is a sparse read beyond the end of an existing
> > > > object on an EC pool.  19.2.3 isn't the problem -- main branch is
> > > > affected as well.
> > > >
> > > > If this was one of the common paths, I'd support adding some sort of
> > > > a workaround to "handle" this in the kernel client.  However, sparse
> > > > reads are pretty useless on EC pools because they just get converted
> > > > into regular thick reads.  Sparse reads offer potential benefits only
> > > > on replicated pools, but the kernel client doesn't use them by default
> > > > there either.  The sparseread mount option that is necessary for the
> > > > reproducer to work isn't documented and was added purely for testing
> > > > purposes.
> > >
> > > Note that the kernel client forces sparse reads when using fscrypt
> > > (see linux-6.18/fs/ceph/addr.c:361) and I encountered this problem
> > > organically as a result. It may still make sense to apply a kernel
> > > workaround.
> > >
> > > On the other hand, it sounds like fscrypt+EC is a niche corner case,
> > > we've now established that the OSD is definitely not following the
> > > protocol, and working around this client-side is more involved than
> > > just fixing this in the OSD. So I think simply telling affected users
> > > to update their OSDs is also a reasonable way to handle this.
> >
> > fscrypt and EC can't be mixed -- fscrypt+EC doesn't really work.  The
> > reason sparse reads are forced for fscrypt is that the client relies on
> > the sparseness metadata to be able to tell if a given 4K block in the
> > encrypted file is a hole (in the PUNCH_HOLE sense) or not.  If it's
> > a hole, POSIX dictates that a read should return zeroes.  On an EC pool
> > where sparse reads are degraded into regular thick reads by the OSD,
> > a hole in the middle of an object wouldn't ever be signaled.  Instead,
> > the OSD would synthesize a bunch of zeroes and pass them to the client.
> > The client would then run them through the crypto engine (believing
> > it's a bona fide ciphertext) and return the resulting gibberish to the
> > user, thus violating POSIX and widespread assumptions about generic
> > filesystem behavior.
>
> Oof, thanks for the heads-up! Fortunately my workload tolerates
> garbage in holes... with the occasional (now-explained) warning, that
> is. :)
>
> I don't see the fscrypt+EC limitation mentioned in either the kernel or Ceph
> docs, so I'm guessing this is more a "known major limitation" than an
> out-of-scope use case.

Correct, it's tracked under https://tracker.ceph.com/issues/67507.

Thanks,

                Ilya
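
To make the failure mode discussed in this thread concrete, here is a minimal
Python sketch of a sparse-read reply parser. It is an illustration under stated
assumptions, not the libceph state machine: it assumes the payload is
a little-endian 32-bit extent count, then that many (64-bit offset, 64-bit
length) pairs, then a 32-bit data length, then the data. The function name
parse_sparse_read_reply and the example footer bytes are hypothetical.

```python
import struct


def parse_sparse_read_reply(payload: bytes):
    """Parse an (assumed) sparse-read payload into (extents, data).

    Assumed layout, all little-endian:
      u32 extent_count, extent_count * (u64 offset, u64 length),
      u32 data_len, data_len bytes of data.
    """
    pos = 0
    (count,) = struct.unpack_from("<I", payload, pos)
    pos += 4

    extents = []
    for _ in range(count):
        off, length = struct.unpack_from("<QQ", payload, pos)
        pos += 16
        extents.append((off, length))

    # A well-formed reply always carries this field, even with zero extents.
    (data_len,) = struct.unpack_from("<I", payload, pos)
    pos += 4

    expected = sum(length for _, length in extents)
    if data_len != expected:
        raise ValueError(f"data len {data_len} != extent len {expected}")

    data = payload[pos:pos + data_len]
    if len(data) != data_len:
        raise ValueError("payload shorter than advertised data length")
    return extents, data


# Well-formed empty reply: two 32-bit zeroes.
good = struct.pack("<II", 0, 0)

# Reply lacking the data length: one 32-bit zero, then unrelated bytes
# (standing in for whatever follows on the wire, e.g. a footer CRC).
bad = struct.pack("<I", 0) + struct.pack("<I", 142248331)

print(parse_sparse_read_reply(good))
try:
    parse_sparse_read_reply(bad)
except ValueError as err:
    print("rejected:", err)
```

With the second payload the parser consumes the unrelated four bytes as the
data length, which is the same kind of mismatch reported in the log lines
quoted above.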