Message-ID: <CAOi1vP99j77p42xRtQSRroVreRoaJoe9a8Rms8se5-2d1YSKHg@mail.gmail.com>
Date: Tue, 13 Jan 2026 21:15:09 +0100
From: Ilya Dryomov <idryomov@...il.com>
To: Sam Edwards <cfsworks@...il.com>
Cc: Xiubo Li <xiubli@...hat.com>, Jeff Layton <jlayton@...nel.org>, ceph-devel@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] libceph: Handle sparse-read replies lacking data length

On Tue, Jan 13, 2026 at 8:04 PM Sam Edwards <cfsworks@...il.com> wrote:
>
> On Tue, Jan 13, 2026 at 9:27 AM Ilya Dryomov <idryomov@...il.com> wrote:
> >
> > On Tue, Jan 13, 2026 at 4:31 AM Sam Edwards <cfsworks@...il.com> wrote:
> > >
> > > When the OSD replies to a sparse-read request, but no extents matched
> > > the read (because the object is empty, the read requested a region
> > > backed by no extents, ...) it is expected to reply with two 32-bit
> > > zeroes: one indicating that there are no extents, the other that the
> > > total bytes read is zero.
> > >
> > > In certain circumstances (e.g. on Ceph 19.2.3, when the requested object
> > > is in an EC pool), the OSD sends back only one 32-bit zero. The
> > > sparse-read state machine will end up reading something else (such as
> > > the data CRC in the footer) and get stuck in a retry loop like:
> > >
> > >   libceph:  [0] got 0 extents
> > >   libceph: data len 142248331 != extent len 0
> > >   libceph: osd0 (1)...:6801 socket error on read
> > >   libceph: data len 142248331 != extent len 0
> > >   libceph: osd0 (1)...:6801 socket error on read
> > >
> > > This is probably a bug in the OSD, but even so, the kernel must handle
> > > it to avoid misinterpreting replies and entering a retry loop.
> >
> > Hi Sam,
> >
>
> Hey Ilya,
>
> > Yes, this is definitely a bug in the OSD (and I also see another
> > related bug in the userspace client code above the OSD...).  The
> > triggering condition is a sparse read beyond the end of an existing
> > object on an EC pool.  19.2.3 isn't the problem -- main branch is
> > affected as well.
> >
> > If this was one of the common paths, I'd support adding some sort of
> > a workaround to "handle" this in the kernel client.  However, sparse
> > reads are pretty useless on EC pools because they just get converted
> > into regular thick reads.  Sparse reads offer potential benefits only
> > on replicated pools, but the kernel client doesn't use them by default
> > there either.  The sparseread mount option that is necessary for the
> > reproducer to work isn't documented and was added purely for testing
> > purposes.
>
> Note that the kernel client forces sparse reads when using fscrypt
> (see linux-6.18/fs/ceph/addr.c:361) and I encountered this problem
> organically as a result. It may still make sense to apply a kernel
> workaround.
>
> On the other hand, it sounds like fscrypt+EC is a niche corner case,
> we've now established that the OSD is definitely not following the
> protocol, and working around this client-side is more involved than
> just fixing this in the OSD. So I think simply telling affected users
> to update their OSDs is also a reasonable way to handle this.

fscrypt and EC can't be mixed -- fscrypt+EC doesn't really work.  The
reason sparse reads are forced for fscrypt is that the client relies on
the sparseness metadata to be able to tell if a given 4K block in the
encrypted file is a hole (in the PUNCH_HOLE sense) or not.  If it's
a hole, POSIX dictates that a read should return zeroes.  On an EC pool
where sparse reads are degraded into regular thick reads by the OSD,
a hole in the middle of an object wouldn't ever be signaled.  Instead,
the OSD would synthesize a bunch of zeroes and pass them to the client.
The client would then run them through the crypto engine (believing
it's a bona fide ciphertext) and return the resulting gibberish to the
user, thus violating POSIX and widespread assumptions about generic
filesystem behavior.
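
To make the failure mode concrete, here is a minimal toy sketch of the
client-side logic described above. Everything in it is an assumption for
illustration only: toy_decrypt() is a stand-in XOR keystream, not the real
fscrypt cipher, and read_blocks() is a hypothetical helper, not the libceph
code path. It only shows that a block is skipped by the crypto engine when
no extent covers it, and that a thick read with synthesized zeroes defeats
that check.

```python
import hashlib

BLOCK = 4096  # the 4K block granularity mentioned above


def toy_decrypt(block: bytes, index: int) -> bytes:
    """Stand-in for the crypto engine: XOR with a per-block keystream.

    This is NOT how fscrypt works; it is only a reversible transform.
    """
    stream = hashlib.shake_256(b"toy-key" + index.to_bytes(8, "little")).digest(len(block))
    return bytes(a ^ b for a, b in zip(block, stream))


toy_encrypt = toy_decrypt  # XOR is its own inverse


def read_blocks(extents, data, nblocks):
    """Rebuild plaintext blocks from a sparse-read style reply (hypothetical helper).

    extents: block-aligned (offset, length) ranges reported as backed by data
    data:    the bytes of those extents, concatenated in order
    Blocks covered by no extent are treated as holes: zeroes, no decryption.
    """
    out = [bytes(BLOCK)] * nblocks
    pos = 0
    for off, length in extents:
        for i in range(off // BLOCK, (off + length) // BLOCK):
            out[i] = toy_decrypt(data[pos:pos + BLOCK], i)
            pos += BLOCK
    return out


# A three-block object whose middle block is a hole.
p0, p2 = b"A" * BLOCK, b"C" * BLOCK
c0, c2 = toy_encrypt(p0, 0), toy_encrypt(p2, 2)

# Replicated pool: the hole is signaled by the gap between extents.
sparse = read_blocks([(0, BLOCK), (2 * BLOCK, BLOCK)], c0 + c2, 3)

# EC pool: one thick extent, with the hole filled in by synthesized zeroes.
thick = read_blocks([(0, 3 * BLOCK)], c0 + bytes(BLOCK) + c2, 3)

print(sparse[1] == bytes(BLOCK))  # True: the hole reads back as zeroes
print(thick[1] == bytes(BLOCK))   # False: the zeroes were "decrypted" into gibberish
```

In the thick case the data blocks still decrypt correctly; only the hole is
corrupted, which is why the problem is easy to miss in casual testing.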

>
> I'll defer to you.
>
> >
> > >
> > > Detect this condition when the extent count is zero by checking the
> > > `payload_len` field of the op reply. If it is only big enough for the
> > > extent count, conclude that the data length is omitted and skip to the
> > > next op (which is what the state machine would have done immediately
> > > upon reading and validating the data length, if it were present).
> > >
> > > ---
> > >
> > > Hi list,
> > >
> > > RFC: This patch is submitted for comment only. I've tested it for about
> > > 2 weeks now and am satisfied that it prevents the hang, but the current
> > > approach decodes the entire op reply body while still in the
> > > data-gathering step, which is suboptimal; feedback on cleaner
> > > alternatives is welcome!
> > >
> > > I have not searched for nor opened a report with Ceph proper; I'd like a
> > > second pair of eyes to confirm that this is indeed an OSD bug before I
> > > proceed with that.
> >
> > Let me know if you want me to file a Ceph tracker ticket on your
> > behalf.  I have a draft patch for the bug in the OSD and would link it
> > in the PR, crediting you as a reporter.
>
> Please do! I'm also interested in seeing the patch -- the OSD code is
> pretty dense and I couldn't find the EC sparse read handler.

https://github.com/ceph/ceph/pull/66912

Thanks,

                Ilya
