linux-kernel - Re: [PATCH v2 2/6] ceph: Remove error return from ceph_process_folio

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAH5Ym4jn8wg+mYGqKGb17OZGBkyDeX-Vx3wgfVT0cqPtn36QFQ@mail.gmail.com>
Date: Thu, 8 Jan 2026 16:29:56 -0800
From: Sam Edwards <cfsworks@...il.com>
To: Viacheslav Dubeyko <Slava.Dubeyko@....com>
Cc: Xiubo Li <xiubli@...hat.com>, "idryomov@...il.com" <idryomov@...il.com>, 
	Milind Changire <mchangir@...hat.com>, 
	"ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>, "brauner@...nel.org" <brauner@...nel.org>, 
	"jlayton@...nel.org" <jlayton@...nel.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 2/6] ceph: Remove error return from ceph_process_folio_batch()

On Thu, Jan 8, 2026 at 12:08 PM Viacheslav Dubeyko
<Slava.Dubeyko@....com> wrote:
>
> On Wed, 2026-01-07 at 13:01 -0800, Sam Edwards wrote:
> > Following the previous patch, ceph_process_folio_batch() no longer
> > returns errors because the writeback loop cannot handle them.
> >
> > Since this function already indicates failure to lock any pages by
> > leaving `ceph_wbc.locked_pages == 0`, and the writeback loop has no way
> > to handle abandonment of a locked batch, change the return type of
> > ceph_process_folio_batch() to `void` and remove the pathological goto in
> > the writeback loop. The lack of a return code emphasizes that
> > ceph_process_folio_batch() is designed to be abort-free: that is, once
> > it commits a folio for writeback, it will not later abandon it or
> > propagate an error for that folio.
> >
> > Signed-off-by: Sam Edwards <CFSworks@...il.com>
> > ---
> >  fs/ceph/addr.c | 17 +++++------------
> >  1 file changed, 5 insertions(+), 12 deletions(-)
> >
> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > index 3462df35d245..2b722916fb9b 100644
> > --- a/fs/ceph/addr.c
> > +++ b/fs/ceph/addr.c
> > @@ -1283,16 +1283,16 @@ static inline int move_dirty_folio_in_page_array(struct address_space *mapping,
> >  }
> >
> >  static
> > -int ceph_process_folio_batch(struct address_space *mapping,
> > -                          struct writeback_control *wbc,
> > -                          struct ceph_writeback_ctl *ceph_wbc)
> > +void ceph_process_folio_batch(struct address_space *mapping,
> > +                           struct writeback_control *wbc,
> > +                           struct ceph_writeback_ctl *ceph_wbc)
> >  {
> >       struct inode *inode = mapping->host;
> >       struct ceph_fs_client *fsc = ceph_inode_to_fs_client(inode);
> >       struct ceph_client *cl = fsc->client;
> >       struct folio *folio = NULL;
> >       unsigned i;
> > -     int rc = 0;
> > +     int rc;
> >
> >       for (i = 0; can_next_page_be_processed(ceph_wbc, i); i++) {
> >               folio = ceph_wbc->fbatch.folios[i];
> > @@ -1322,12 +1322,10 @@ int ceph_process_folio_batch(struct address_space *mapping,
> >               rc = ceph_check_page_before_write(mapping, wbc,
> >                                                 ceph_wbc, folio);
> >               if (rc == -ENODATA) {
> > -                     rc = 0;
> >                       folio_unlock(folio);
> >                       ceph_wbc->fbatch.folios[i] = NULL;
> >                       continue;
> >               } else if (rc == -E2BIG) {
> > -                     rc = 0;
> >                       folio_unlock(folio);
> >                       ceph_wbc->fbatch.folios[i] = NULL;
> >                       break;
> > @@ -1369,7 +1367,6 @@ int ceph_process_folio_batch(struct address_space *mapping,
> >               rc = move_dirty_folio_in_page_array(mapping, wbc, ceph_wbc,
> >                               folio);
> >               if (rc) {
> > -                     rc = 0;
> >                       folio_redirty_for_writepage(wbc, folio);
> >                       folio_unlock(folio);
> >                       break;
> > @@ -1380,8 +1377,6 @@ int ceph_process_folio_batch(struct address_space *mapping,
> >       }
> >
> >       ceph_wbc->processed_in_fbatch = i;
> > -
> > -     return rc;
> >  }
> >
> >  static inline
> > @@ -1685,10 +1680,8 @@ static int ceph_writepages_start(struct address_space *mapping,
> >                       break;
> >
> >  process_folio_batch:
> > -             rc = ceph_process_folio_batch(mapping, wbc, &ceph_wbc);
> > +             ceph_process_folio_batch(mapping, wbc, &ceph_wbc);
> >               ceph_shift_unused_folios_left(&ceph_wbc.fbatch);
> > -             if (rc)
> > -                     goto release_folios;
> >
> >               /* did we get anything? */
> >               if (!ceph_wbc.locked_pages)
>
>  I don't see the point of removing the returning error code from the function. I
> prefer to have it because this method calls other ones with complex
> functionality.

Hey Slava,

I've tried to clarify this patch a few ways now, but I still haven't
seen actionable technical feedback, only misunderstanding or
preference rather than concrete issues. I'm making one last, detailed
attempt to explain the rationale; if it still isn't clear, I believe
our only remaining option is to involve another maintainer to help
complete the review.

As the function ceph_process_folio_batch() exists today (i.e. in
Linus's tree), the API allows for 3 different possible outcomes (names
my own, for illustrative purposes):
- ERROR (rc != 0): The function returns an error code, maybe after
already allocating the page array and locking some pages, maybe not.
It's ambiguous.
- EMPTY (!rc && !locked_pages): The function executes normally, but
does not lock any folios. It may do this if it encounters a problem on
the first folio.
- LOCKED (!rc && locked_pages >= 1): The function executes to
completion without erroring after locking at least one folio. It may
have encountered errors on subsequent folios, but it handles these by
flushing the batch and trying again next time it's called.

The ceph_writepages_start() loop that calls it, today, does not
differentiate between the ERROR and EMPTY result:

if (rc)
    goto release_folios;

/* did we get anything? */
if (!ceph_wbc.locked_pages)
    goto release_folios;

So right now it's just a redundant degree of freedom. Standard
software engineering principles dictate that a function's API should
not create complexity beyond the actual needs of its caller. So either
we remove EMPTY... or remove ERROR.

The current problem with ERROR is that it (often, not always) results
in a crash: the loop will come around without freeing the page array
and violate the expectations of the next iteration, and we violate the
BUG_ON() in ceph_allocate_page_array(). The loop is, today, not
written in a way that "aborting" a batch is ever appropriate; it
assumes the page array only exists when LOCKED. We must either add
logic to the caller to fix it... or remove ERROR.

The only way that the ERROR result could occur today is if fscrypt is
enabled, the write storm bug is fixed, and a non-first folio fails its
bounce page allocation. That means that it isn't possible now, and it
wouldn't be made possible by applying this patchset. The "no dead
code" rule requires that we either add some logic to
ceph_allocate_page_array() that can actually result in ERROR... or
remove ERROR.

Finally, there is a big advantage to removing the return code from
ceph_process_folio_batch(): it uses C's type system to enforce the
guarantee that it doesn't abandon the page array. (Note! That does not
mean that ceph_process_folio_batch() is not allowed to encounter
failures -- I fully agree that it's a complex function and it may have
more fallible logic added in the future -- just that it is responsible
for cleaning up after those failures. Put simply, the lack of a return
code gives it no way for it to make the cleanup the caller's problem!)

> And I would like still save the path of returning error code from
> the function. So, I don't agree with this patch.

It can always be brought back in the future if something genuinely
needs it (such as a hypothetical "major failure" in
ceph_process_folio_batch() that needs to fail the entire writeback,
not merely retry). But Linux doesn't keep "vestigial" code around for
hypothetical future needs. If there isn't a concrete reason to add or
use it now, then there's no reason to preserve it.

Hope this helps,
Sam

>
> Thanks,
> Slava.