Message-ID: <CAH5Ym4is+dfE4Td8cuA0s4_4fkGNy+X1R_N+nSZDzSjYfxNBfg@mail.gmail.com>
Date: Mon, 5 Jan 2026 22:52:29 -0800
From: Sam Edwards <cfsworks@...il.com>
To: Viacheslav Dubeyko <Slava.Dubeyko@....com>
Cc: Xiubo Li <xiubli@...hat.com>, "idryomov@...il.com" <idryomov@...il.com>, 
	Milind Changire <mchangir@...hat.com>, "stable@...r.kernel.org" <stable@...r.kernel.org>, 
	"ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>, "brauner@...nel.org" <brauner@...nel.org>, 
	"jlayton@...nel.org" <jlayton@...nel.org>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 3/5] ceph: Free page array when ceph_submit_write fails

On Mon, Jan 5, 2026 at 1:09 PM Viacheslav Dubeyko <Slava.Dubeyko@....com> wrote:
>
> On Tue, 2025-12-30 at 18:43 -0800, Sam Edwards wrote:
> > If `locked_pages` is zero, the page array must not be allocated:
> > ceph_process_folio_batch() uses `locked_pages` to decide when to
> > allocate `pages`,
> >
>

Hi Slava,

> I don't quite follow how this statement is relevant to the issue. If
> `locked_pages` is zero, then ceph_submit_write() will not be called. Am I
> missing something here?

That statement only notes that ceph_process_folio_batch() will
BUG() when locked_pages == 0 && pages != NULL. It establishes why
`pages` must be freed and NULLed before the next iteration of
ceph_writepages_start()'s loop (which resets locked_pages to 0).
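
(For reference, here is a paraphrased sketch of the trap, not the exact
kernel source; the allocator assumes no array survived the previous
iteration:

	static void ceph_allocate_page_array(struct ceph_writeback_ctl *ceph_wbc,
					     unsigned int max_pages)
	{
		/* A stale `pages` pointer left over from an earlier
		 * iteration trips this, oopsing the writeback worker: */
		BUG_ON(ceph_wbc->pages);
		ceph_wbc->pages = kcalloc(max_pages, sizeof(*ceph_wbc->pages),
					  GFP_NOFS);
	}

Any path that resets locked_pages to 0 but leaves `pages` non-NULL is
therefore fatal on the next pass.)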

>
> > and redundant allocations trigger
> > ceph_allocate_page_array()'s BUG_ON(), resulting in a worker oops (and
> > writeback stall) or even a kernel panic. Consequently, the main loop in
> > ceph_writepages_start() assumes that the lifetime of `pages` is confined
> > to a single iteration.
>
> It would be great to see a reproducer script or application and the call
> trace of the issue. Could you please share the reproduction path and the
> call trace?

It's difficult to reproduce organically. It arises when
`!ceph_inc_osd_stopping_blocker(fsc->mdsc)`, which I understand can
only happen in a race. I used the fault injection framework to force
ceph_inc_osd_stopping_blocker() to fail.
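
(A crude stand-in for what I did, purely a hypothetical test hack rather
than the actual fault-injection setup, is to gate the call behind a
debug knob so the failure path runs deterministically:

	/* Test-only, not part of the patch: force the
	 * stopping-blocker failure path. */
	static bool ceph_fail_stopping_blocker;	/* flip from a debugger */

	if (ceph_fail_stopping_blocker ||
	    !ceph_inc_osd_stopping_blocker(fsc->mdsc)) {
		/* error-handling path under test */
	}
)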

The call trace is uninteresting. See my reply to your comments on
patch 1: it's the same trace.

>
> >
> > The ceph_submit_write() function claims ownership of the page array on
> > success.
> >
>
> As far as I can see, writepages_finish() should free the page array on success.

That's my understanding too; by "claims ownership of the page array" I
only mean that once ceph_writepages_start() calls ceph_submit_write(),
it is no longer responsible for cleaning the array up.

>
> > But failures only redirty/unlock the pages and fail to free the
> > array, making the failure case in ceph_submit_write() fatal.
> >
> > Free the page array in ceph_submit_write()'s error-handling 'if' block
> > so that the caller's invariant (that the array does not outlive the
> > iteration) is maintained unconditionally, allowing failures in
> > ceph_submit_write() to be recoverable as originally intended.
> >
> > Fixes: 1551ec61dc55 ("ceph: introduce ceph_submit_write() method")
> > Cc: stable@...r.kernel.org
> > Signed-off-by: Sam Edwards <CFSworks@...il.com>
> > ---
> >  fs/ceph/addr.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> > index 2b722916fb9b..91cc43950162 100644
> > --- a/fs/ceph/addr.c
> > +++ b/fs/ceph/addr.c
> > @@ -1466,6 +1466,13 @@ int ceph_submit_write(struct address_space *mapping,
> >                       unlock_page(page);
> >               }
> >
> > +             if (ceph_wbc->from_pool) {
> > +                     mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
> > +                     ceph_wbc->from_pool = false;
> > +             } else
> > +                     kfree(ceph_wbc->pages);
> > +             ceph_wbc->pages = NULL;
>
> Probably, it makes sense to introduce a method ceph_free_page_array(),
> similar to __ceph_allocate_page_array(), and to use it for freeing the
> page array in all places.

I like the suggestion but not the name. Instead of
ceph_free_page_array(), it should probably be called
ceph_discard_page_array(), because it would also redirty the pages and
must not be used after a successful writeback. (To me, "free" implies
success while "discard" implies failure.)
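
Something along these lines, though this is just a sketch reusing the
fields from the hunk above, not tested code:

	/* Hypothetical helper mirroring the error-path hunk above;
	 * redirty/unlock handling could fold in here too. */
	static void ceph_discard_page_array(struct ceph_writeback_ctl *ceph_wbc)
	{
		if (ceph_wbc->from_pool) {
			mempool_free(ceph_wbc->pages, ceph_wb_pagevec_pool);
			ceph_wbc->from_pool = false;
		} else {
			kfree(ceph_wbc->pages);
		}
		ceph_wbc->pages = NULL;
	}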

> Could ceph_wbc->locked_pages be greater than zero but ceph_wbc->pages == NULL?

I wouldn't expect so: ceph_wbc->locked_pages is the current index into
ceph_wbc->pages, so they both need to be reset sometime before the next
iteration of ceph_writepages_start()'s loop.
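
To make the invariant explicit (as an assertion sketch; no such check
exists in the code today):

	/* locked_pages counts valid entries in pages, so either both
	 * are set or both are clear between loop iterations. */
	WARN_ON(!!ceph_wbc->locked_pages != !!ceph_wbc->pages);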

Warm regards,
Sam



>
> Thanks,
> Slava.
>
> > +
> >               ceph_osdc_put_request(req);
> >               return -EIO;
> >       }
