lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <24adfb894c25531d342fdc20310ca9286d605e3d.camel@ibm.com>
Date: Mon, 5 Jan 2026 20:23:58 +0000
From: Viacheslav Dubeyko <Slava.Dubeyko@....com>
To: Xiubo Li <xiubli@...hat.com>, "idryomov@...il.com" <idryomov@...il.com>,
        "cfsworks@...il.com" <cfsworks@...il.com>
CC: Milind Changire <mchangir@...hat.com>,
        "stable@...r.kernel.org"
	<stable@...r.kernel.org>,
        "ceph-devel@...r.kernel.org"
	<ceph-devel@...r.kernel.org>,
        "brauner@...nel.org" <brauner@...nel.org>,
        "jlayton@...nel.org" <jlayton@...nel.org>,
        "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>
Subject: Re:  [PATCH 1/5] ceph: Do not propagate page array emplacement errors
 as batch errors

On Tue, 2025-12-30 at 18:43 -0800, Sam Edwards wrote:
> When fscrypt is enabled, move_dirty_folio_in_page_array() may fail
> because it needs to allocate bounce buffers to store the encrypted
> versions of each folio. Each folio beyond the first allocates its bounce
> buffer with GFP_NOWAIT. Failures are common (and expected) under this
> allocation mode; they should flush (not abort) the batch.
> 
> However, ceph_process_folio_batch() uses the same `rc` variable for its
> own return code and for capturing the return codes of its routine calls;
> failing to reset `rc` back to 0 results in the error being propagated
> out to the main writeback loop, which cannot actually tolerate any
> errors here: once `ceph_wbc.pages` is allocated, it must be passed to
> ceph_submit_write() to be freed. If it survives until the next iteration
> (e.g. due to the goto being followed), ceph_allocate_page_array()'s
> BUG_ON() will oops the worker. (Subsequent patches in this series make
> the loop more robust.)

I think you are right with the fix. We have the loop here and if we already
moved some dirty folios, then we should flush it. But what if we failed on the
first one folio, then should we return no error code in this case?

> 
> Note that this failure mode is currently masked due to another bug
> (addressed later in this series) that prevents multiple encrypted folios
> from being selected for the same write.

So, maybe, this patch has been not correctly placed in the order? It will be
good to see the reproduction of the issue and which symptoms we have for this
issue. Do you have the reproduction script and call trace of the issue?

> 
> For now, just reset `rc` when redirtying the folio and prevent the
> error from propagating. After this change, ceph_process_folio_batch() no
> longer returns errors; its only remaining failure indicator is
> `locked_pages == 0`, which the caller already handles correctly. The
> next patch in this series addresses this.
> 
> Fixes: ce80b76dd327 ("ceph: introduce ceph_process_folio_batch() method")
> Cc: stable@...r.kernel.org
> Signed-off-by: Sam Edwards <CFSworks@...il.com>
> ---
>  fs/ceph/addr.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
> index 63b75d214210..3462df35d245 100644
> --- a/fs/ceph/addr.c
> +++ b/fs/ceph/addr.c
> @@ -1369,6 +1369,7 @@ int ceph_process_folio_batch(struct address_space *mapping,
>  		rc = move_dirty_folio_in_page_array(mapping, wbc, ceph_wbc,
>  				folio);
>  		if (rc) {
> +			rc = 0;

I like the fix but I would like to clarify the above questions at first.

Thanks,
Slava. 

>  			folio_redirty_for_writepage(wbc, folio);
>  			folio_unlock(folio);
>  			break;

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ