Message-ID: <9af154da6bc21654135631d1b5040dcdb97d9e3f.camel@ibm.com>
Date: Thu, 28 Aug 2025 19:08:27 +0000
From: Viacheslav Dubeyko <Slava.Dubeyko@....com>
To: "idryomov@...il.com" <idryomov@...il.com>
CC: "stable@...r.kernel.org" <stable@...r.kernel.org>,
"max.kellermann@...os.com" <max.kellermann@...os.com>,
"ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>,
Xiubo Li
<xiubli@...hat.com>,
"linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>,
Alex Markuze <amarkuze@...hat.com>
Subject: RE: [PATCH] fs/ceph/addr: always call ceph_shift_unused_folios_left()

On Thu, 2025-08-28 at 21:05 +0200, Ilya Dryomov wrote:
> On Thu, Aug 28, 2025 at 8:55 PM Viacheslav Dubeyko
> <Slava.Dubeyko@....com> wrote:
> >
> > On Wed, 2025-08-27 at 20:17 +0200, Max Kellermann wrote:
> > > The function ceph_process_folio_batch() sets folio_batch entries to
> > > NULL, which is an illegal state. Before folio_batch_release() crashes
> > > due to this API violation, the function
> > > ceph_shift_unused_folios_left() is supposed to remove those NULLs from
> > > the array.
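
For context, a toy user-space model of what that compaction has to do
(shift_unused_left() is a made-up name here; the real helper operates
on a struct folio_batch):

#include <stddef.h>

/*
 * Compact the non-NULL pointers to the front of the array and return
 * the new count, so that a later "release everything" loop never
 * dereferences a NULL slot.
 */
static size_t shift_unused_left(void *slots[], size_t nr)
{
	size_t i, j = 0;

	for (i = 0; i < nr; i++)
		if (slots[i])
			slots[j++] = slots[i];
	return j;
}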
> > >
> > > However, since commit ce80b76dd327 ("ceph: introduce
> > > ceph_process_folio_batch() method"), this shifting doesn't happen
> > > anymore because the "for" loop got moved to
> > > ceph_process_folio_batch(), and now the `i` variable that remains in
> > > ceph_writepages_start() doesn't get incremented anymore, making the
> > > shifting effectively unreachable much of the time.
> > >
> > > Later, commit 1551ec61dc55 ("ceph: introduce ceph_submit_write()
> > > method") added more preconditions for doing the shift, replacing the
> > > `i` check (with something that is still just as broken):
> > >
> > > - if ceph_process_folio_batch() fails, shifting never happens
> > >
> > > - if ceph_move_dirty_page_in_page_array() was never called (because
> > > ceph_process_folio_batch() has returned early for any of various
> > > reasons), shifting never happens
> > >
> > > - if `processed_in_fbatch` is zero (because ceph_process_folio_batch()
> > > has returned early for some of the reasons mentioned above or
> > > because ceph_move_dirty_page_in_page_array() has failed), shifting
> > > never happens
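
Put differently, after those two commits the flow looked roughly like
the sketch below (simplified, not the literal fs/ceph code), where any
early return leaves NULL entries behind for folio_batch_release():

	rc = ceph_process_folio_batch(...);	/* may NULL out entries */
	if (!rc && ceph_wbc.processed_in_fbatch)
		/* guarded: skipped on every error path above */
		ceph_shift_unused_folios_left(&ceph_wbc.fbatch);
	folio_batch_release(&ceph_wbc.fbatch);	/* oops on NULLs */

whereas the fix in the subject line amounts to calling
ceph_shift_unused_folios_left() unconditionally before the release.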
> > >
> > > Since those two commits, any problem in ceph_process_folio_batch()
> > > could crash the kernel, e.g. this way:
> > >
> > > BUG: kernel NULL pointer dereference, address: 0000000000000034
> > > #PF: supervisor write access in kernel mode
> > > #PF: error_code(0x0002) - not-present page
> > > PGD 0 P4D 0
> > > Oops: Oops: 0002 [#1] SMP NOPTI
> > > CPU: 172 UID: 0 PID: 2342707 Comm: kworker/u778:8 Not tainted 6.15.10-cm4all1-es #714 NONE
> > > Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.10 12/08/2023
> > > Workqueue: writeback wb_workfn (flush-ceph-1)
> > > RIP: 0010:folios_put_refs+0x85/0x140
> > > Code: 83 c5 01 39 e8 7e 76 48 63 c5 49 8b 5c c4 08 b8 01 00 00 00 4d 85 ed 74 05 41 8b 44 ad 00 48 8b 15 b0 >
> > > RSP: 0018:ffffb880af8db778 EFLAGS: 00010207
> > > RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000003
> > > RDX: ffffe377cc3b0000 RSI: 0000000000000000 RDI: ffffb880af8db8c0
> > > RBP: 0000000000000000 R08: 000000000000007d R09: 000000000102b86f
> > > R10: 0000000000000001 R11: 00000000000000ac R12: ffffb880af8db8c0
> > > R13: 0000000000000000 R14: 0000000000000000 R15: ffff9bd262c97000
> > > FS: 0000000000000000(0000) GS:ffff9c8efc303000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 0000000000000034 CR3: 0000000160958004 CR4: 0000000000770ef0
> > > PKRU: 55555554
> > > Call Trace:
> > > <TASK>
> > > ceph_writepages_start+0xeb9/0x1410
> > >
> > > The crash can be reproduced easily by changing the
> > > ceph_check_page_before_write() return value to `-E2BIG`.
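
That is, as a hedged sketch (the rest of the function body elided):

	/* at the top of ceph_check_page_before_write(): */
	return -E2BIG;	/* force the early-error path for testing */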
> > >
> >
> > I cannot reproduce the crash/issue. If ceph_check_page_before_write() returns
> > `-E2BIG`, then nothing happens: there is no crash, and no write operations can
> > be processed by the file system driver anymore. So it doesn't look like a
> > recipe for reproducing the issue, and I cannot confirm that the patch fixes
> > the issue without a clear way to reproduce it.
> >
> > Could you please provide a clearer explanation of how to reproduce the issue?
>
> Hi Slava,
>
> Was this bit taken into account?
>
> (Interestingly, the crash happens only if `huge_zero_folio` has
> already been allocated; without `huge_zero_folio`,
> is_huge_zero_folio(NULL) returns true and folios_put_refs() skips NULL
> entries instead of dereferencing them. That makes reproducing the bug
> somewhat unreliable. See
> https://lore.kernel.org/20250826231626.218675-1-max.kellermann@ionos.com
> for a discussion of this detail.)
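
That detail makes sense to me in principle: is_huge_zero_folio() is
essentially a pointer comparison against the global huge_zero_folio,
so while that global is still NULL, a NULL batch entry compares equal
and folios_put_refs() skips it instead of touching its refcount.
Roughly (simplified, not the literal mm source):

	struct folio *huge_zero_folio;	/* NULL until first allocated */

	static inline bool is_huge_zero_folio(const struct folio *folio)
	{
		return folio == huge_zero_folio; /* NULL == NULL -> true */
	}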
>
>
Hi Ilya,

And what practical steps do you suggest to reproduce it? :)

Thanks,
Slava.