Message-ID: <CAMsNC+s1R-AUzhe80vjxYCSRu0X9Ybp33sSMHGHKpBL6=dG2_w@mail.gmail.com>
Date: Thu, 29 Jan 2026 11:31:43 +0800
From: Gerald Yang <gerald.yang@...onical.com>
To: Jan Kara <jack@...e.cz>
Cc: tytso@....edu, adilger.kernel@...ger.ca, linux-ext4@...r.kernel.org,
gerald.yang.tw@...il.com
Subject: Re: [PATCH] ext4: Fix call trace when remounting to read only in
data=journal mode
Thanks for the review, Jan. This issue was originally observed during reboot,
because the root filesystem is remounted read-only before shutdown to make
sure all data is flushed to disk.

We don't see any issue on the machine because the data is already persisted
to the journal. But I think your suggestion is the correct way to fix it. I
will look into why ext4_writepages() doesn't flush the data to the real file
location after sync_filesystem() is called, and then re-submit the patch for
review. Thanks again.
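
As a starting point for that investigation, here is a minimal sketch for
checking whether dirty page cache remains after the remount. It is not part
of the patch. The helper names meminfo_field and report_dirty are made up
for illustration, and the only interface assumed is the Dirty and Writeback
fields of /proc/meminfo.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): print the value, in kB,
# of one field from a meminfo-style stream on stdin, e.g. Dirty.
meminfo_field() {
    awk -v key="$1:" '$1 == key { print $2 }'
}

# Report the dirty and under-writeback page cache totals from a
# meminfo-style file (defaults to /proc/meminfo).
# Note: these counters are system-wide, not per filesystem, so they are
# only indicative on an otherwise idle test machine.
report_dirty() {
    file=${1:-/proc/meminfo}
    printf 'Dirty=%s kB Writeback=%s kB\n' \
        "$(meminfo_field Dirty < "$file")" \
        "$(meminfo_field Writeback < "$file")"
}
```

With the reproducer from the patch description, report_dirty could be run
right after "mount -o remount,ro ext4disk mnt" and again after "sync" to see
whether the Dirty total is still non-zero once the filesystem is read-only.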
On Wed, Jan 28, 2026 at 6:22 PM Jan Kara <jack@...e.cz> wrote:
>
> On Wed 28-01-26 15:45:15, Gerald Yang wrote:
> > When remounting the filesystem to read only in data=journal mode
> > it may dump the following call trace:
> >
> > [ 71.629350] CPU: 0 UID: 0 PID: 177 Comm: kworker/u96:5 Tainted: G E 6.19.0-rc7 #1 PREEMPT(voluntary)
> > [ 71.629352] Tainted: [E]=UNSIGNED_MODULE
> > [ 71.629353] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009)/LXD, BIOS unknown 2/2/2022
> > [ 71.629354] Workqueue: writeback wb_workfn (flush-7:4)
> > [ 71.629359] RIP: 0010:ext4_journal_check_start+0x8b/0xd0
> > [ 71.629360] Code: 31 ff 45 31 c0 45 31 c9 e9 42 ad c4 00 48 8b 5d f8 b8 fb ff ff ff c9 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9 c3 cc cc cc cc <0f> 0b b8 e2 ff ff ff eb c2 0f 0b eb a9 44 8b 42 08 68 c7 53 ce b8
> > [ 71.629361] RSP: 0018:ffffcf32c0fdf6a8 EFLAGS: 00010202
> > [ 71.629364] RAX: ffff8f08c8505000 RBX: ffff8f08c67ee800 RCX: 0000000000000000
> > [ 71.629366] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > [ 71.629367] RBP: ffffcf32c0fdf6b0 R08: 0000000000000001 R09: 0000000000000000
> > [ 71.629368] R10: ffff8f08db18b3a8 R11: 0000000000000000 R12: 0000000000000000
> > [ 71.629368] R13: 0000000000000002 R14: 0000000000000a48 R15: ffff8f08c67ee800
> > [ 71.629369] FS: 0000000000000000(0000) GS:ffff8f0a7d273000(0000) knlGS:0000000000000000
> > [ 71.629370] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 71.629371] CR2: 00007b66825905cc CR3: 000000011053d004 CR4: 0000000000772ef0
> > [ 71.629374] PKRU: 55555554
> > [ 71.629374] Call Trace:
> > [ 71.629378] <TASK>
> > [ 71.629382] __ext4_journal_start_sb+0x38/0x1c0
> > [ 71.629383] mpage_prepare_extent_to_map+0x4af/0x580
> > [ 71.629389] ? sbitmap_get+0x73/0x180
> > [ 71.629399] ext4_do_writepages+0x3cc/0x10a0
> > [ 71.629400] ? kvm_sched_clock_read+0x11/0x20
> > [ 71.629409] ext4_writepages+0xc8/0x1b0
> > [ 71.629410] ? ext4_writepages+0xc8/0x1b0
> > [ 71.629411] do_writepages+0xc4/0x180
> > [ 71.629416] __writeback_single_inode+0x45/0x350
> > [ 71.629419] ? _raw_spin_unlock+0xe/0x40
> > [ 71.629423] writeback_sb_inodes+0x260/0x5c0
> > [ 71.629425] ? __schedule+0x4d1/0x1870
> > [ 71.629429] __writeback_inodes_wb+0x54/0x100
> > [ 71.629431] ? queue_io+0x82/0x140
> > [ 71.629433] wb_writeback+0x1ab/0x330
> > [ 71.629448] wb_workfn+0x31d/0x410
> > [ 71.629450] process_one_work+0x191/0x3e0
> > [ 71.629455] worker_thread+0x2e3/0x420
> >
> > This issue can be easily reproduced by:
> > mkdir -p mnt
> > dd if=/dev/zero of=ext4disk bs=1G count=2 oflag=direct
> > mkfs.ext4 ext4disk
> > tune2fs -o journal_data ext4disk
> > mount ext4disk mnt
> > fio --name=fiotest --rw=randwrite --bs=4k --runtime=3 --ioengine=libaio --iodepth=128 --numjobs=4 --filename=mnt/fiotest --filesize=1G --group_reporting
> > mount -o remount,ro ext4disk mnt
> > sync
> >
> > In data=journal mode, both metadata and data are written to the journal
> > first. For the second write, to the real file location, ext4 relies on
> > the writeback thread.
> >
> > After the filesystem is remounted read-only, the writeback thread still
> > tries to write data to it, which triggers the warning above. Return early
> > to avoid starting a journal transaction on a read-only filesystem. Once
> > the filesystem becomes writable again, the writeback thread will continue
> > writing the data.
> >
> > Signed-off-by: Gerald Yang <gerald.yang@...onical.com>
>
> Thanks for the report and the patch! I can indeed reproduce this warning.
> But the patch itself is certainly not the right fix for this problem.
> ext4_remount() must make sure there are no dirty pages left on the
> filesystem when remounting it read-only, and it apparently fails to do
> so. In particular, it calls sync_filesystem(), which should make sure all
> data is written. So this bug needs more investigation into why some dirty
> pages are left in the inode in data=journal mode, because
> ext4_writepages() should have written them all...
>
> Honza
>
> > ---
> > fs/ext4/inode.c | 11 +++++++++++
> > 1 file changed, 11 insertions(+)
> >
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 15ba4d42982f..4e3bbf17995e 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -2787,6 +2787,17 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
> > if (unlikely(ret))
> > goto out_writepages;
> >
> > + /*
> > + * For data=journal, if the filesystem was remounted read-only,
> > + * the writeback thread may still write dirty pages to it.
> > + * Return early to avoid starting a journal transaction on a
> > + * read-only filesystem.
> > + */
> > + if (ext4_should_journal_data(inode) && sb_rdonly(inode->i_sb)) {
> > + ret = -EROFS;
> > + goto out_writepages;
> > + }
> > +
> > /*
> > * If we have inline data and arrive here, it means that
> > * we will soon create the block for the 1st page, so
> > --
> > 2.43.0
> >
> --
> Jan Kara <jack@...e.com>
> SUSE Labs, CR