[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231211133619.3uaq4ri3r7fsdap5@quack3>
Date: Mon, 11 Dec 2023 14:36:19 +0100
From: Jan Kara <jack@...e.cz>
To: Ye Bin <yebin10@...wei.com>
Cc: tytso@....edu, adilger.kernel@...ger.ca,
linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org,
jack@...e.cz
Subject: Re: [PATCH] jbd2: fix soft lockup in
journal_finish_inode_data_buffers()
On Mon 11-12-23 19:25:44, Ye Bin wrote:
> There's issue when do io test:
> WARN: soft lockup - CPU#45 stuck for 11s! [jbd2/dm-2-8:4170]
> CPU: 45 PID: 4170 Comm: jbd2/dm-2-8 Kdump: loaded Tainted: G OE
> Call trace:
> dump_backtrace+0x0/0x1a0
> show_stack+0x24/0x30
> dump_stack+0xb0/0x100
> watchdog_timer_fn+0x254/0x3f8
> __hrtimer_run_queues+0x11c/0x380
> hrtimer_interrupt+0xfc/0x2f8
> arch_timer_handler_phys+0x38/0x58
> handle_percpu_devid_irq+0x90/0x248
> generic_handle_irq+0x3c/0x58
> __handle_domain_irq+0x68/0xc0
> gic_handle_irq+0x90/0x320
> el1_irq+0xcc/0x180
> queued_spin_lock_slowpath+0x1d8/0x320
> jbd2_journal_commit_transaction+0x10f4/0x1c78 [jbd2]
> kjournald2+0xec/0x2f0 [jbd2]
> kthread+0x134/0x138
> ret_from_fork+0x10/0x18
>
> Analyzed informations from vmcore as follows:
> (1) There are about 5k+ jbd2_inode in 'commit_transaction->t_inode_list';
> (2) Now is processing the 855th jbd2_inode;
> (3) JBD2 task has TIF_NEED_RESCHED flag;
> (4) There's no pags in address_space around the 855th jbd2_inode;
> (5) There are some process is doing drop caches;
> (6) Mounted with 'nodioread_nolock' option;
> (7) 128 CPUs;
>
> According to informations from vmcore we know 'journal->j_list_lock' spin lock
> competition is fierce. So journal_finish_inode_data_buffers() maybe process
> slowly. Theoretically, there is scheduling point in the filemap_fdatawait_range_keep_errors().
> However, if inode's address_space has no pages which taged with PAGECACHE_TAG_WRITEBACK,
> will not call cond_resched(). So may lead to soft lockup.
> journal_finish_inode_data_buffers
> filemap_fdatawait_range_keep_errors
> __filemap_fdatawait_range
> while (index <= end)
> nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end, PAGECACHE_TAG_WRITEBACK);
> if (!nr_pages)
> break; --> If 'nr_pages' is equal zero will break, then will not call cond_resched()
> for (i = 0; i < nr_pages; i++)
> wait_on_page_writeback(page);
> cond_resched();
>
> To solve above issue, add scheduling point in the journal_finish_inode_data_buffers();
>
> Signed-off-by: Ye Bin <yebin10@...wei.com>
Makes sense. Feel free to add:
Reviewed-by: Jan Kara <jack@...e.cz>
Honza
> ---
> fs/jbd2/commit.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 9bdb377a348f..5e122586e06e 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -270,6 +270,7 @@ static int journal_finish_inode_data_buffers(journal_t *journal,
> if (!ret)
> ret = err;
> }
> + cond_resched();
> spin_lock(&journal->j_list_lock);
> jinode->i_flags &= ~JI_COMMIT_RUNNING;
> smp_mb();
> --
> 2.31.1
>
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists