Message-ID: <Y2U+Je+LICO2HkNY@linutronix.de>
Date: Fri, 4 Nov 2022 17:30:29 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Jan Kara <jack@...e.cz>
Cc: LKML <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Steven Rostedt <rostedt@...dmis.org>,
Mel Gorman <mgorman@...e.de>
Subject: Re: Crash with PREEMPT_RT on aarch64 machine
On 2022-11-03 12:54:44 [+0100], Jan Kara wrote:
> Hello,
Hi,
> I was tracking down the following crash with 6.0 kernel with
> patch-6.0.5-rt14.patch applied:
>
> [ T6611] ------------[ cut here ]------------
> [ T6611] kernel BUG at fs/inode.c:625!
seems like an off-by-one ;)
> The machine is aarch64 architecture, kernel config is attached. I have seen
> the crashes also with 5.14-rt kernel so it is not a new thing. The crash is
> triggered relatively reliably (on two different aarch64 machines) by our
> performance testing framework when running dbench benchmark against an XFS
> filesystem.
Different aarch64 machines as in different SoCs, or the same CPU twice?
And no trouble on x86-64, I guess?
> Now originally I thought this is some problem with XFS or writeback code
> but after debugging this for some time I don't think that anymore.
> clear_inode() complains about inode->i_wb_list being non-empty. In fact
> looking at the list_head, I can see it is corrupted. In all the occurrences
> of the problem ->prev points back to the list_head itself but ->next points
> to some list_head that used to be part of the sb->s_inodes_wb list (or
> actually that list spliced in wait_sb_inodes() because I've seen a pointer to
> the stack as ->next pointer as well).
So you assume a delete and an add operation running in parallel?
> This is not just some memory ordering issue with the check in
> clear_inode(). If I add sb->s_inode_wblist_lock locking around the check in
> clear_inode(), the problem still reproduces.
What about dropping the list_empty() check in sb_mark_inode_writeback()
and sb_clear_inode_writeback() so that the check operation always
happens within the locked section? Either way, missing an add/delete
should result in consistent pointers.
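To illustrate the suggestion, here is a minimal userspace sketch, not kernel code. It assumes a pthread mutex as a stand-in for sb->s_inode_wblist_lock and a hand-rolled list_head; the names wb_super, wb_inode, mark_writeback() and clear_writeback() are hypothetical. The only point is that the list_empty() test and the list modification sit inside the same locked section:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	init_list_head(entry);
}

/* Hypothetical stand-ins for the superblock list and its lock. */
struct wb_super {
	pthread_mutex_t wblist_lock;
	struct list_head inodes_wb;
};

struct wb_inode {
	struct list_head wb_list;
};

/* No unlocked list_empty() pre-check: the test runs under the lock. */
static void mark_writeback(struct wb_super *sb, struct wb_inode *inode)
{
	pthread_mutex_lock(&sb->wblist_lock);
	if (list_empty(&inode->wb_list))
		list_add_tail(&inode->wb_list, &sb->inodes_wb);
	pthread_mutex_unlock(&sb->wblist_lock);
}

static void clear_writeback(struct wb_super *sb, struct wb_inode *inode)
{
	pthread_mutex_lock(&sb->wblist_lock);
	if (!list_empty(&inode->wb_list))
		list_del_init(&inode->wb_list);
	pthread_mutex_unlock(&sb->wblist_lock);
}
```

This costs a lock acquisition on every call, so it is meant as a debugging experiment rather than a fix.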
> If I enable CONFIG_DEBUG_LIST or if I convert sb->s_inode_wblist_lock to
> raw_spinlock_t, the problem disappears.
>
> Finally, I'd note that the list is modified from three places which makes
> audit relatively simple. sb_mark_inode_writeback(),
> sb_clear_inode_writeback(), and wait_sb_inodes(). All these places hold
> sb->s_inode_wblist_lock when modifying the list. So at this point I'm at a
> loss what could be causing this. As unlikely as it seems to me I've started
> wondering whether it is not some subtle issue with RT spinlocks on aarch64
> possibly in combination with interrupts (because sb_clear_inode_writeback()
> may be called from an interrupt).
The list should be modified from a threaded interrupt, so interrupts and
preemption should be enabled at this point.
If preemption and/or interrupts are disabled at some point, then
CONFIG_DEBUG_ATOMIC_SLEEP should complain about it.
spinlock_t and raw_spinlock_t differ slightly in terms of locking.
rt_spin_lock() has the fast path via try_cmpxchg_acquire(). If you
enable CONFIG_DEBUG_RT_MUTEXES then you would force the slow path which
always acquires the rt_mutex_base::wait_lock (which is a raw_spinlock_t)
while the actual lock is modified via cmpxchg.
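A rough userspace sketch of that fast-path/slow-path split, using C11 atomics. This is not the rt_mutex implementation: toy_lock, toy_trylock() and toy_unlock() are hypothetical names, the pthread mutex only stands in for rt_mutex_base::wait_lock, and the slow path here just retries once instead of blocking. It only shows that the owner word is changed via cmpxchg in both paths, while the slow path additionally serializes on wait_lock:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy lock: owner == 0 means unlocked. */
struct toy_lock {
	_Atomic uintptr_t owner;
	pthread_mutex_t wait_lock;	/* stand-in for rt_mutex_base::wait_lock */
};

/* Fast path: a single compare-and-exchange with acquire ordering. */
static bool toy_try_fast(struct toy_lock *l, uintptr_t me)
{
	uintptr_t expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->owner, &expected, me,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/*
 * force_slow mimics what the mail says CONFIG_DEBUG_RT_MUTEXES does:
 * skip the fast path and always take wait_lock first.
 */
static bool toy_trylock(struct toy_lock *l, uintptr_t me, bool force_slow)
{
	bool ok;

	if (!force_slow && toy_try_fast(l, me))
		return true;

	/* Slow path: serialize on wait_lock; the owner is still set via cmpxchg. */
	pthread_mutex_lock(&l->wait_lock);
	ok = toy_try_fast(l, me);
	pthread_mutex_unlock(&l->wait_lock);
	return ok;
}

/* Release: cmpxchg the owner back to 0 with release ordering. */
static bool toy_unlock(struct toy_lock *l, uintptr_t me)
{
	uintptr_t expected = me;

	return atomic_compare_exchange_strong_explicit(&l->owner, &expected, 0,
						       memory_order_release,
						       memory_order_relaxed);
}
```

If the problem goes away once the slow path is forced, that would point at the ordering of the fast path rather than at the list handling itself.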
> Any ideas?
>
> Honza
Sebastian