linux-kernel - Re: [tip:x86/core] [kallsyms] f138918162: WARNING:CPU:#PID:#at_fs/xfs/xfs_message.c:#xfs_buf_alert

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Y1qv3ku6ULHy8hOF@magnolia>
Date:   Thu, 27 Oct 2022 09:20:46 -0700
From:   "Darrick J. Wong" <djwong@...nel.org>
To:     Yujie Liu <yujie.liu@...el.com>
Cc:     Peter Zijlstra <peterz@...radead.org>, oe-lkp@...ts.linux.dev,
        lkp@...el.com, Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel@...r.kernel.org, x86@...nel.org,
        fengwei.yin@...el.com, ying.huang@...el.com,
        fstests@...r.kernel.org, linux-xfs@...r.kernel.org
Subject: Re: [tip:x86/core] [kallsyms] f138918162:
 WARNING:CPU:#PID:#at_fs/xfs/xfs_message.c:#xfs_buf_alert_ratelimited.cold-#[xfs]

On Thu, Oct 27, 2022 at 10:24:51AM +0800, Yujie Liu wrote:
> On 10/27/2022 10:09, Yujie Liu wrote:
> > On Wed, Oct 26, 2022 at 02:06:17PM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 26, 2022 at 11:51:14AM +0200, Peter Zijlstra wrote:
> > > > On Wed, Oct 26, 2022 at 05:10:28PM +0800, kernel test robot wrote:
> > > > > Hi Peter,
> > > > > 
> > > > > For below patch, we couldn't find any connection between the code
> > > > > change of kallsyms and xfstests testcase, but we got very stable test
> > > > > results. We tested commit f1389181622a and its parent commit
> > > > > 7825451fa4dc for 12 runs each, parent was 100% good while this commit
> > > > > was 100% bad, so we still send out this report FYI though we don't have
> > > > > clear clue of the root cause. Please check the details below if
> > > > > interested in further investigation. Thanks.
> > > > 
> > > > *groan* I'll go have a poke.
> > > > 
> > > > Also; I've had these patches in my queue.git for over a month now and
> > > > this is the first report, how comes?
> > 
> > Hi Peter,
> > 
> > Sorry about late report. We did catch this problem on
> > call-depth-tracking branch of queue.git in Semtember, but we used to
> > think it may be a false alarm since we cannot connect the code change
> > with xfstests, so didn't report at that time. After merged to x86 tip,
> > we did more tests and reviewed this problem with more developers, and
> > sent out the report finally.
> > 
> > > dmesg starts at 42 seconds, you don't happen to have a complete one?
> > 
> > Sorry for the incomplete dmesg. Please check if attached kmsg.xz helps?
> > The time may vary a little since it is captured from another run.
> > Thanks.
> > 
> > Not sure if this has anything to do with xfs or testcases, so add
> > xfs folks to consult.
> > 
> > Hi xfs folks,
> > 
> > Could you please help check the original report at
> > https://lore.kernel.org/r/202210241508.2e203c3d-yujie.liu@intel.com
> > to see if can find any clue for the problem? Thanks.
> 
> Sorry, wrong link. Should be:
> 
> https://lore.kernel.org/all/202210241614.2ae4c1f5-yujie.liu@intel.com/

xfs/439 corrupts the log and checks that log recovery will notice the
corruption and abort the mount.

Excerpted dmesg from the link above:

[   75.708690][ T3668] ------------[ cut here ]------------
[   75.714041][ T3668] WARNING: CPU: 1 PID: 3668 at fs/xfs/xfs_message.c:104 xfs_buf_alert_ratelimited.cold-0x7/0x45 [xfs]
[   75.779828][ T3668] CPU: 1 PID: 3668 Comm: mount Not tainted 6.1.0-rc1-00053-gf1389181622a #1
[   75.788389][ T3668] Hardware name: Hewlett-Packard HP Pro 3340 MT/17A1, BIOS 8.07 01/24/2013
[   75.796864][ T3668] RIP: 0010:xfs_buf_alert_ratelimited.cold-0x7/0x45 [xfs]
[ 75.898077][ T3668] xlog_find_verify_log_record (fs/xfs/xfs_log_recover.c:431) xfs
[ 75.910856][ T3668] xlog_find_head (fs/xfs/xfs_log_recover.c:705) xfs
[ 75.925847][ T3668] xlog_find_tail (fs/xfs/xfs_log_recover.c:1256) xfs
[ 75.951249][ T3668] xlog_recover (fs/xfs/xfs_log_recover.c:3355) xfs
[ 75.961324][ T3668] xfs_log_mount (fs/xfs/xfs_log.c:739) xfs
[ 75.966486][ T3668] xfs_mountfs (fs/xfs/xfs_mount.c:805 (discriminator 4)) xfs
[ 75.998070][ T3668] xfs_fs_fill_super (fs/xfs/xfs_super.c:1666) xfs

So yes, this is xlog_find_verify_log_record behaving as intended:

		if (i < start_blk) {
			/* valid log record not found */
			xfs_warn(log->l_mp,
		"Log inconsistent (didn't find previous header)");
			ASSERT(0);  <------------------ here
			error = -EFSCORRUPTED;
			goto out;
		}

However, I noticed that the callsite in the WARNING: message has changed
from the usual 'asswarn' (which is the caller of WARN_ON) to
'xfs_buf_alert_ratelimited', which seems totally wrong since XFS log
recovery doesn't touch xfs_buf objects at all.

The reason this turned into a fstests regression is that xfs/439 uses
_filter_assert_dmesg to check for warnings in dmesg that are not the
result of an ASSERT failing (common/rc in fstests):

	# We generate assert related WARNINGs on purpose and make sure
	# test doesn't fail because of these warnings. This is a helper
	# for _check_dmesg() to filter out them.
	_filter_assert_dmesg()
	{
		local warn1="WARNING:.*fs/xfs/xfs_message\.c:.*asswarn.*"
		local warn2="WARNING:.*fs/xfs/xfs_message\.c:.*assfail.*"
		sed -e "s#$warn1#Intentional warnings in asswarn#" \
		    -e "s#$warn2#Intentional warnings in assfail#"
	}

As for why that commit reports the wrong caller ... I don't know.  The
function name and line number are correct.

--D

> 
> --
> Best Regards,
> Yujie