Message-ID: <2103902716.11642129.1359619270232.JavaMail.root@redhat.com>
Date:	Thu, 31 Jan 2013 03:01:10 -0500 (EST)
From:	CAI Qian <caiqian@...hat.com>
To:	Dave Chinner <david@...morbit.com>
Cc:	xfs@....sgi.com, linux-xfs@...r.kernel.org,
	linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: 3.8-rc5 xfs corruption



----- Original Message -----
> From: "Dave Chinner" <david@...morbit.com>
> To: "CAI Qian" <caiqian@...hat.com>
> Cc: xfs@....sgi.com, linux-xfs@...r.kernel.org, "linux-kernel" <linux-kernel@...r.kernel.org>
> Sent: Thursday, January 31, 2013 12:07:48 PM
> Subject: Re: 3.8-rc5 xfs corruption
> 
> On Wed, Jan 30, 2013 at 10:16:47PM -0500, CAI Qian wrote:
> > Hello,
> > 
> > (Sorry to post to xfs mailing lists but unsure about which one is
> > the best for this.)
> 
> Trimmed to just xfs@....sgi.com.
Thanks for the quick response, Dave.
> 
> > I have seen something like this once during testing on a system
> > with an EMC VNX FC/multipath back-end.
> 
> This is a trace from the verifier code that was added in 3.8-rc1 so
> I doubt it has anything to do with any problem you've seen in the
> past....
> 
> Can you tell us what workload you were running and what hardware you
> are using as per:
> 
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
This was the system:
- AMD Opteron(tm) Processor 4130 (1 socket, 4 cores)
- PowerEdge R415 
- 8G memory
- mptsas local disks

Software version:
- xfsprogs-3.1.10

The workload consisted of fs_mark, syscall tests, some NFS/CIFS
connectathon tests, memory and libhugetlbfs tests, and some dynamic
debug (Documentation/dynamic-debug-howto.txt) tests.
> 
> As it is, if you mounted the filesystem after this problem was
> detected, log recovery probably propagated it to disk. I'd suggest
> that you run xfs_repair -n on the device and post the output so we
> can see if any corruption has actually made it to disk. If no
> corruption made it to disk, it's possible that we've got the
> incorrect verifier attached to the buffer.
The system has since been taken away from me, so I can only reserve it
again later if needed.

Regards,
CAI Qian
> 
> > [ 3025.063024] ffff8801a0d50000: 2e 2e 2f 2e 2e 2f 75 73 72 2f 6c 69 62 2f 6d 6f  ../../usr/lib/mo
> 
> The start of the block contains a path, and the only type of block
> that can contain this format of metadata is a remote symlink block.
> Remote symlink blocks don't have a verifier attached to them as
> there is nothing that can currently be used to verify them as
> correct.
> 
> I can't see exactly how this can occur as stale buffers have the
> verifier ops cleared before being returned to the new user, and
> newly allocated xfs_bufs are zeroed before being initialised. I
> really need to know what you are doing to be able to get to the
> bottom of it....
> 
> Cheers,
> 
> Dave.
> --
> Dave Chinner
> david@...morbit.com
> 
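
For reference, the hex bytes in the quoted trace line can be decoded by
hand to confirm that the start of the block holds a path rather than
structured metadata. The following is only a minimal sketch: the
function name decode_hex_dump and the constant HEX_DUMP are made up for
illustration, and the byte string is copied from the single trace line
quoted above, not from the full buffer.

```python
# Hex bytes copied from the quoted trace line at ffff8801a0d50000.
# Only these 16 bytes appear in the message; the rest of the buffer is unknown.
HEX_DUMP = "2e 2e 2f 2e 2e 2f 75 73 72 2f 6c 69 62 2f 6d 6f"


def decode_hex_dump(dump: str) -> str:
    """Convert space-separated hex bytes to text.

    Non-printable bytes are shown as '.', matching the convention used
    in the ASCII column of kernel hex dumps.
    """
    raw = bytes.fromhex(dump)  # fromhex() skips the spaces between bytes
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    print(decode_hex_dump(HEX_DUMP))
```

The decoded text is the beginning of a relative path, which matches the
observation that the block looks like the target of a symlink.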