linux-kernel - Re: xfs: list corruption in xfs_setup


Message-ID: <CAM_iQpWi7=4C1M-a3TROU-EE88FYRSbgDabQUS+nzzMZk9CEHQ@mail.gmail.com>
Date:   Wed, 1 Nov 2017 14:55:23 -0700
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     Dave Chinner <david@...morbit.com>
Cc:     Dave Chinner <dchinner@...hat.com>, darrick.wong@...cle.com,
        linux-xfs@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
        Christoph Hellwig <hch@....de>,
        Al Viro <viro@...iv.linux.org.uk>
Subject: Re: xfs: list corruption in xfs_setup_inode()

On Wed, Nov 1, 2017 at 2:32 PM, Dave Chinner <david@...morbit.com> wrote:
> On Wed, Nov 01, 2017 at 04:07:01PM +1100, Dave Chinner wrote:
>> On Tue, Oct 31, 2017 at 09:43:03PM -0700, Cong Wang wrote:
>> > On Tue, Oct 31, 2017 at 8:05 PM, Dave Chinner <david@...morbit.com> wrote:
>> > > On Tue, Oct 31, 2017 at 06:51:08PM -0700, Cong Wang wrote:
>> > >> >> Please let me know if I can provide any other information.
>> > >> >
>> > >> > How do you reproduce the problem?
>> > >>
>> > >> The warning is reported via ABRT email, we don't know what was
>> > >> happening at the time of crash.
>> > >
>> > > Which makes it even harder to track down. Perhaps you should
>> > > configure the box to crashdump on such a failure and then we
>> > > can do some post-failure forensic analysis...
>> >
>> > Yeah.
>> >
>> > We are trying to get kdump working, but even if kdump works
>> > we still can't turn on panic_on_warn since this is a production
>> > machine.
>>
>> Hmmm. Ok, maybe you could leave a trace of the xfs_iget* trace
>> points running and check the log tail for unusual events around the
>> time of the next crash. e.g. xfs_iget_reclaim_fail events. That
>> might point us to a potential interaction we can look at more
>> closely. I'd also suggest slab poisoning, as that will
>> catch other lifecycle problems that could be causing list
>> corruptions such as use-after-free.
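
The log check suggested above (looking for xfs_iget_reclaim_fail events
around the time of the crash) could be sketched roughly as below. This is
only an illustration, not something from the thread: the helper name
find_events, its parameters, and the sample lines are hypothetical, and it
assumes the usual ftrace text layout of "<task>-<pid> [cpu] flags
<timestamp>: <event>: ...".

```python
import re
from fnmatch import fnmatch

# Matches the "<timestamp>: <event>:" part of a typical ftrace text line.
# Assumption: timestamps are printed as seconds with a fractional part.
_TRACE_RE = re.compile(r"\s(\d+\.\d+):\s+(\w+):")


def find_events(lines, event_glob="xfs_iget_reclaim_fail",
                crash_time=None, window=5.0):
    """Return (timestamp, event) pairs whose event name matches event_glob.

    If crash_time is given, keep only events within +/- window seconds of it.
    """
    hits = []
    for line in lines:
        m = _TRACE_RE.search(line)
        if m is None:
            continue  # not a trace event line (header, blank, etc.)
        ts, event = float(m.group(1)), m.group(2)
        if not fnmatch(event, event_glob):
            continue
        if crash_time is not None and abs(ts - crash_time) > window:
            continue
        hits.append((ts, event))
    return hits


# Made-up sample lines, for illustration only (not real trace output).
SAMPLE = [
    "  systemd-tmpfile-612   [003] ....    12.300000: xfs_iget_miss: dev 8:1 ino 0x1234",
    "  systemd-tmpfile-612   [003] ....    12.345678: xfs_iget_reclaim_fail: dev 8:1 ino 0x1234",
    "  kworker/3:1-88    [003] ....    40.000000: xfs_iget_reclaim_fail: dev 8:1 ino 0x99",
]

if __name__ == "__main__":
    # Events within one second of a (hypothetical) crash at t=12.4s.
    for ts, event in find_events(SAMPLE, crash_time=12.4, window=1.0):
        print(ts, event)
```

Passing event_glob="xfs_iget*" would widen the filter to every xfs_iget
trace point rather than just the reclaim failures.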

I'm not sure I can use tracing, because this stack trace was triggered
by systemd-tmpfile during boot (before login).

>
> FWIW, I note that you are reporting another memory
> corruption/use-after-free related crash in the pipe_inode_info
> structure on these same machines.  I'd suggest that you start with
> the premise that this list corruption has the same root cause...

That's impossible. First of all, the machine that triggered the xfs
warning is different from the machines that triggered the
free_pipe_info() crashes. Secondly, this one is on a 4.9 kernel while
the other one is on 4.1.

Thanks.