linux-kernel - Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CA+55aFza=cv6TGYh+Be1xz2x-G-U=vruhbmfGpHSqY1ocZrBeg@mail.gmail.com>
Date:   Wed, 21 Dec 2016 21:46:37 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Dave Chinner <david@...morbit.com>,
        Johannes Weiner <hannes@...xchg.org>
Cc:     Chris Leech <cleech@...hat.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Lee Duncan <lduncan@...e.com>, open-iscsi@...glegroups.com,
        Linux SCSI List <linux-scsi@...r.kernel.org>,
        linux-block@...r.kernel.org, Christoph Hellwig <hch@....de>
Subject: Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner <david@...morbit.com> wrote:
>
> There may be deeper issues. I just started running scalability tests
> (e.g. 16-way fsmark create tests) and about a minute in I got a
> directory corruption reported - something I hadn't seen in the dev
> cycle at all.

By "in the dev cycle", do you mean your XFS changes, or have you been
tracking the merge cycle at least for some testing?

> I unmounted the fs, mkfs'd it again, ran the
> workload again and about a minute in this fired:
>
> [628867.607417] ------------[ cut here ]------------
> [628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 shadow_lru_isolate+0x171/0x220

Well, part of the changes during the merge window were the shadow
entry tracking changes that came in through Andrew's tree. Adding
Johannes Weiner to the participants.

> Now, this workload does not touch the page cache at all - it's
> entirely an XFS metadata workload, so it should not really be
> affecting the working set code.

Well, I suspect that anything that creates memory pressure will end up
triggering the working set code, so ..

That said, obviously memory corruption could be involved and result in
random issues too, but I wouldn't really expect that in this code.

It would probably be really useful to get more data points - is the
problem reliably in this area, or is it going to be random and all
over the place.

That said:

> And worse, on that last error, the /host/ is now going into meltdown
> (running 4.7.5) with 32 CPUs all burning down in ACPI code:

The obvious question here is how much you trust the environment if the
host ends up also showing problems. Maybe you do end up having hw
issues pop up too.

The primary suspect would presumably be the development kernel you're
testing triggering something, but it has to be asked..

                     Linus