Message-ID: <20161222185030.so4btkuzzkih3owz@straylight.hirudinean.org>
Date:   Thu, 22 Dec 2016 10:50:30 -0800
From:   Chris Leech <cleech@...hat.com>
To:     Dave Chinner <david@...morbit.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Lee Duncan <lduncan@...e.com>, open-iscsi@...glegroups.com,
        Linux SCSI List <linux-scsi@...r.kernel.org>,
        linux-block@...r.kernel.org, Christoph Hellwig <hch@....de>
Subject: Re: [4.10, panic, regression] iscsi: null pointer deref at iscsi_tcp_segment_done+0x20d/0x2e0

On Thu, Dec 22, 2016 at 05:50:12PM +1100, Dave Chinner wrote:
> On Wed, Dec 21, 2016 at 09:46:37PM -0800, Linus Torvalds wrote:
> > On Wed, Dec 21, 2016 at 9:13 PM, Dave Chinner <david@...morbit.com> wrote:
> > >
> > > There may be deeper issues. I just started running scalability tests
> > > (e.g. 16-way fsmark create tests) and about a minute in I got a
> > > directory corruption reported - something I hadn't seen in the dev
> > > cycle at all.
> > 
> > By "in the dev cycle", do you mean your XFS changes, or have you been
> > tracking the merge cycle at least for some testing?
> 
> I mean the three months leading up to the 4.10 merge, when all the
> XFS changes were being tested against 4.9-rc kernels.
> 
> The iscsi problem showed up when I updated the base kernel from
> 4.9 to 4.10-current last week to test the pullreq I was going to
> send you. I've been busy with other stuff until now, so I didn't
> upgrade my working trees again until today in the hope the iscsi
> problem had already been found and fixed.
> 
> > > I unmounted the fs, mkfs'd it again, ran the
> > > workload again and about a minute in this fired:
> > >
> > > [628867.607417] ------------[ cut here ]------------
> > > [628867.608603] WARNING: CPU: 2 PID: 16925 at mm/workingset.c:461 shadow_lru_isolate+0x171/0x220
> > 
> > Well, part of the changes during the merge window were the shadow
> > entry tracking changes that came in through Andrew's tree. Adding
> > Johannes Weiner to the participants.
> > 
> > > Now, this workload does not touch the page cache at all - it's
> > > entirely an XFS metadata workload, so it should not really be
> > > affecting the working set code.
> > 
> > Well, I suspect that anything that creates memory pressure will end up
> > triggering the working set code, so ..
> > 
> > That said, obviously memory corruption could be involved and result in
> > random issues too, but I wouldn't really expect that in this code.
> > 
> > It would probably be really useful to get more data points - is the
> > problem reliably in this area, or is it going to be random and all
> > over the place.
> 
> The iscsi problem is 100% reproducible: create a pair of iscsi luns,
> mkfs, run xfstests on them. iscsi fails a second after xfstests mounts
> the filesystems.
> 
> The test machine I'm having all these other problems on? Stable and
> steady as a rock using PMEM devices. The moment I go to use /dev/vdc
> (i.e. run load/perf benchmarks) it starts falling over left, right
> and center.

I'm not reproducing any problems with xfstests running over iscsi_tcp
right now.  Two 10G luns exported from an LIO target, attached directly
to a test VM as sda/sdb and xfstests configured to use sda1/sdb1 as
TEST_DEV and SCRATCH_DEV.
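
For anyone trying to recreate this setup, a minimal xfstests local.config matching the configuration described above might look like the following sketch (device names, mount points, and sizes are illustrative assumptions, not copied from the actual test rig):

```shell
# xfstests local.config -- sketch of the two-LUN setup described above.
# Device names and mount points are illustrative assumptions.
export TEST_DEV=/dev/sda1          # first 10G iSCSI LUN, partition 1
export TEST_DIR=/mnt/test          # must exist and be mountable
export SCRATCH_DEV=/dev/sdb1       # second 10G iSCSI LUN, partition 1
export SCRATCH_MNT=/mnt/scratch    # must exist; xfstests re-mkfs's SCRATCH_DEV
export FSTYP=xfs                   # filesystem under test
```

With something like that in place, running `./check` from the xfstests checkout exercises both devices, which is where the failure mode Dave describes would show up right after the first mount.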

The virtio scatterlist issue that popped up right away for me is triggered
by an hdparm ioctl, which is being run by tuned on Fedora.  And that
actually seems to happen back on 4.9 as well :(

Chris
