Message-ID: <CA+5PVA7TaLUvD8AuZc0jKS2VGpOuYubScdWCsvEP26gOWnfz_w@mail.gmail.com>
Date:	Thu, 30 Jul 2015 19:14:11 -0400
From:	Josh Boyer <jwboyer@...oraproject.org>
To:	Ming Lei <ming.lei@...onical.com>, snitzer@...hat.com,
	ejt@...hat.com
Cc:	Johannes Weiner <hannes@...xchg.org>, Tejun Heo <tj@...nel.org>,
	Jens Axboe <axboe@...com>,
	"Linux-Kernel@...r. Kernel. Org" <linux-kernel@...r.kernel.org>
Subject: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848

On Thu, Jul 30, 2015 at 7:27 AM, Josh Boyer <jwboyer@...oraproject.org> wrote:
> On Wed, Jul 29, 2015 at 8:29 PM, Ming Lei <ming.lei@...onical.com> wrote:
>> On Wed, Jul 29, 2015 at 12:36 PM, Josh Boyer <jwboyer@...oraproject.org> wrote:
>>> On Wed, Jul 29, 2015 at 11:32 AM, Ming Lei <ming.lei@...onical.com> wrote:
>>>> On Wed, Jul 29, 2015 at 9:51 AM, Johannes Weiner <hannes@...xchg.org> wrote:
>>>>> On Wed, Jul 29, 2015 at 09:27:16AM -0400, Josh Boyer wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> We've gotten a report[1] that any of the upcoming Fedora 23 install
>>>>>> images are all failing on 32-bit VMs/machines.  Looking at the first
>>>>>> instance of the oops, it seems to be a bad page state where a page is
>>>>>> still charged to a group and it is trying to be freed.  The oops
>>>>>> output is below.
>>>>>>
>>>>>> Has anyone seen this in their 32-bit testing at all?  Thus far nobody
>>>>>> can recreate this on a 64-bit machine/VM.
>>>>>>
>>>>>> josh
>>>>>>
>>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1247382
>>>>>>
>>>>>> [    9.026738] systemd[1]: Switching root.
>>>>>> [    9.036467] systemd-journald[149]: Received SIGTERM from PID 1 (systemd).
>>>>>> [    9.082262] BUG: Bad page state in process kworker/u5:1  pfn:372ac
>>>>>> [    9.083989] page:f3d32ae0 count:0 mapcount:0 mapping:f2252178 index:0x16a
>>>>>> [    9.085755] flags: 0x40020021(locked|lru|mappedtodisk)
>>>>>> [    9.087284] page dumped because: page still charged to cgroup
>>>>>> [    9.088772] bad because of flags:
>>>>>> [    9.089731] flags: 0x21(locked|lru)
>>>>>> [    9.090818] page->mem_cgroup:f2c3e400
>>>>>
>>>>> It's also still locked and on the LRU. This page shouldn't have been
>>>>> freed.
>>>>>
>>>>>> [    9.117848] Call Trace:
>>>>>> [    9.118738]  [<c0aa22c9>] dump_stack+0x41/0x52
>>>>>> [    9.120034]  [<c054e30a>] bad_page.part.80+0xaa/0x100
>>>>>> [    9.121461]  [<c054eea9>] free_pages_prepare+0x3b9/0x3f0
>>>>>> [    9.122934]  [<c054fae2>] free_hot_cold_page+0x22/0x160
>>>>>> [    9.124400]  [<c071a22f>] ? copy_to_iter+0x1af/0x2a0
>>>>>> [    9.125750]  [<c054c4a3>] ? mempool_free_slab+0x13/0x20
>>>>>> [    9.126840]  [<c054fc57>] __free_pages+0x37/0x50
>>>>>> [    9.127849]  [<c054c4fd>] mempool_free_pages+0xd/0x10
>>>>>> [    9.128908]  [<c054c8b6>] mempool_free+0x26/0x80
>>>>>> [    9.129895]  [<c06f77e6>] bounce_end_io+0x56/0x80
>>>>>
>>>>> The page state looks completely off for a bounce buffer page. Did
>>>>> somebody mess with a bounce bio's bv_page?
>>>>
>>>> Looks like the page isn't touched in either lo_read_transfer() or
>>>> lo_read_simple().
>>>>
>>>> Maybe it is related to aa4d86163e4e (block: loop: switch to VFS ITER_BVEC).
>>>> If reverting aa4d86163e4e doesn't fix the issue, it might be helpful to run
>>>> 'git bisect', assuming the issue can be reproduced easily.
>>>
>>> I can try reverting that and getting someone to test it.  It is
>>> somewhat complicated by having to spin a new install ISO, so a report
>>> back will be somewhat delayed.  In the meantime, I'm also asking
>>> people to track down the first kernel build that hits this, so
>>> hopefully that gives us more of a clue as well.
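As an aside, the two flag words in the oops quoted above (0x40020021 at free
time, 0x21 after masking) can be decoded by hand. A minimal sketch — the bit
positions below are config-dependent assumptions inferred from the names the
dump itself printed, not authoritative values:

```python
# Sketch: decode the page->flags words from the oops above.
# Bit positions for PG_locked (0), PG_lru (5), and PG_mappedtodisk (17)
# are assumptions matching this particular trace; they vary with
# kernel version and config.
PAGE_FLAGS = {0: "locked", 5: "lru", 17: "mappedtodisk"}

def decode(flags):
    """Return the names of the known flag bits set in page->flags."""
    return "|".join(name for bit, name in sorted(PAGE_FLAGS.items())
                    if flags & (1 << bit))

print(decode(0x40020021))  # locked|lru|mappedtodisk (upper bits encode zone/node)
print(decode(0x21))        # locked|lru -- the bits that make this free "bad"
```

The locked and lru bits surviving into the page free path are exactly why
free_pages_prepare() flags the page as bad, independent of the cgroup charge.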

The revert of that patch did not fix the issue.

>>> It is odd that only 32-bit hits this issue though.  At least from what
>>> we've seen thus far.
>>
>> Page bouncing may only come into play on 32-bit, and I will try to find an
>> ARM box to see if it can be reproduced easily.
>>
>> BTW, are there any extra steps for reproducing the issue? Such as
>> cgroup operations?
>
> I'm not entirely sure what the install environment on the ISOs is
> doing, but nobody sees this issue with a kernel after install.  Thus
> far, reproduction efforts have focused on recreating the install ISOs
> using various kernels.  That is working, but I don't expect other
> people to be able to do that easily.
>
> Also, our primary tester seems to have narrowed it down to breaking
> somewhere between 4.1-rc5 (good) and 4.1-rc6 (bad).  I'll be working
> with him today to isolate it further, but the commit you pointed out
> was in 4.1-rc1 and that worked.  He still needs to test a 4.2-rc4
> kernel with it reverted, but so far it seems to be something else that
> came in with the 4.1 kernel.

After doing some RPM bisecting, we've narrowed it down to the
following commit range:

[jwboyer@...er linux]$ git log --pretty=oneline c2102f3d73d8..0f1e5b5d19f6
0f1e5b5d19f6c06fe2078f946377db9861f3910d Merge tag 'dm-4.1-fixes-3' of
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
1c220c69ce0dcc0f234a9f263ad9c0864f971852 dm: fix casting bug in dm_merge_bvec()
15b94a690470038aa08247eedbebbe7e2218d5ee dm: fix reload failure of 0
path multipath mapping on blk-mq devices
e5d8de32cc02a259e1a237ab57cba00f2930fa6a dm: fix false warning in
free_rq_clone() for unmapped requests
45714fbed4556149d7f1730f5bae74f81d5e2cd5 dm: requeue from blk-mq
dm_mq_queue_rq() using BLK_MQ_RQ_QUEUE_BUSY
4c6dd53dd3674c310d7379c6b3273daa9fd95c79 dm mpath: fix leak of
dm_mpath_io structure in blk-mq .queue_rq error path
3a1407559a593d4360af12dd2df5296bf8eb0d28 dm: fix NULL pointer when
clone_and_map_rq returns !DM_MAPIO_REMAPPED
4ae9944d132b160d444fa3aa875307eb0fa3eeec dm: run queue on re-queue
[jwboyer@...er linux]$
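That range is small enough to walk mechanically with git bisect. A sketch,
assuming a kernel tree containing these commits and some boot-test script of
your own (./boot-test.sh below is hypothetical) that exits non-zero when the
oops reproduces:

```shell
# Sketch: bisect the seven-commit range above. git bisect takes the
# bad revision first, then the good one, and "run" drives the search
# with any script that exits 0 for good and non-zero for bad.
git bisect start 0f1e5b5d19f6 c2102f3d73d8   # <bad> <good>
git bisect run ./boot-test.sh                # hypothetical boot test
git bisect reset                             # return to the original HEAD
```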

It is interesting to note that we're also carrying a patch in our 4.1
kernel for loop performance reasons that went into upstream 4.2.  That
patch is blk-loop-avoid-too-many-pending-per-work-IO.patch which
corresponds to upstream commit
4d4e41aef9429872ea3b105e83426941f7185ab6.  All of those commits are in
4.2-rcX, which matches the failures we're seeing.

We can try a 4.1-rc5 snapshot build without the block patch to see if
that helps, but the patch was included in all the previously tested
good kernels and the issue only appeared after the DM merge commits
were included.

josh
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
