linux-kernel - Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848

Date:	Fri, 31 Jul 2015 14:58:03 -0400
From:	Josh Boyer <jwboyer@...oraproject.org>
To:	Mike Snitzer <snitzer@...hat.com>
Cc:	Ming Lei <ming.lei@...onical.com>, ejt@...hat.com,
	Johannes Weiner <hannes@...xchg.org>,
	Tejun Heo <tj@...nel.org>, Jens Axboe <axboe@...com>,
	"Linux-Kernel@...r. Kernel. Org" <linux-kernel@...r.kernel.org>
Subject: Re: cgroup/loop Bad page state oops in Linux v4.2-rc3-136-g45b4b782e848

On Thu, Jul 30, 2015 at 8:19 PM, Mike Snitzer <snitzer@...hat.com> wrote:
> On Thu, Jul 30 2015 at  7:14pm -0400,
> Josh Boyer <jwboyer@...oraproject.org> wrote:
>
>> On Thu, Jul 30, 2015 at 7:27 AM, Josh Boyer <jwboyer@...oraproject.org> wrote:
>> > On Wed, Jul 29, 2015 at 8:29 PM, Ming Lei <ming.lei@...onical.com> wrote:
>> >> On Wed, Jul 29, 2015 at 12:36 PM, Josh Boyer <jwboyer@...oraproject.org> wrote:
>> >>> On Wed, Jul 29, 2015 at 11:32 AM, Ming Lei <ming.lei@...onical.com> wrote:
>> >>>> On Wed, Jul 29, 2015 at 9:51 AM, Johannes Weiner <hannes@...xchg.org> wrote:
>> >>>>> On Wed, Jul 29, 2015 at 09:27:16AM -0400, Josh Boyer wrote:
>> >>>>>> Hi All,
>> >>>>>>
>> >>>>>> We've gotten a report[1] that all of the upcoming Fedora 23 install
>> >>>>>> images are failing on 32-bit VMs/machines.  Looking at the first
>> >>>>>> instance of the oops, it seems to be a bad page state where a page is
>> >>>>>> still charged to a group and it is trying to be freed.  The oops
>> >>>>>> output is below.
>> >>>>>>
>> >>>>>> Has anyone seen this in their 32-bit testing at all?  Thus far nobody
>> >>>>>> can recreate this on a 64-bit machine/VM.
>> >>>>>>
>> >>>>>> josh
>> >>>>>>
>> >>>>>> [1] https://bugzilla.redhat.com/show_bug.cgi?id=1247382
>> >>>>>>
>> >>>>>> [    9.026738] systemd[1]: Switching root.
>> >>>>>> [    9.036467] systemd-journald[149]: Received SIGTERM from PID 1 (systemd).
>> >>>>>> [    9.082262] BUG: Bad page state in process kworker/u5:1  pfn:372ac
>> >>>>>> [    9.083989] page:f3d32ae0 count:0 mapcount:0 mapping:f2252178 index:0x16a
>> >>>>>> [    9.085755] flags: 0x40020021(locked|lru|mappedtodisk)
>> >>>>>> [    9.087284] page dumped because: page still charged to cgroup
>> >>>>>> [    9.088772] bad because of flags:
>> >>>>>> [    9.089731] flags: 0x21(locked|lru)
>> >>>>>> [    9.090818] page->mem_cgroup:f2c3e400
>> >>>>>
>> >>>>> It's also still locked and on the LRU. This page shouldn't have been
>> >>>>> freed.
>> >>>>>
>> >>>>>> [    9.117848] Call Trace:
>> >>>>>> [    9.118738]  [<c0aa22c9>] dump_stack+0x41/0x52
>> >>>>>> [    9.120034]  [<c054e30a>] bad_page.part.80+0xaa/0x100
>> >>>>>> [    9.121461]  [<c054eea9>] free_pages_prepare+0x3b9/0x3f0
>> >>>>>> [    9.122934]  [<c054fae2>] free_hot_cold_page+0x22/0x160
>> >>>>>> [    9.124400]  [<c071a22f>] ? copy_to_iter+0x1af/0x2a0
>> >>>>>> [    9.125750]  [<c054c4a3>] ? mempool_free_slab+0x13/0x20
>> >>>>>> [    9.126840]  [<c054fc57>] __free_pages+0x37/0x50
>> >>>>>> [    9.127849]  [<c054c4fd>] mempool_free_pages+0xd/0x10
>> >>>>>> [    9.128908]  [<c054c8b6>] mempool_free+0x26/0x80
>> >>>>>> [    9.129895]  [<c06f77e6>] bounce_end_io+0x56/0x80
>> >>>>>
>> >>>>> The page state looks completely off for a bounce buffer page. Did
>> >>>>> somebody mess with a bounce bio's bv_page?
>> >>>>
>> >>>> It looks like the page isn't touched in either lo_read_transfer() or
>> >>>> lo_read_simple().
>> >>>>
>> >>>> Maybe it is related to aa4d86163e4e (block: loop: switch to VFS ITER_BVEC).
>> >>>> If reverting aa4d86163e4e doesn't fix the issue, it might be helpful to
>> >>>> run 'git bisect', assuming the issue can be reproduced easily.
>> >>>
>> >>> I can try reverting that and getting someone to test it.  It is
>> >>> somewhat complicated by having to spin a new install ISO, so a report
>> >>> back will be somewhat delayed.  In the meantime, I'm also asking
>> >>> people to track down the first kernel build that hits this, so
>> >>> hopefully that gives us more of a clue as well.
>>
>> The revert of that patch did not fix the issue.
>>
>> >>> It is odd that only 32-bit hits this issue though.  At least from what
>> >>> we've seen thus far.
>> >>
>> >> Page bouncing may only be relevant on 32-bit, and I will try to find an ARM
>> >> box to see if it can be reproduced easily.
>> >>
>> >> BTW, are there any extra steps for reproducing the issue? Such as
>> >> cgroup operations?
>> >
>> > I'm not entirely sure what the install environment on the ISOs is
>> > doing, but nobody sees this issue with a kernel after install.  Thus
>> > far recreate efforts have focused on recreating the install ISOs using
>> > various kernels.  That is working, but I don't expect other people to
>> > easily be able to do that.
>> >
>> > Also, our primary tester seems to have narrowed it down to breaking
>> > somewhere between 4.1-rc5 (good) and 4.1-rc6 (bad).  I'll be working
>> > with him today to isolate it further, but the commit you pointed out
>> > was in 4.1-rc1 and that worked.  He still needs to test a 4.2-rc4
>> > kernel with it reverted, but so far it seems to be something else that
>> > came in with the 4.1 kernel.
>>
>> After doing some RPM bisecting, we've narrowed it down to the
>> following commit range:
>>
>> [jwboyer@...er linux]$ git log --pretty=oneline c2102f3d73d8..0f1e5b5d19f6
>> 0f1e5b5d19f6c06fe2078f946377db9861f3910d Merge tag 'dm-4.1-fixes-3' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
>> 1c220c69ce0dcc0f234a9f263ad9c0864f971852 dm: fix casting bug in dm_merge_bvec()
>> 15b94a690470038aa08247eedbebbe7e2218d5ee dm: fix reload failure of 0
>> path multipath mapping on blk-mq devices
>> e5d8de32cc02a259e1a237ab57cba00f2930fa6a dm: fix false warning in
>> free_rq_clone() for unmapped requests
>> 45714fbed4556149d7f1730f5bae74f81d5e2cd5 dm: requeue from blk-mq
>> dm_mq_queue_rq() using BLK_MQ_RQ_QUEUE_BUSY
>> 4c6dd53dd3674c310d7379c6b3273daa9fd95c79 dm mpath: fix leak of
>> dm_mpath_io structure in blk-mq .queue_rq error path
>> 3a1407559a593d4360af12dd2df5296bf8eb0d28 dm: fix NULL pointer when
>> clone_and_map_rq returns !DM_MAPIO_REMAPPED
>> 4ae9944d132b160d444fa3aa875307eb0fa3eeec dm: run queue on re-queue
>> [jwboyer@...er linux]$
>>
>> It is interesting to note that we're also carrying a patch in our 4.1
>> kernel for loop performance reasons that went into upstream 4.2.  That
>> patch is blk-loop-avoid-too-many-pending-per-work-IO.patch which
>> corresponds to upstream commit
>> 4d4e41aef9429872ea3b105e83426941f7185ab6.  All of those commits are in
>> 4.2-rcX, which matches the failures we're seeing.
>>
>> We can try a 4.1-rc5 snapshot build without the block patch to see if
>> that helps, but the patch was included in all the previously tested
>> good kernels and the issue only appeared after the DM merge commits
>> were included.
>
> The only commit that looks even remotely related (given 32bit concerns)
> would be 1c220c69ce0dcc0f234a9f263ad9c0864f971852

Confirmed.  I built kernels for our tester that started with the
working snapshot and applied the patches above one at a time.  The
first build to fail was the one that added the commit you suspected,
1c220c69ce0d ("dm: fix casting bug in dm_merge_bvec()").
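
The one-at-a-time procedure can be sketched roughly as below.  This is
only an illustration in Python, not the scripts that were actually used:
first_failing_patch() and is_good() are hypothetical names, and is_good()
stands in for the manual step of building a kernel, spinning an install
ISO and booting it on a 32-bit VM.  The oldest-first ordering of the
abbreviated hashes is an assumption made by reversing the git log output
quoted above.

```python
from typing import Callable, List, Optional, Sequence


def first_failing_patch(
    patches: Sequence[str],
    is_good: Callable[[List[str]], bool],
) -> Optional[str]:
    """Apply patches cumulatively, oldest first, and return the first
    one whose addition makes the build fail (None if none do)."""
    applied: List[str] = []
    for patch in patches:
        applied.append(patch)
        # Hypothetical check: build + boot test with everything applied so far.
        if not is_good(list(applied)):
            return patch
    return None


# Abbreviated hashes from the quoted range, assumed oldest first.
SERIES = [
    "4ae9944d132b",
    "3a1407559a59",
    "4c6dd53dd367",
    "45714fbed455",
    "e5d8de32cc02",
    "15b94a690470",
    "1c220c69ce0d",
]


def fake_is_good(applied: List[str]) -> bool:
    # Stand-in for the real boot test; it simply encodes the result
    # reported in this mail so the sketch can be run end to end.
    return "1c220c69ce0d" not in applied


if __name__ == "__main__":
    print(first_failing_patch(SERIES, fake_is_good))
```

A linear walk like this costs one build per patch, which is tolerable
for seven commits; for a longer range 'git bisect' would need fewer
builds.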

I can try to build a 4.2-rc4 kernel with that commit reverted, but it
would be good if someone could start thinking about how it could cause
this issue.

josh