Message-ID: <20160523142459.GA31159@node.shutemov.name>
Date: Mon, 23 May 2016 17:24:59 +0300
From: "Kirill A. Shutemov" <kirill@...temov.name>
To: Mika Westerberg <mika.westerberg@...ux.intel.com>,
Andrea Arcangeli <aarcange@...hat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
linux-kernel@...r.kernel.org
Subject: Re: v4.6 kernel BUG at mm/rmap.c:1101!
On Mon, May 23, 2016 at 05:06:38PM +0300, Mika Westerberg wrote:
> Hi,
>
> After upgrading kernel of my desktop system from v4.6-rc7 to v4.6, I've
> started seeing following:
>
> [176611.093747] page:ffffea0000360000 count:1 mapcount:0 mapping:ffff880034d2e0a1 index:0x1f9b06600 compound_mapcount: 0
> [176611.093751] flags: 0x3fff8000044079(locked|uptodate|dirty|lru|active|head|swapbacked)
> [176611.093752] page dumped because: VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address))
> [176611.093753] page->mem_cgroup:ffff88049e81b800
> [176611.093765] ------------[ cut here ]------------
> [176611.093778] kernel BUG at mm/rmap.c:1101!
> [176611.093787] invalid opcode: 0000 [#1] PREEMPT SMP
> [176611.093800] Modules linked in: vfat fat usb_storage fuse bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables pl2303 snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic snd_hda_intel snd_hda_codec x86_pkg_temp_thermal coretemp kvm_intel snd_hwdep snd_hda_core kvm snd_seq snd_seq_device iTCO_wdt iTCO_vendor_support snd_pcm mxm_wmi irqbypass crct10dif_pclmul joydev crc32_pclmul crc32c_intel mei_me snd_timer ghash_clmulni_intel snd mei lpc_ich i2c_i801 shpchp mfd_core soundcore wmi i915 drm_kms_helper drm e1000e igb serio_raw dca i2c_algo_bit i2c_core ptp pps_core video
> [176611.093947] CPU: 1 PID: 2851 Comm: BrowserBlocking Tainted: G I 4.6.0 #71
> [176611.093962] Hardware name: Gigabyte Technology Co., Ltd. Z87X-UD7 TH/Z87X-UD7 TH-CF, BIOS F4 03/18/2014
> [176611.093981] task: ffff880492193600 ti: ffff8804971e0000 task.ti: ffff8804971e0000
> [176611.093996] RIP: 0010:[<ffffffff811dbcb3>] [<ffffffff811dbcb3>] page_move_anon_rmap+0x93/0xa0
> [176611.094018] RSP: 0000:ffff8804971e3d58 EFLAGS: 00010296
> [176611.094030] RAX: 0000000000000021 RBX: ffffea0000360000 RCX: 0000000000000002
> [176611.094045] RDX: 0000000080000002 RSI: ffffffff81a2dce2 RDI: 00000000ffffffff
> [176611.094059] RBP: ffff8804971e3d70 R08: 0000000000016e39 R09: 0000000000000004
> [176611.094074] R10: 800000000d81f065 R11: ffffffff81f19c4e R12: ffff880034d2e0a0
> [176611.094088] R13: 00000001f9b06600 R14: ffffea00003607c0 R15: ffff880495b3bc00
> [176611.094103] FS: 00007f0a91e71700(0000) GS:ffff8804af240000(0000) knlGS:0000000000000000
> [176611.094119] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [176611.094131] CR2: 00001f9b0661fcc8 CR3: 0000000497097000 CR4: 00000000001406e0
> [176611.094146] Stack:
> [176611.094151] ffff880042301398 00001f9b0661fcc8 ffffea0011c746b0 ffff8804971e3df8
> [176611.094169] ffffffff811ccdd7 000000000000000c ffff880471d1a0f8 ffff880498d2f198
> [176611.094186] 0000000000000001 ffff8804971e3e50 ffffffff8119b156 0000000000000001
> [176611.094203] Call Trace:
> [176611.094213] [<ffffffff811ccdd7>] do_wp_page+0x487/0x710
> [176611.094225] [<ffffffff8119b156>] ? generic_file_read_iter+0x606/0x6f0
> [176611.094238] [<ffffffff811cf1e9>] handle_mm_fault+0xf59/0x1d30
> [176611.094252] [<ffffffff8121eef7>] ? __vfs_read+0xa7/0xd0
> [176611.094266] [<ffffffff81066298>] __do_page_fault+0x1a8/0x520
> [176611.094280] [<ffffffff81066632>] do_page_fault+0x22/0x30
> [176611.094295] [<ffffffff81759508>] page_fault+0x28/0x30
> [176611.094306] Code: 20 05 a1 81 e8 2f d0 fe ff 0f 0b e8 68 ce fe ff 0f 0b 48 89 d6 e8 ee 32 01 00 eb cd 48 c7 c6 b0 2e a1 81 48 89 df e8 0d d0 fe ff <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89
> [176611.094386] RIP [<ffffffff811dbcb3>] page_move_anon_rmap+0x93/0xa0
> [176611.094400] RSP <ffff8804971e3d58>
> [176611.099920] ---[ end trace d9cb6b7ad0bd6c55 ]---
> [176611.099922] note: BrowserBlocking[2851] exited with preempt_count 1
>
> I haven't bisected this yet but there seems to be only one commit
> touching mm in v4.6 so I kind of suspect that it has something to do
> with the issue. I'll try to revert it next and see if that changes
> anything.
>
> I've seen the issue now few times but I have no easy means to reproduce
> it. Only thing that seems to be consistent is the fact that the running
> process is always chrome.
>
> The commit in question is:
>
> 6d0a07edd17c ("mm: thp: calculate the mapcount correctly for THP pages
> during WP faults").
>
> Does this ring any bells? Thanks in advance.

Looks like we forgot to align the address when the page is huge.

I'm not sure whether the caller or the callee should do this.
Below is the callee version.

Note that the address is used only in the CONFIG_DEBUG_VM=y case, so the bug
is not visible on production kernels with the option disabled.

diff --git a/mm/rmap.c b/mm/rmap.c
index 8a839935b18c..0ea5d9071b32 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1098,6 +1098,8 @@ void page_move_anon_rmap(struct page *page,
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_VMA(!anon_vma, vma);
+	if (IS_ENABLED(CONFIG_DEBUG_VM) && PageTransHuge(page))
+		address &= HPAGE_PMD_MASK;
 	VM_BUG_ON_PAGE(page->index != linear_page_index(vma, address), page);
 	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
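
To illustrate why the unaligned address trips the check, here is a small
userspace sketch of the arithmetic, using the values from the report above
and taking CR2 as the faulting address. It assumes x86-64 with 4 KiB base
pages and 2 MiB PMD-sized huge pages, and an anonymous VMA whose vm_pgoff
equals vm_start >> PAGE_SHIFT, so that linear_page_index() reduces to
address >> PAGE_SHIFT. The SKETCH_* constants and sketch_* helpers are
hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 geometry: 4 KiB base pages, 2 MiB PMD-sized huge pages. */
#define SKETCH_PAGE_SHIFT      12
#define SKETCH_HPAGE_PMD_SHIFT 21
#define SKETCH_HPAGE_PMD_MASK  (~((UINT64_C(1) << SKETCH_HPAGE_PMD_SHIFT) - 1))

/* Simplified stand-in for linear_page_index() under the assumption above. */
static uint64_t sketch_linear_index(uint64_t address)
{
	return address >> SKETCH_PAGE_SHIFT;
}

/* Index computed after rounding the address down to the huge page start. */
static uint64_t sketch_aligned_index(uint64_t address)
{
	return sketch_linear_index(address & SKETCH_HPAGE_PMD_MASK);
}

/* Values from the report: CR2 and the "index:" field of the dumped page. */
static void sketch_check_report(void)
{
	const uint64_t fault_address = UINT64_C(0x00001f9b0661fcc8);
	const uint64_t head_index = UINT64_C(0x1f9b06600);

	/* Unaligned: gives the subpage's index, which differs from head_index. */
	assert(sketch_linear_index(fault_address) != head_index);
	/* Aligned: gives the head page's index, so the comparison holds. */
	assert(sketch_aligned_index(fault_address) == head_index);
}
```

Calling sketch_check_report() from a main() exercises both cases: the
unaligned address shifts to 0x1f9b0661f, while the masked address shifts to
0x1f9b06600, which is the index reported for the head page.
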
--
Kirill A. Shutemov