Date:	Mon, 20 Jul 2015 11:33:02 -0700
From:	Ashutosh Dixit <ashutosh.dixit@...el.com>
To:	Toshi Kani <toshi.kani@...com>
Cc:	"linux-kernel@vger.kernel.org" <linux-kernel@...r.kernel.org>,
	"Dutt, Sudeep" <sudeep.dutt@...el.com>,
	"Rao, Nikhil" <nikhil.rao@...el.com>,
	"Williams, Dan J" <dan.j.williams@...el.com>
Subject: Re: Regression in v4.2-rc1: vmalloc_to_page with ioremap

On Mon, Jul 20 2015 at 08:59:02 AM, Toshi Kani <toshi.kani@...com> wrote:

> Can you also try the 'nohugeiomap' kernel option to the 4.2-rc1 kernel
> to see if the problem goes away?

Yes, the problem goes away with 'nohugeiomap'.
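
For reference, here is a rough sketch of how 'nohugeiomap' takes effect,
modeled on lib/ioremap.c in v4.1+ (variable and helper names such as
ioremap_huge_disabled and arch_ioremap_pud_supported() are from memory and
may differ): the option simply keeps the pud/pmd capability flags clear, so
ioremap falls back to building 4K PTE mappings.

#include <linux/init.h>
#include <linux/cache.h>
#include <linux/io.h>

/*
 * Sketch, modeled on lib/ioremap.c (names recalled from memory):
 * 'nohugeiomap' keeps the "capable" flags clear, so the ioremap
 * page-table builder never attempts pud/pmd huge entries.
 */
static int __read_mostly ioremap_pud_capable;
static int __read_mostly ioremap_pmd_capable;
static int __read_mostly ioremap_huge_disabled;

static int __init set_nohugeiomap(char *str)
{
        ioremap_huge_disabled = 1;
        return 0;
}
early_param("nohugeiomap", set_nohugeiomap);

void __init ioremap_huge_init(void)
{
        if (ioremap_huge_disabled)
                return;
        if (arch_ioremap_pud_supported())       /* 1G mappings (x86: pdpe1gb) */
                ioremap_pud_capable = 1;
        if (arch_ioremap_pmd_supported())       /* 2M mappings */
                ioremap_pmd_capable = 1;
}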

> Yes, vmalloc_to_page() assumes 4KB mappings.  But ioremap with huge
> pages was enabled in v4.1, while v4.2-rc1 has update for the check with
> MTRRs.

Yes, we had bisected the failure down to the following commit, and the
problem goes away if this commit is reverted. So we think that without this
commit ioremap maps device memory with 4K pages, which is why we don't see
the issue (see the sketch after the commit reference below).

commit b73522e0c1be58d3c69b124985b8ccf94e3677f7
Author: Toshi Kani <toshi.kani@...com>
Date:   Tue May 26 10:28:10 2015 +0200

    x86/mm/mtrr: Enhance MTRR checks in kernel mapping helpers
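
To illustrate why this commit changes the mapping granularity, here is a
hedged sketch of the pud-level step of ioremap's page-table builder, in the
spirit of ioremap_pud_range() in lib/ioremap.c. The helper name
ioremap_try_huge_pud() below is my own, and ioremap_pud_enabled() /
pud_set_huge() are recalled from memory; on x86 the MTRR check relaxed by
the commit above sits inside pud_set_huge(), so once it starts succeeding,
each aligned 1 GB chunk of a large BAR gets a single huge entry instead of
4K PTEs.

#include <linux/kernel.h>
#include <linux/io.h>
#include <asm/pgtable.h>

/*
 * Sketch of the decision ioremap makes at the pud level, simplified
 * from lib/ioremap.c.  The helper name is illustrative; on x86 the
 * MTRR check touched by the commit above lives inside pud_set_huge().
 */
static int ioremap_try_huge_pud(pud_t *pud, unsigned long addr,
                                unsigned long next, phys_addr_t phys_addr,
                                pgprot_t prot)
{
        if (!ioremap_pud_enabled())             /* cleared by 'nohugeiomap' */
                return 0;
        if ((next - addr) != PUD_SIZE)          /* need a full 1G chunk... */
                return 0;
        if (!IS_ALIGNED(phys_addr, PUD_SIZE))   /* ...aligned on both sides */
                return 0;

        return pud_set_huge(pud, phys_addr, prot);
}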

> Can you send me outputs of the following files?  If the driver
> fails to load in v4.2-rc1, you can obtain the info in v4.1.
>
>   /proc/mtrr
>   /proc/iomem
>   /proc/vmallocinfo
>   /sys/kernel/debug/kernel_page_tables (need CONFIG_X86_PTDUMP set)

Since the outputs are large, I have sent them to you in a separate mail
outside the mailing list.

> Also, does the driver map a regular memory range with ioremap?  If not,
> how does 'struct page' get allocated for the range (since
> vmalloc_to_page returns a page pointer)?

No, the driver does not map regular memory with ioremap, only device
memory. vmalloc_to_page was returning a valid 'struct page' in this case
too. It appears it can do this correctly using pte_page as long as all four
page table levels (pgd, pud, pmd, pte) are present, and the problem seems
to be happening because with huge pages they are not. For us the BAR size
is 8 GB, so we think that the new ioremap maps the BARs using 1 GB pages
(see the sketch below).
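
To make the failure mode concrete, here is a sketch of the walk
vmalloc_to_page() performs, mirroring mm/vmalloc.c in v4.2-rc1 (the same
code the gdb listing further down points at): it descends
pgd -> pud -> pmd -> pte and finishes with pte_page(), so a 1 GB mapping
installed directly in the pud entry leaves no pmd/pte tables underneath,
and the walk dereferences huge-entry bits as if they were a pmd table
pointer.

#include <linux/mm.h>
#include <asm/pgtable.h>

/*
 * Sketch of the walk vmalloc_to_page() performs (mirroring
 * mm/vmalloc.c in v4.2-rc1): every level is assumed to be a table
 * pointing at the next one, ending in a 4K PTE.
 */
static struct page *vmalloc_to_page_sketch(const void *vmalloc_addr)
{
        unsigned long addr = (unsigned long)vmalloc_addr;
        struct page *page = NULL;
        pgd_t *pgd = pgd_offset_k(addr);

        if (!pgd_none(*pgd)) {
                pud_t *pud = pud_offset(pgd, addr);

                /*
                 * A 1G ioremap mapping terminates here: *pud is a huge
                 * leaf entry, not a pointer to a pmd table, so the
                 * pmd/pte offsets computed below point at garbage and
                 * the dereference oopses, as in the stack that follows.
                 */
                if (!pud_none(*pud)) {
                        pmd_t *pmd = pmd_offset(pud, addr);

                        if (!pmd_none(*pmd)) {
                                pte_t *ptep = pte_offset_map(pmd, addr);
                                pte_t pte = *ptep;

                                if (pte_present(pte))
                                        page = pte_page(pte);
                                pte_unmap(ptep);
                        }
                }
        }
        return page;
}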

Here is the call stack for the crash:

[47360.050724] BUG: unable to handle kernel paging request at ffffc47e00000000
[47360.050965] IP: [<ffffffff8119da5c>] vmalloc_to_page+0x6c/0xb0
[47360.051136] PGD 0
[47360.051288] Oops: 0000 [#1] SMP
[47360.059112] Workqueue: SCIF INTR 2 scif_intr_bh_handler [scif]
[47360.059481] task: ffff88042d659d80 ti: ffff88042d664000 task.ti: ffff88042d664000
[47360.059986] RIP: 0010:[<ffffffff8119da5c>]  [<ffffffff8119da5c>] vmalloc_to_page+0x6c/0xb0
[47360.060568] RSP: 0018:ffff88042d667bb8  EFLAGS: 00010206
[47360.060863] RAX: 00003c7e00000000 RBX: ffffc90040000000 RCX: 00003ffffffff000
[47360.061165] RDX: 00003c7e00000000 RSI: ffff880000000000 RDI: ffffc90040000000
[47360.061466] RBP: ffff88042d667bb8 R08: 0000000000000000 R09: ffff880679100000
[47360.061766] R10: 0000000000000000 R11: 0000000000000008 R12: 0000000000040000
[47360.062066] R13: 0000000000008000 R14: ffff880679100000 R15: ffff880679100000
[47360.062367] FS:  0000000000000000(0000) GS:ffff88083f600000(0000) knlGS:0000000000000000
[47360.062873] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[47360.063169] CR2: ffffc47e00000000 CR3: 0000000001c0e000 CR4: 00000000000407e0
[47360.063468] Stack:
[47360.063754]  ffff88042d667c08 ffffffffa054c141 0000000000000000 0000000000040000
[47360.064565]  ffff88042d667be8 ffff88082a6ebd20 0000000000000020 0000000000008000
[47360.065389]  ffff88067f140228 ffff88082993a000 ffff88042d667c88 ffffffffa054d068
[47360.066212] Call Trace:
[47360.066530]  [<ffffffffa054c141>] scif_p2p_setsg+0xb1/0xd0 [scif]
[47360.066835]  [<ffffffffa054d068>] scif_init_p2p_info+0xe8/0x290 [scif]
[47360.067171]  [<ffffffffa054d37a>] scif_init+0x16a/0x2c0 [scif]
[47360.067487]  [<ffffffff810a9dea>] ? pick_next_task_fair+0x42a/0x4f0
[47360.067817]  [<ffffffffa054c242>] scif_nodeqp_msg_handler+0x42/0x80 [scif]
[47360.068122]  [<ffffffffa054c322>] scif_nodeqp_intrhandler+0x32/0x70 [scif]
[47360.068453]  [<ffffffffa0547575>] scif_intr_bh_handler+0x25/0x40 [scif]
[47360.068759]  [<ffffffff8108b553>] process_one_work+0x143/0x410
[47360.069087]  [<ffffffff8108b93b>] worker_thread+0x11b/0x4b0
[47360.069387]  [<ffffffff816c4659>] ? __schedule+0x309/0x880
[47360.069713]  [<ffffffff8108b820>] ? process_one_work+0x410/0x410
[47360.070013]  [<ffffffff8108b820>] ? process_one_work+0x410/0x410
[47360.070334]  [<ffffffff8109094c>] kthread+0xcc/0xf0
[47360.070639]  [<ffffffff8109b9ae>] ? schedule_tail+0x1e/0xc0
[47360.070961]  [<ffffffff81090880>] ? kthread_freezable_should_stop+0x70/0x70
[47360.071266]  [<ffffffff816c895f>] ret_from_fork+0x3f/0x70
[47360.071584]  [<ffffffff81090880>] ? kthread_freezable_should_stop+0x70/0x70
[47360.077579] RIP  [<ffffffff8119da5c>] vmalloc_to_page+0x6c/0xb0
[47360.077979]  RSP <ffff88042d667bb8>

And gdb points to the following source (a sketch of a huge-page-aware walk
follows the listing):

(gdb) list *vmalloc_to_page+0x6c
0xffffffff8119e50c is in vmalloc_to_page (mm/vmalloc.c:246).
241              */
242             VIRTUAL_BUG_ON(!is_vmalloc_or_module_addr(vmalloc_addr));
243
244             if (!pgd_none(*pgd)) {
245                     pud_t *pud = pud_offset(pgd, addr);
246                     if (!pud_none(*pud)) {
247                             pmd_t *pmd = pmd_offset(pud, addr);
248                             if (!pmd_none(*pmd)) {
249                                     pte_t *ptep, pte;
250
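
For what it's worth, here is a sketch of what a huge-page-aware walk could
check instead. This is illustrative only and not claiming to be the right
fix: pud_huge()/pmd_huge() would detect the leaf entries and
pud_page()/pmd_page() would recover the page, assuming a struct page exists
for the range at all.

#include <linux/mm.h>
#include <linux/hugetlb.h>
#include <asm/pgtable.h>

/*
 * Illustrative only (not the actual fix): a vmalloc_to_page()-style
 * walk that stops at the level where a huge leaf entry lives instead
 * of descending further.  Assumes struct pages exist for the range.
 */
static struct page *vmalloc_to_page_huge_aware(const void *vmalloc_addr)
{
        unsigned long addr = (unsigned long)vmalloc_addr;
        pgd_t *pgd = pgd_offset_k(addr);
        pud_t *pud;
        pmd_t *pmd;
        pte_t *ptep, pte;

        if (pgd_none(*pgd))
                return NULL;

        pud = pud_offset(pgd, addr);
        if (pud_none(*pud))
                return NULL;
        if (pud_huge(*pud))             /* 1G leaf entry: stop here */
                return pud_page(*pud) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);

        pmd = pmd_offset(pud, addr);
        if (pmd_none(*pmd))
                return NULL;
        if (pmd_huge(*pmd))             /* 2M leaf entry: stop here */
                return pmd_page(*pmd) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);

        ptep = pte_offset_map(pmd, addr);
        pte = *ptep;
        pte_unmap(ptep);

        return pte_present(pte) ? pte_page(pte) : NULL;
}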

Thanks,
Ashutosh
