Message-ID: <alpine.LNX.2.00.1301061037140.28950@eggly.anvils>
Date:	Sun, 6 Jan 2013 11:06:29 -0800 (PST)
From:	Hugh Dickins <hughd@...gle.com>
To:	Hillf Danton <dhillf@...il.com>
cc:	Dave Jones <davej@...hat.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Mel Gorman <mgorman@...e.de>
Subject: Re: oops in copy_page_rep()

On Sun, 6 Jan 2013, Hillf Danton wrote:
> On Sat, Jan 5, 2013 at 11:22 PM, Dave Jones <davej@...hat.com> wrote:
> > I have no idea what happened here, but this is the first time I've seen this one.
> > This was running a tree pulled yesterday afternoon.
> >
> > BUG: unable to handle kernel paging request at ffff880100201000
> > IP: [<ffffffff81333235>] copy_page_rep+0x5/0x10
> > PGD 1c0c063 PUD cfbff067 PMD cfc01067 PTE 8000000100201160
> > Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > Modules linked in: nfnetlink_log hidp fuse bnep llc2 rose caif_socket caif af_rxrpc phonet netrom af_key binfmt_misc rfcomm l2tp_ppp l2tp_core pppoe pppox ppp_generic slhc ipt_ULOG scsi_transport_iscsi can_raw nfnetlink ipx x25 p8023 p8022 nfc ax25 decnet rds can_bcm irda crc_ccitt can appletalk atm psnap llc lockd sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_conntrack ip6table_filter ip6_tables snd_hda_codec_realtek btusb snd_hda_intel bluetooth snd_hda_codec usb_debug microcode rfkill snd_pcm serio_raw snd_page_alloc snd_timer pcspkr edac_core snd soundcore r8169 mii vhost_net tun macvtap macvlan kvm_amd kvm
> > CPU 0
> > Pid: 3505, comm: trinity-child0 Not tainted 3.8.0-rc2+ #45 Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H
> > RIP: 0010:[<ffffffff81333235>]  [<ffffffff81333235>] copy_page_rep+0x5/0x10
> > RSP: 0018:ffff88001ecabd00  EFLAGS: 00010286
> > RAX: 0000000100201000 RBX: 000000011d215000 RCX: 0000000000000200
> > RDX: cccccccccccccccd RSI: ffff880100201000 RDI: ffff88011d215000
> > RBP: ffff88001ecabd98 R08: 0000000000000001 R09: 0000000000000000
> > R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000008
> > R13: 000000000500a050 R14: ffff8800916af080 R15: ffff880095435668
> > FS:  00007f48a2280740(0000) GS:ffff88012ee00000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: ffff880100201000 CR3: 0000000054eda000 CR4: 00000000000007f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process trinity-child0 (pid: 3505, threadinfo ffff88001ecaa000, task ffff8800a9628000)
> > Stack:
> >  ffffffff8119a9c7 ffff88001ecabd28 ffff8800a9628000 000000000157e088
> >  000000000157e088 000000000157e088 ffff880095435668 ffff8800a9077600
> >  ffff880095435668 80000001002000e5 ffff8800ba515050 0000000001400000
> > Call Trace:
> >  [<ffffffff8119a9c7>] ? do_huge_pmd_wp_page+0x707/0xc00
> >  [<ffffffff81165f1c>] handle_mm_fault+0x14c/0x590
> >  [<ffffffff810b35ce>] ? __lock_is_held+0x5e/0x90
> >  [<ffffffff816a280c>] __do_page_fault+0x15c/0x4e0
> >  [<ffffffff8100a1b6>] ? native_sched_clock+0x26/0x90
> >  [<ffffffff810b28e8>] ? trace_hardirqs_off_caller+0x28/0xc0
> >  [<ffffffff81334cbd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> >  [<ffffffff816a2b9e>] do_page_fault+0xe/0x10
> >  [<ffffffff8169f822>] page_fault+0x22/0x30
> > Code: 90 90 90 90 90 90 9c fa 65 48 3b 06 75 14 65 48 3b 56 08 75 0d 65 48 89 1e 65 48 89 4e 08 9d b0 01 c3 9d 30 c0 c3 b9 00 02 00 00 <f3> 48 a5 c3 0f 1f 80 00 00 00 00 eb ee 66 66 66 90 66 66 66 90
> > RIP  [<ffffffff81333235>] copy_page_rep+0x5/0x10
> >  RSP <ffff88001ecabd00>
> > CR2: ffff880100201000
> >
> Would you please try the following patch?
> 
> Hillf
> ---
> --- a/mm/memory.c	Sun Jan  6 19:49:50 2013
> +++ b/mm/memory.c	Sun Jan  6 19:52:42 2013
> @@ -3710,7 +3710,9 @@ retry:
>  				return do_huge_pmd_numa_page(mm, vma, address,
>  							     orig_pmd, pmd);
> 
> -			if (dirty && !pmd_write(orig_pmd)) {
> +			if (dirty && !pmd_write(orig_pmd) &&
> +					!pmd_trans_splitting(orig_pmd)) {
> +
>  				ret = do_huge_pmd_wp_page(mm, vma, address, pmd,
>  							  orig_pmd);
>  				/*
> --

Excellent suggestion!

I don't think it needs to wait on Dave trying and failing to reproduce his
oops, which strongly suggested that we're dealing with a page which had
very recently been split out from a hugepage, and had in fact been freed.

It's clear that 3.7 had an important pmd_trans_splitting(orig_pmd)
check there, which went AWOL in
d10e63f29488 "mm: numa: Create basic numa page hinting infrastructure".
Perhaps it was intended to be moved into do_huge_pmd_wp_page(), but the
checks there are pmd_same() against orig_pmd, so it is vital to get
orig_pmd right.
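
To make the missing check concrete, here is a rough userspace model of the
dispatch condition, not kernel code.  struct pmd_model, the enum and both
function names are invented for illustration; the three booleans stand in
for what pmd_trans_huge(), pmd_write() and pmd_trans_splitting() would
report.  It assumes the non-COW case simply returns 0 so that the access
is retried.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the PMD state discussed here; not a kernel type. */
struct pmd_model {
    bool trans_huge;   /* maps a transparent hugepage */
    bool writable;     /* what pmd_write() would report */
    bool splitting;    /* what pmd_trans_splitting() would report */
};

enum fault_action {
    ACT_FALL_THROUGH,  /* not a huge PMD: handle at pte level */
    ACT_WP_HUGE,       /* take the huge write-protect (COW) path */
    ACT_RETURN_ZERO    /* return 0 and let the access fault again */
};

/* Condition without the splitting check, as in the tree that oopsed. */
enum fault_action dispatch_unguarded(struct pmd_model pmd, bool write_fault)
{
    if (!pmd.trans_huge)
        return ACT_FALL_THROUGH;
    if (write_fault && !pmd.writable)
        return ACT_WP_HUGE;
    return ACT_RETURN_ZERO;
}

/* Condition with the splitting check restored, as in Hillf's patch. */
enum fault_action dispatch_guarded(struct pmd_model pmd, bool write_fault)
{
    if (!pmd.trans_huge)
        return ACT_FALL_THROUGH;
    if (write_fault && !pmd.writable && !pmd.splitting)
        return ACT_WP_HUGE;
    return ACT_RETURN_ZERO;
}
```

For a write fault on a read-only huge PMD that is mid-split, the unguarded
version goes down the COW path with a possibly stale orig_pmd, while the
guarded version backs off.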

I don't entirely like your patch (or the original code): shouldn't
there be a wait_split_huge_page(), rather than hammering back with
repeated faults until the split has completed?  Or perhaps it makes
little difference.  Let's see what Mel or Andrea suggest.
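
The retry-versus-wait question can be sketched with a toy model that only
counts fault entries.  Everything here is hypothetical (struct split_model,
the "ticks" and both function names); it ignores locking and real timing,
and assumes one refault per tick while the split is still in progress.

```c
#include <stdbool.h>

/* Hypothetical model: the split finishes after ticks_left more time steps. */
struct split_model {
    unsigned int ticks_left;
};

bool split_in_progress(const struct split_model *s)
{
    return s->ticks_left > 0;
}

/*
 * Retry style: a fault that sees the split in progress returns at once,
 * and the access faults again one tick later.  Returns faults taken.
 */
unsigned int faults_with_retry(struct split_model s)
{
    unsigned int faults = 1;          /* the initial fault */

    while (split_in_progress(&s)) {
        s.ticks_left--;               /* time passes before the refault */
        faults++;
    }
    return faults;
}

/*
 * Wait style: the first fault blocks until the split is done, which is
 * the role wait_split_huge_page() would play.  Returns faults taken.
 */
unsigned int faults_with_wait(struct split_model s)
{
    while (split_in_progress(&s))
        s.ticks_left--;               /* stands in for sleeping */
    return 1;
}
```

With a split that needs 3 more ticks, the retry style takes 4 faults where
the wait style takes 1.  Whether that matters in practice depends on how
short the splitting window really is, which this model does not capture.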

Hugh