[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Pine.LNX.4.64.0707180521470.18593@blonde.wat.veritas.com>
Date: Wed, 18 Jul 2007 05:49:50 +0100 (BST)
From: Hugh Dickins <hugh@...itas.com>
To: Joe Jin <joe.jin@...cle.com>
cc: Andrew Morton <akpm@...ux-foundation.org>,
Oleg Nesterov <oleg@...sign.ru>, bill.irwin@...cle.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] Add nid sanity on alloc_pages_node
On Wed, 18 Jul 2007, Joe Jin wrote:
>
> With your patch, I have reproduced the panic:
That is... surprising to me. (I hadn't been able to reproduce it with
or without the patches: maybe I just need to try harder.) Please post
your gcc --version, and the disassembly (objdump -d) output for
alloc_fresh_huge_page.
Or can someone else make sense of this - Oleg? To me it still seems
that nid_lock can only be irrelevant (doesn't even provide a compiler
barrier between prev_nid and nid transactions).
You remark lower down
> From your patch, if we dont the lock, the race condition maybe occur
> at next_node().
but I don't see how (if next_node were a macro which evaluates its
args more than once, perhaps, but that doesn't seem to be the case).
Thanks a lot,
Hugh
>
> Unable to handle kernel paging request at 000000000000186a RIP:
> [<ffffffff8105d4e7>] __alloc_pages+0x2f/0x2c3
> PGD 72595067 PUD 72594067 PMD 0
> Oops: 0000 [1] SMP
> CPU 0
> Modules linked in: xt_tcpudp iptable_filter ip_tables x_tables cpufreq_ondemand dm_mirror dm_multipath dm_mod video sbs button battery backlight ac snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm sg snd_timer shpchp snd soundcore tg3 i2c_i801 piix i2c_core snd_page_alloc ide_cd cdrom serio_raw ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
> Pid: 3996, comm: sh Not tainted 2.6.22 #9
> RIP: 0010:[<ffffffff8105d4e7>] [<ffffffff8105d4e7>] __alloc_pages+0x2f/0x2c3
> RSP: 0018:ffff810071563e48 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: 000000e8d4a51000 RCX: 0000000000000000
> RDX: ffff810071563fd8 RSI: 0000000000000009 RDI: 00000000000242d2
> RBP: 00000000000242d2 R08: 0000000000000000 R09: 0000000000043a01
> R10: ffff810000e88000 R11: ffffffff81072a02 R12: 0000000000001862
> R13: ffff810070c74ea0 R14: 0000000000000009 R15: 00002adc51042000
> FS: 00002adc4db3ddb0(0000) GS:ffffffff81312000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 000000000000186a CR3: 000000007d963000 CR4: 00000000000006e0
> Process sh (pid: 3996, threadinfo ffff810071562000, task ffff810070c74ea0)
> Stack: 00000010000242d2 ffff810000e88000 ffff810070e76710 ffffffff812e31e0
> ffffffffffffffff 000000e8d4a51000 ffffffffffffffff ffff810077d11b00
> 000000000000000e ffff810071563f50 00002adc51042000 ffffffff81071a67
> Call Trace:
> [<ffffffff81071a67>] alloc_fresh_huge_page+0x98/0xe7
> [<ffffffff8107263b>] hugetlb_sysctl_handler+0x14/0xf1
> [<ffffffff810bdad9>] proc_sys_write+0x7c/0xa6
> [<ffffffff8108074b>] vfs_write+0xad/0x156
> [<ffffffff81080ced>] sys_write+0x45/0x6e
> [<ffffffff81009c9c>] tracesys+0xdc/0xe1
>
>
> Code: 49 83 7c 24 08 00 75 0e 48 c7 44 24 08 00 00 00 00 e9 6a 02
> RIP [<ffffffff8105d4e7>] __alloc_pages+0x2f/0x2c3
> RSP <ffff810071563e48>
> CR2: 000000000000186a
>
> >From your patch, if we dont the lock, the race condition maybe occur at
> next_node().
>
> On 2007-07-17 21:35, Hugh Dickins wrote:
> > On Tue, 17 Jul 2007, Andrew Morton wrote:
> > >
> > > Given that we've now gone and added deliberate-but-we-hope-benign
> > > races into this code, an elaborate comment which explains and justifies
> > > it all is pretty much obligatory, IMO.
> >
> > [PATCH] Remove nid_lock from alloc_fresh_huge_page
> >
> > The fix to that race in alloc_fresh_huge_page() which could give an illegal
> > node ID did not need nid_lock at all: the fix was to replace static int nid
> > by static int prev_nid and do the work on local int nid. nid_lock did make
> > sure that racers strictly roundrobin the nodes, but that's not something we
> > need to enforce strictly. Kill nid_lock.
> >
> > Signed-off-by: Hugh Dickins <hugh@...itas.com>
> > ---
> > mm/hugetlb.c | 10 +++++++---
> > 1 file changed, 7 insertions(+), 3 deletions(-)
> >
> > --- 2.6.22-git9/mm/hugetlb.c 2007-07-17 20:29:33.000000000 +0100
> > +++ linux/mm/hugetlb.c 2007-07-17 21:29:58.000000000 +0100
> > @@ -107,15 +107,19 @@ static int alloc_fresh_huge_page(void)
> > {
> > static int prev_nid;
> > struct page *page;
> > - static DEFINE_SPINLOCK(nid_lock);
> > int nid;
> >
> > - spin_lock(&nid_lock);
> > + /*
> > + * Copy static prev_nid to local nid, work on that, then copy it
> > + * back to prev_nid afterwards: otherwise there's a window in which
> > + * a racer might pass invalid nid MAX_NUMNODES to alloc_pages_node.
> > + * But we don't need to use a spin_lock here: it really doesn't
> > + * matter if occasionally a racer chooses the same nid as we do.
> > + */
> > nid = next_node(prev_nid, node_online_map);
> > if (nid == MAX_NUMNODES)
> > nid = first_node(node_online_map);
> > prev_nid = nid;
> > - spin_unlock(&nid_lock);
> >
> > page = alloc_pages_node(nid, htlb_alloc_mask|__GFP_COMP|__GFP_NOWARN,
> > HUGETLB_PAGE_ORDER);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists