lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 9 Nov 2010 22:36:06 +0100
From:	Andrea Arcangeli <aarcange@...hat.com>
To:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Cc:	linux-mm@...ck.org, Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org,
	Marcelo Tosatti <mtosatti@...hat.com>,
	Adam Litke <agl@...ibm.com>, Avi Kivity <avi@...hat.com>,
	Hugh Dickins <hugh.dickins@...cali.co.uk>,
	Rik van Riel <riel@...hat.com>, Mel Gorman <mel@....ul.ie>,
	Dave Hansen <dave@...ux.vnet.ibm.com>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Ingo Molnar <mingo@...e.hu>, Mike Travis <travis@....com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Christoph Lameter <cl@...ux-foundation.org>,
	Chris Wright <chrisw@...s-sol.org>, bpicco@...hat.com,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	"Michael S. Tsirkin" <mst@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	Chris Mason <chris.mason@...cle.com>,
	Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH 02 of 66] mm, migration: Fix race between
 shift_arg_pages and rmap_walk by guaranteeing rmap_walk finds PTEs created
 within the temporary stack

On Tue, Nov 09, 2010 at 12:01:55PM +0900, KOSAKI Motohiro wrote:
> Hi
> 
> > From: Andrea Arcangeli <aarcange@...hat.com>
> > 
> > Page migration requires rmap to be able to find all migration ptes
> > created by migration. If the second rmap_walk clearing migration PTEs
> > misses an entry, it is left dangling causing a BUG_ON to trigger during
> > fault. For example;
> > 
> > [  511.201534] kernel BUG at include/linux/swapops.h:105!
> > [  511.201534] invalid opcode: 0000 [#1] PREEMPT SMP
> > [  511.201534] last sysfs file: /sys/block/sde/size
> > [  511.201534] CPU 0
> > [  511.201534] Modules linked in: kvm_amd kvm dm_crypt loop i2c_piix4 serio_raw tpm_tis shpchp evdev tpm i2c_core pci_hotplug tpm_bios wmi processor button ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear md_mod sg sr_mod cdrom sd_mod ata_generic ahci libahci libata ide_pci_generic ehci_hcd ide_core r8169 mii ohci_hcd scsi_mod floppy thermal fan thermal_sys
> > [  511.888526]
> > [  511.888526] Pid: 20431, comm: date Not tainted 2.6.34-rc4-mm1-fix-swapops #6 GA-MA790GP-UD4H/GA-MA790GP-UD4H
> > [  511.888526] RIP: 0010:[<ffffffff811094ff>]  [<ffffffff811094ff>] migration_entry_wait+0xc1/0x129
> 
> Do you mean current linus-tree code is broken? do we need to merge this
> ASAP and need to backport stable tree?

No current upstream is not broken here, but it's required with THP
unless I add the below block to huge_memory.c rmap walks too. I am
afraid to forget special cases like the below in the long term, so the
fewer special/magic cases there are in the kernel to remember forever
the better for me, as it's less likely I introduce hard-to-reproduce
bugs later on in the kernel when I add more rmap walks.

-
-               /*
-                * During exec, a temporary VMA is setup and later
-               moved.
-                * The VMA is moved under the anon_vma lock but not
-               the
-                * page tables leading to a race where migration
-               cannot
-                * find the migration ptes. Rather than increasing the
-                * locking requirements of exec(), migration skips
-                * temporary VMAs until after exec() completes.
-                */
-               if (PAGE_MIGRATION && (flags & TTU_MIGRATION) &&
-                               is_vma_temporary_stack(vma))
-                       continue;
-

Now if Linus tells me to drop this patch, I will add the above block
to huge_memory.c, and then I can drop this 2/66 patch immediately
(hopefully the user stack will never be born as an hugepage or the
randomizing mremap during execve will be executed with perfect 2m
aligned offsets or then it wouldn't work anymore, but for now I can
safely remove this patch and add the above block to huge_memory.c).

I see the ugliness in exec.c, but I think we're less likely to have
bugs in the kernel in the future without the above magic in every
remap walk that needs to be accurate (like huge_memory.c and
migrate.c), the rmap walks are complicated enough already for my
taste. But I'm totally fine either ways, as I think I can remember.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists