Message-ID: <ff6cb2b9-b930-afad-1a1f-1c437eced3cf@nvidia.com>
Date: Mon, 10 Jul 2017 16:44:38 -0700
From: Evgeny Baskakov <ebaskakov@...dia.com>
To: Jerome Glisse <jglisse@...hat.com>
CC: "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
John Hubbard <jhubbard@...dia.com>,
David Nellans <dnellans@...dia.com>,
Mark Hairgrove <mhairgrove@...dia.com>,
Sherry Cheung <SCheung@...dia.com>,
Subhash Gutti <sgutti@...dia.com>
Subject: Re: [HMM 12/15] mm/migrate: new memory migration helper for use with
device memory v4
On 6/30/17 5:57 PM, Jerome Glisse wrote:
...
Hi Jerome,
I am working on a sporadic data corruption seen in highly contended use
cases. So far, I've been able to reproduce a sporadic hang that happens
when multiple threads compete to migrate the same page to and from
device memory. The reproducer uses only the dummy driver from hmm-next;
please find it attached (a simplified sketch of its thread layout follows
the output below). This is how it hangs on my 6-core/12-thread Intel Core
i7-5930K (SMT) system:
&&& 2 migrate threads, 2 read threads: STARTING
(EE:84) hmm_buffer_mirror_read error -1
&&& 2 migrate threads, 2 read threads: PASSED
&&& 2 migrate threads, 3 read threads: STARTING
&&& 2 migrate threads, 3 read threads: PASSED
&&& 2 migrate threads, 4 read threads: STARTING
&&& 2 migrate threads, 4 read threads: PASSED
&&& 3 migrate threads, 2 read threads: STARTING
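
For reference, the attached reproducer boils down to the thread layout
sketched below. This is only a sketch: the hmm_buffer_* helpers are
placeholder names for the harness wrappers (only hmm_buffer_mirror_read()
shows up in the output above), and the actual buffer allocation and
registration against the dummy driver are in the attached tarball.

/*
 * Minimal sketch of the contention pattern in the attached reproducer.
 * The hmm_buffer_* helpers are placeholders for the test-harness
 * wrappers; buffer setup against the dummy driver is omitted here.
 */
#include <pthread.h>
#include <stdio.h>

#define NR_MIGRATE_THREADS 2
#define NR_READ_THREADS    2
#define NR_ITERATIONS      10000

struct hmm_buffer;

/* Assumed harness helpers; declarations only, names are illustrative. */
extern int hmm_buffer_mirror_read(struct hmm_buffer *buffer);
extern int hmm_buffer_mirror_migrate(struct hmm_buffer *buffer);

/* One anonymous buffer shared by every thread; allocation omitted. */
static struct hmm_buffer *buffer;

/* Migrate threads bounce the same pages to and from device memory. */
static void *migrate_thread(void *arg)
{
	int i;

	for (i = 0; i < NR_ITERATIONS; i++)
		if (hmm_buffer_mirror_migrate(buffer) < 0)
			fprintf(stderr, "migrate error\n");
	return NULL;
}

/* Read threads keep faulting the same pages through the mirror. */
static void *read_thread(void *arg)
{
	int i;

	for (i = 0; i < NR_ITERATIONS; i++)
		if (hmm_buffer_mirror_read(buffer) < 0)
			fprintf(stderr, "read error\n");
	return NULL;
}

int main(void)
{
	pthread_t tid[NR_MIGRATE_THREADS + NR_READ_THREADS];
	int i;

	/* Buffer allocation/registration with the dummy driver omitted. */
	for (i = 0; i < NR_MIGRATE_THREADS; i++)
		pthread_create(&tid[i], NULL, migrate_thread, NULL);
	for (; i < NR_MIGRATE_THREADS + NR_READ_THREADS; i++)
		pthread_create(&tid[i], NULL, read_thread, NULL);
	for (i = 0; i < NR_MIGRATE_THREADS + NR_READ_THREADS; i++)
		pthread_join(tid[i], NULL);
	return 0;
}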
The kernel log (also attached) shows multiple threads blocked: one stuck
in __lock_page() inside migrate_vma(), and another waiting on a migration
entry while servicing a mirror fault through hmm_vma_fault():
[ 139.054907] sanity_rmem004 D13528 3997 3818 0x00000000
[ 139.054912] Call Trace:
[ 139.054914] __schedule+0x20b/0x6c0
[ 139.054916] schedule+0x36/0x80
[ 139.054920] io_schedule+0x16/0x40
[ 139.054923] __lock_page+0xf2/0x130
[ 139.054929] migrate_vma+0x48a/0xee0
[ 139.054933] dummy_migrate.isra.10+0xd9/0x110 [hmm_dmirror]
[ 139.054945] dummy_fops_unlocked_ioctl+0x1e8/0x330 [hmm_dmirror]
[ 139.054954] do_vfs_ioctl+0x96/0x5a0
[ 139.054957] SyS_ioctl+0x79/0x90
[ 139.054960] entry_SYSCALL_64_fastpath+0x13/0x94
...
[ 139.055067] sanity_rmem004 D13136 3999 3818 0x00000000
[ 139.055072] Call Trace:
[ 139.055074] __schedule+0x20b/0x6c0
[ 139.055076] schedule+0x36/0x80
[ 139.055079] io_schedule+0x16/0x40
[ 139.055083] wait_on_page_bit+0xee/0x120
[ 139.055089] __migration_entry_wait+0xe8/0x190
[ 139.055091] migration_entry_wait+0x5f/0x70
[ 139.055094] do_swap_page+0x4c7/0x4e0
[ 139.055096] __handle_mm_fault+0x347/0x9d0
[ 139.055099] handle_mm_fault+0x88/0x150
[ 139.055103] hmm_vma_walk_clear+0x8f/0xd0
[ 139.055105] hmm_vma_walk_pmd+0x1ba/0x250
[ 139.055109] __walk_page_range+0x1e8/0x420
[ 139.055112] walk_page_range+0x73/0xf0
[ 139.055114] hmm_vma_fault+0x180/0x260
[ 139.055121] dummy_fault+0xda/0x1f0 [hmm_dmirror]
[ 139.055138] dummy_fops_unlocked_ioctl+0x12c/0x330 [hmm_dmirror]
[ 139.055142] do_vfs_ioctl+0x96/0x5a0
[ 139.055145] SyS_ioctl+0x79/0x90
[ 139.055148] entry_SYSCALL_64_fastpath+0x13/0x94
Please compile and run the attached program this way:
$ ./build.sh
$ sudo ./kload.sh
$ sudo ./run.sh
Thanks!
Evgeny Baskakov
NVIDIA
Download attachment "sanity_rmem004_repeated_faults_threaded.tgz" of type "application/x-gzip" (5265 bytes)
View attachment "kernel.log" of type "text/plain" (8482 bytes)