linux-kernel - Re: [HMM 12/15] mm/migrate: new memory migration helper for use with device memory v4

Message-ID: <ff6cb2b9-b930-afad-1a1f-1c437eced3cf@nvidia.com>
Date:   Mon, 10 Jul 2017 16:44:38 -0700
From:   Evgeny Baskakov <ebaskakov@...dia.com>
To:     Jerome Glisse <jglisse@...hat.com>
CC:     "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        John Hubbard <jhubbard@...dia.com>,
        David Nellans <dnellans@...dia.com>,
        Mark Hairgrove <mhairgrove@...dia.com>,
        Sherry Cheung <SCheung@...dia.com>,
        Subhash Gutti <sgutti@...dia.com>
Subject: Re: [HMM 12/15] mm/migrate: new memory migration helper for use with
 device memory v4

On 6/30/17 5:57 PM, Jerome Glisse wrote:

...

Hi Jerome,

I am investigating sporadic data corruption seen in highly contended use 
cases. So far, I've been able to reproduce a sporadic hang that happens 
when multiple threads compete to migrate the same page to and from 
device memory. The reproducer uses only the dummy driver from hmm-next.
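
For readers without the attachment, the contention pattern described above can be modelled roughly as follows. This is only a user-space sketch under stated assumptions, not the attached reproducer: it never calls the hmm_dmirror driver (its ioctl interface is not shown in this message), and every name in it (sim_page, migrate_worker, read_worker, run_contention) is hypothetical. A mutex stands in for the page lock, and "migration" is just a flag flip.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_WORKERS  64
#define PAGE_PATTERN 0xA5A5A5A5UL

/* Hypothetical stand-in for one page that lives in system or device memory. */
struct sim_page {
    pthread_mutex_t lock;   /* models the page lock held during migration */
    int on_device;          /* 0 = system memory, 1 = device memory */
    unsigned long value;    /* payload that must survive every migration */
};

struct worker_arg {
    struct sim_page *page;
    int iterations;
    atomic_int *errors;
};

/* Repeatedly "migrates" the shared page back and forth under the lock. */
static void *migrate_worker(void *p)
{
    struct worker_arg *a = p;
    for (int i = 0; i < a->iterations; i++) {
        pthread_mutex_lock(&a->page->lock);
        a->page->on_device = !a->page->on_device;
        pthread_mutex_unlock(&a->page->lock);
    }
    return NULL;
}

/* Repeatedly reads the shared page and counts corrupted reads. */
static void *read_worker(void *p)
{
    struct worker_arg *a = p;
    for (int i = 0; i < a->iterations; i++) {
        pthread_mutex_lock(&a->page->lock);
        if (a->page->value != PAGE_PATTERN)
            atomic_fetch_add(a->errors, 1);
        pthread_mutex_unlock(&a->page->lock);
    }
    return NULL;
}

/* Runs one cell of the thread matrix; returns the number of bad reads,
 * or -1 if the arguments are out of range or a thread could not start. */
int run_contention(int n_migrate, int n_read, int iterations)
{
    struct sim_page page = { .on_device = 0, .value = PAGE_PATTERN };
    pthread_t tids[MAX_WORKERS];
    atomic_int errors = 0;
    struct worker_arg arg = { &page, iterations, &errors };
    int total = n_migrate + n_read, started = 0;

    if (n_migrate < 0 || n_read < 0 || total > MAX_WORKERS)
        return -1;

    int rc = pthread_mutex_init(&page.lock, NULL);
    assert(rc == 0);
    (void)rc;

    for (int i = 0; i < total; i++) {
        void *(*fn)(void *) = i < n_migrate ? migrate_worker : read_worker;
        if (pthread_create(&tids[started], NULL, fn, &arg) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&page.lock);

    return started == total ? atomic_load(&errors) : -1;
}
```

Calling run_contention(3, 2, 1000) would correspond to a "3 migrate threads, 2 read threads" cell. Because this model serializes everything behind a single mutex, it is expected to finish with zero errors; it only illustrates the access pattern and cannot reproduce the kernel hang.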

Please find it attached. This is how it hangs on my Intel i7-5930K 
system (6 cores, 12 threads with SMT):

&&& 2 migrate threads, 2 read threads: STARTING
(EE:84) hmm_buffer_mirror_read error -1
&&& 2 migrate threads, 2 read threads: PASSED
&&& 2 migrate threads, 3 read threads: STARTING
&&& 2 migrate threads, 3 read threads: PASSED
&&& 2 migrate threads, 4 read threads: STARTING
&&& 2 migrate threads, 4 read threads: PASSED
&&& 3 migrate threads, 2 read threads: STARTING

The kernel log (also attached) shows multiple threads blocked in 
hmm_vma_fault() and migrate_vma():

[  139.054907] sanity_rmem004  D13528  3997   3818 0x00000000
[  139.054912] Call Trace:
[  139.054914]  __schedule+0x20b/0x6c0
[  139.054916]  schedule+0x36/0x80
[  139.054920]  io_schedule+0x16/0x40
[  139.054923]  __lock_page+0xf2/0x130
[  139.054929]  migrate_vma+0x48a/0xee0
[  139.054933]  dummy_migrate.isra.10+0xd9/0x110 [hmm_dmirror]
[  139.054945]  dummy_fops_unlocked_ioctl+0x1e8/0x330 [hmm_dmirror]
[  139.054954]  do_vfs_ioctl+0x96/0x5a0
[  139.054957]  SyS_ioctl+0x79/0x90
[  139.054960]  entry_SYSCALL_64_fastpath+0x13/0x94
...
[  139.055067] sanity_rmem004  D13136  3999   3818 0x00000000
[  139.055072] Call Trace:
[  139.055074]  __schedule+0x20b/0x6c0
[  139.055076]  schedule+0x36/0x80
[  139.055079]  io_schedule+0x16/0x40
[  139.055083]  wait_on_page_bit+0xee/0x120
[  139.055089]  __migration_entry_wait+0xe8/0x190
[  139.055091]  migration_entry_wait+0x5f/0x70
[  139.055094]  do_swap_page+0x4c7/0x4e0
[  139.055096]  __handle_mm_fault+0x347/0x9d0
[  139.055099]  handle_mm_fault+0x88/0x150
[  139.055103]  hmm_vma_walk_clear+0x8f/0xd0
[  139.055105]  hmm_vma_walk_pmd+0x1ba/0x250
[  139.055109]  __walk_page_range+0x1e8/0x420
[  139.055112]  walk_page_range+0x73/0xf0
[  139.055114]  hmm_vma_fault+0x180/0x260
[  139.055121]  dummy_fault+0xda/0x1f0 [hmm_dmirror]
[  139.055138]  dummy_fops_unlocked_ioctl+0x12c/0x330 [hmm_dmirror]
[  139.055142]  do_vfs_ioctl+0x96/0x5a0
[  139.055145]  SyS_ioctl+0x79/0x90
[  139.055148]  entry_SYSCALL_64_fastpath+0x13/0x94

Please compile and run the attached program this way:

$ ./build.sh
$ sudo ./kload.sh
$ sudo ./run.sh

Thanks!

Evgeny Baskakov
NVIDIA


Download attachment "sanity_rmem004_repeated_faults_threaded.tgz" of type "application/x-gzip" (5265 bytes)

View attachment "kernel.log" of type "text/plain" (8482 bytes)