lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20191222132357.GF13335@unreal>
Date:   Sun, 22 Dec 2019 15:23:57 +0200
From:   Leon Romanovsky <leon@...nel.org>
To:     John Hubbard <jhubbard@...dia.com>
Cc:     Jason Gunthorpe <jgg@...pe.ca>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        Alex Williamson <alex.williamson@...hat.com>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Björn Töpel <bjorn.topel@...el.com>,
        Christoph Hellwig <hch@...radead.org>,
        Dan Williams <dan.j.williams@...el.com>,
        Daniel Vetter <daniel@...ll.ch>,
        Dave Chinner <david@...morbit.com>,
        David Airlie <airlied@...ux.ie>,
        "David S . Miller" <davem@...emloft.net>,
        Ira Weiny <ira.weiny@...el.com>, Jan Kara <jack@...e.cz>,
        Jens Axboe <axboe@...nel.dk>, Jonathan Corbet <corbet@....net>,
        Jérôme Glisse <jglisse@...hat.com>,
        Magnus Karlsson <magnus.karlsson@...el.com>,
        Mauro Carvalho Chehab <mchehab@...nel.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        Michal Hocko <mhocko@...e.com>,
        Mike Kravetz <mike.kravetz@...cle.com>,
        Paul Mackerras <paulus@...ba.org>,
        Shuah Khan <shuah@...nel.org>,
        Vlastimil Babka <vbabka@...e.cz>, bpf@...r.kernel.org,
        dri-devel@...ts.freedesktop.org, kvm@...r.kernel.org,
        linux-block@...r.kernel.org, linux-doc@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-kselftest@...r.kernel.org,
        linux-media@...r.kernel.org, linux-rdma@...r.kernel.org,
        linuxppc-dev@...ts.ozlabs.org, netdev@...r.kernel.org,
        linux-mm@...ck.org, LKML <linux-kernel@...r.kernel.org>,
        Maor Gottlieb <maorg@...lanox.com>,
        Ran Rozenstein <ranro@...lanox.com>
Subject: Re: [PATCH v11 00/25] mm/gup: track dma-pinned pages: FOLL_PIN

On Fri, Dec 20, 2019 at 03:54:55PM -0800, John Hubbard wrote:
> On 12/20/19 10:29 AM, Leon Romanovsky wrote:
> ...
> >> $ ./build.sh
> >> $ build/bin/run_tests.py
> >>
> >> If you get things that far I think Leon can get a reproduction for you
> >
> > I'm not so optimistic about that.
> >
>
> OK, I'm going to proceed for now on the assumption that I've got an overflow
> problem that happens when huge pages are pinned. If I can get more information,
> great, otherwise it's probably enough.
>
> One thing: for your repro, if you know the huge page size, and the system
> page size for that case, that would really help. Also the number of pins per
> page, more or less, that you'd expect. Because Jason says that only 2M huge
> pages are used...
>
> Because the other possibility is that the refcount really is going negative,
> likely due to a mismatched pin/unpin somehow.
>
> If there's not an obvious repro case available, but you do have one (is it easy
> to repro, though?), then *if* you have the time, I could point you to a github
> branch that reduces GUP_PIN_COUNTING_BIAS by, say, 4x, by applying this:
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index bb44c4d2ada7..8526fd03b978 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1077,7 +1077,7 @@ static inline void put_page(struct page *page)
>   * get_user_pages and page_mkclean and other calls that race to set up page
>   * table entries.
>   */
> -#define GUP_PIN_COUNTING_BIAS (1U << 10)
> +#define GUP_PIN_COUNTING_BIAS (1U << 8)
>
>  void unpin_user_page(struct page *page);
>  void unpin_user_pages_dirty_lock(struct page **pages, unsigned long npages,
>
> If that fails to repro, then we would be zeroing in on the root cause.
>
> The branch is here (I just tested it and it seems healthy):
>
> git@...hub.com:johnhubbard/linux.git  pin_user_pages_tracking_v11_with_diags

Hi,

We tested the following branch and here comes results:
[root@...ver consume_mtts]# (master) $ grep foll_pin /proc/vmstat
nr_foll_pin_requested 0
nr_foll_pin_returned 0

[root@...er consume_mtts]# (master) $ dmesg
[  425.221459] ------------[ cut here ]------------
[  425.225894] WARNING: CPU: 1 PID: 6738 at mm/gup.c:61 try_grab_compound_head+0x90/0xa0
[  425.228021] Modules linked in: mlx5_ib mlx5_core mlxfw mlx4_ib mlx4_en ptp pps_core mlx4_core bonding ip6_gre ip6_tunnel tunnel6 ip_gre gre ip_tunnel rdma_rxe ip6_udp_tunnel udp_tunnel rdma_ucm ib_uverbs ib_ipoib ib_umad ib_srp scsi_transport_srp rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core [last unloaded: mlxfw]
[  425.235266] CPU: 1 PID: 6738 Comm: consume_mtts Tainted: G           O      5.5.0-rc2+ #1
[  425.237480] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  425.239738] RIP: 0010:try_grab_compound_head+0x90/0xa0
[  425.241170] Code: 06 48 8d 4f 34 f0 0f b1 57 34 74 cd 85 c0 74 cf 8d 14 06 f0 0f b1 11 74 c0 eb f1 8d 14 06 f0 0f b1 11 74 b5 85 c0 75 f3 eb b5 <0f> 0b 31 c0 c3 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41
[  425.245739] RSP: 0018:ffffc900006878a8 EFLAGS: 00010082
[  425.247124] RAX: 0000000080000001 RBX: 00007f780488a000 RCX: 0000000000000bb0
[  425.248956] RDX: ffffea000e031087 RSI: 0000000000008a00 RDI: ffffea000dc58000
[  425.250761] RBP: ffffea000e031080 R08: ffffc90000687974 R09: 000fffffffe00000
[  425.252661] R10: 0000000000000000 R11: ffff888362560000 R12: 000000000000008a
[  425.254487] R13: 80000003716000e7 R14: 00007f780488a000 R15: ffffc90000687974
[  425.256309] FS:  00007f780d9d3740(0000) GS:ffff8883b1c80000(0000) knlGS:0000000000000000
[  425.258401] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  425.259949] CR2: 0000000002334048 CR3: 000000039c68c001 CR4: 00000000001606a0
[  425.261884] Call Trace:
[  425.262735]  gup_pgd_range+0x517/0x5a0
[  425.263819]  internal_get_user_pages_fast+0x210/0x250
[  425.265193]  ib_umem_get+0x298/0x550 [ib_uverbs]
[  425.266476]  mr_umem_get+0xc9/0x260 [mlx5_ib]
[  425.267699]  mlx5_ib_reg_user_mr+0xcc/0x7e0 [mlx5_ib]
[  425.269134]  ? xas_load+0x8/0x80
[  425.270074]  ? xa_load+0x48/0x90
[  425.271038]  ? lookup_get_idr_uobject.part.10+0x12/0x70 [ib_uverbs]
[  425.272757]  ib_uverbs_reg_mr+0x127/0x280 [ib_uverbs]
[  425.274120]  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xc2/0xf0 [ib_uverbs]
[  425.276058]  ib_uverbs_cmd_verbs.isra.6+0x5be/0xbe0 [ib_uverbs]
[  425.277657]  ? uverbs_disassociate_api+0xd0/0xd0 [ib_uverbs]
[  425.279155]  ? __alloc_pages_nodemask+0x148/0x2b0
[  425.280445]  ib_uverbs_ioctl+0xc0/0x120 [ib_uverbs]
[  425.281755]  do_vfs_ioctl+0x9d/0x650
[  425.282766]  ksys_ioctl+0x70/0x80
[  425.283745]  __x64_sys_ioctl+0x16/0x20
[  425.284912]  do_syscall_64+0x42/0x130
[  425.285973]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  425.287377] RIP: 0033:0x7f780d2df267
[  425.288449] Code: b3 66 90 48 8b 05 19 3c 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 3b 2c 00 f7 d8 64 89 01 48
[  425.293073] RSP: 002b:00007ffce49a88a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  425.295034] RAX: ffffffffffffffda RBX: 00007ffce49a8938 RCX: 00007f780d2df267
[  425.296895] RDX: 00007ffce49a8920 RSI: 00000000c0181b01 RDI: 0000000000000003
[  425.298689] RBP: 00007ffce49a8900 R08: 0000000000000003 R09: 00007f780d9a1010
[  425.300480] R10: 00000000ffffffff R11: 0000000000000246 R12: 00007f780d9a1150
[  425.302290] R13: 00007ffce49a8900 R14: 00007ffce49a8ad8 R15: 00007f780468a000
[  425.304113] ---[ end trace 1ecbefdb403190dd ]---
[  425.305434] ------------[ cut here ]------------
[  425.307147] WARNING: CPU: 1 PID: 6738 at mm/gup.c:150 try_grab_page+0x56/0x60
[  425.309111] Modules linked in: mlx5_ib mlx5_core mlxfw mlx4_ib mlx4_en ptp pps_core mlx4_core bonding ip6_gre ip6_tunnel tunnel6 ip_gre gre ip_tunnel rdma_rxe ip6_udp_tunnel udp_tunnel rdma_ucm ib_uverbs ib_ipoib ib_umad ib_srp scsi_transport_srp rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core [last unloaded: mlxfw]
[  425.316461] CPU: 1 PID: 6738 Comm: consume_mtts Tainted: G        W  O      5.5.0-rc2+ #1
[  425.318582] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  425.320958] RIP: 0010:try_grab_page+0x56/0x60
[  425.322167] Code: 7e 28 f0 81 47 34 00 01 00 00 c3 48 8b 47 08 48 8d 50 ff a8 01 48 0f 45 fa 8b 47 34 85 c0 7e 0f f0 ff 47 34 b8 01 00 00 00 c3 <0f> 0b 31 c0 c3 0f 0b 31 c0 c3 0f 1f 44 00 00 41 57 41 56 41 55 41
[  425.326814] RSP: 0018:ffffc90000687830 EFLAGS: 00010282
[  425.328226] RAX: 0000000000000001 RBX: ffffea000dc58000 RCX: ffffea000e031087
[  425.330104] RDX: 0000000080000001 RSI: 0000000000040000 RDI: ffffea000dc58000
[  425.331980] RBP: 00007f7804800000 R08: 000ffffffffff000 R09: 80000003716000e7
[  425.333898] R10: ffff88834af80120 R11: ffff8883ac16f000 R12: ffff88834af80120
[  425.335704] R13: ffff88837c0915c0 R14: 0000000000050201 R15: 00007f7804800000
[  425.337638] FS:  00007f780d9d3740(0000) GS:ffff8883b1c80000(0000) knlGS:0000000000000000
[  425.339734] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  425.341369] CR2: 0000000002334048 CR3: 000000039c68c001 CR4: 00000000001606a0
[  425.343160] Call Trace:
[  425.343967]  follow_trans_huge_pmd+0x16f/0x2e0
[  425.345263]  follow_p4d_mask+0x51c/0x630
[  425.346344]  __get_user_pages+0x1a1/0x6c0
[  425.347463]  internal_get_user_pages_fast+0x17b/0x250
[  425.348918]  ib_umem_get+0x298/0x550 [ib_uverbs]
[  425.350174]  mr_umem_get+0xc9/0x260 [mlx5_ib]
[  425.351383]  mlx5_ib_reg_user_mr+0xcc/0x7e0 [mlx5_ib]
[  425.352849]  ? xas_load+0x8/0x80
[  425.353776]  ? xa_load+0x48/0x90
[  425.354730]  ? lookup_get_idr_uobject.part.10+0x12/0x70 [ib_uverbs]
[  425.356410]  ib_uverbs_reg_mr+0x127/0x280 [ib_uverbs]
[  425.357843]  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xc2/0xf0 [ib_uverbs]
[  425.359749]  ib_uverbs_cmd_verbs.isra.6+0x5be/0xbe0 [ib_uverbs]
[  425.361405]  ? uverbs_disassociate_api+0xd0/0xd0 [ib_uverbs]
[  425.362898]  ? __alloc_pages_nodemask+0x148/0x2b0
[  425.364206]  ib_uverbs_ioctl+0xc0/0x120 [ib_uverbs]
[  425.365564]  do_vfs_ioctl+0x9d/0x650
[  425.366567]  ksys_ioctl+0x70/0x80
[  425.367537]  __x64_sys_ioctl+0x16/0x20
[  425.368698]  do_syscall_64+0x42/0x130
[  425.369782]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  425.371117] RIP: 0033:0x7f780d2df267
[  425.372159] Code: b3 66 90 48 8b 05 19 3c 2c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e9 3b 2c 00 f7 d8 64 89 01 48
[  425.376774] RSP: 002b:00007ffce49a88a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  425.378740] RAX: ffffffffffffffda RBX: 00007ffce49a8938 RCX: 00007f780d2df267
[  425.380598] RDX: 00007ffce49a8920 RSI: 00000000c0181b01 RDI: 0000000000000003
[  425.382411] RBP: 00007ffce49a8900 R08: 0000000000000003 R09: 00007f780d9a1010
[  425.384312] R10: 00000000ffffffff R11: 0000000000000246 R12: 00007f780d9a1150
[  425.386132] R13: 00007ffce49a8900 R14: 00007ffce49a8ad8 R15: 00007f780468a000
[  425.387964] ---[ end trace 1ecbefdb403190de ]---

Thanks

>
>
>
> thanks,
> --
> John Hubbard
> NVIDIA

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ