linux-kernel - Solved: [PATCH 0/4] mm/gup, drm/i915: refactor gup_fast, convert to pin_user


Message-ID: <7d79c089-7b21-cf7f-66ea-078d44c5e007@nvidia.com>
Date:   Thu, 21 May 2020 13:40:39 -0700
From:   John Hubbard <jhubbard@...dia.com>
To:     Chris Wilson <chris@...is-wilson.co.uk>,
        Andrew Morton <akpm@...ux-foundation.org>
CC:     Souptick Joarder <jrdr.linux@...il.com>,
        Matthew Wilcox <willy@...radead.org>,
        Jani Nikula <jani.nikula@...ux.intel.com>,
        Joonas Lahtinen <joonas.lahtinen@...ux.intel.com>,
        Rodrigo Vivi <rodrigo.vivi@...el.com>,
        David Airlie <airlied@...ux.ie>,
        Daniel Vetter <daniel@...ll.ch>,
        Tvrtko Ursulin <tvrtko.ursulin@...el.com>,
        Matthew Auld <matthew.auld@...el.com>,
        <intel-gfx@...ts.freedesktop.org>,
        <dri-devel@...ts.freedesktop.org>,
        LKML <linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>
Subject: Solved: [PATCH 0/4] mm/gup, drm/i915: refactor gup_fast, convert to
 pin_user_pages()

On 2020-05-21 12:11, John Hubbard wrote:
> On 2020-05-21 11:57, Chris Wilson wrote:
>> Quoting John Hubbard (2020-05-19 01:21:20)
>>> This needs to go through Andrew's -mm tree, due to adding a new gup.c
>>> routine. However, I would really love to have some testing from the
>>> drm/i915 folks, because I haven't been able to run-time test that part
>>> of it.
>>
>> CI hit
>>
>> <4> [185.667750] WARNING: CPU: 0 PID: 1387 at mm/gup.c:2699 internal_get_user_pages_fast+0x63a/0xac0


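OK, what happened here is that internal_get_user_pages_fast() hit its
WARN_ON_ONCE() and returned -EINVAL, because the caller passed in the new
FOLL_FAST_ONLY flag, which had not been added to the set of allowed
gup_flags (the whitelist) in that check.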

So the fix is easy, and should be applied to the refactoring patch. I'll
send out a v2 of the series, which will effectively have this applied:


diff --git a/mm/gup.c b/mm/gup.c
index 6cbe98c93466..4f0ca3f849d1 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2696,7 +2696,8 @@ static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
  	int nr_pinned = 0, ret = 0;

  	if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
-				       FOLL_FORCE | FOLL_PIN | FOLL_GET)))
+				       FOLL_FORCE | FOLL_PIN | FOLL_GET |
+				       FOLL_FAST_ONLY)))
  		return -EINVAL;

  	start = untagged_addr(start) & PAGE_MASK;
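
For readers who want to see the effect of the one-line change in isolation,
here is a minimal userspace sketch of the same mask check, before and after
the fix. It is not kernel code: the helper names check_flags_before(),
check_flags_after() and self_check() are hypothetical, and the numeric flag
values are illustrative stand-ins chosen for this sketch (the authoritative
definitions are in the kernel headers).

```c
#include <assert.h>
#include <errno.h>

/*
 * Illustrative flag values for this sketch only (assumption); the real
 * FOLL_* definitions live in the kernel headers.
 */
#define FOLL_WRITE      0x00001u
#define FOLL_GET        0x00004u
#define FOLL_FORCE      0x00010u
#define FOLL_LONGTERM   0x10000u
#define FOLL_PIN        0x40000u
#define FOLL_FAST_ONLY  0x80000u

/* Hypothetical helper: the allowed-flags check as it was before the fix. */
static int check_flags_before(unsigned int gup_flags)
{
	const unsigned int allowed = FOLL_WRITE | FOLL_LONGTERM |
				     FOLL_FORCE | FOLL_PIN | FOLL_GET;

	/* Any bit outside the allowed set is rejected. */
	return (gup_flags & ~allowed) ? -EINVAL : 0;
}

/* Hypothetical helper: the same check with FOLL_FAST_ONLY allowed. */
static int check_flags_after(unsigned int gup_flags)
{
	const unsigned int allowed = FOLL_WRITE | FOLL_LONGTERM |
				     FOLL_FORCE | FOLL_PIN | FOLL_GET |
				     FOLL_FAST_ONLY;

	return (gup_flags & ~allowed) ? -EINVAL : 0;
}

/* Small self-check showing the behavioral difference. */
static void self_check(void)
{
	unsigned int flags = FOLL_PIN | FOLL_WRITE | FOLL_FAST_ONLY;

	assert(check_flags_before(flags) == -EINVAL); /* rejected before */
	assert(check_flags_after(flags) == 0);        /* accepted after  */
	assert(check_flags_after(0x2u) == -EINVAL);   /* unknown bit still rejected */
}
```

Calling self_check() from a test program exercises both variants: a flag
combination that includes FOLL_FAST_ONLY is rejected by the old mask and
accepted by the new one, while unrelated bits are still rejected.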


>> <4> [185.667752] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 mei_hdcp x86_pkg_temp_thermal coretemp snd_hda_intel snd_intel_dspcfg crct10dif_pclmul snd_hda_codec crc32_pclmul snd_hwdep snd_hda_core ghash_clmulni_intel cdc_ether usbnet mii snd_pcm e1000e mei_me ptp pps_core mei intel_lpss_pci prime_numbers
>> <4> [185.667774] CPU: 0 PID: 1387 Comm: gem_userptr_bli Tainted: G     U            5.7.0-rc5-CI-Patchwork_17704+ #1
>> <4> [185.667777] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP, BIOS ICLSFWR1.R00.3234.A01.1906141750 06/14/2019
>> <4> [185.667782] RIP: 0010:internal_get_user_pages_fast+0x63a/0xac0
>> <4> [185.667785] Code: 24 40 08 48 39 5c 24 38 49 89 df 0f 85 74 fc ff ff 48 83 44 24 50 08 48 39 5c 24 58 49 89 dc 0f 85 e0 fb ff ff e9 14 fe ff ff <0f> 0b b8 ea ff ff ff e9 36 fb ff ff 4c 89 e8 48 21 e8 48 39 e8 0f
>> <4> [185.667789] RSP: 0018:ffffc90001133c38 EFLAGS: 00010206
>> <4> [185.667792] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff8884999ee800
>> <4> [185.667795] RDX: 00000000000c0001 RSI: 0000000000000100 RDI: 00007f419e774000
>> <4> [185.667798] RBP: ffff888453dbf040 R08: 0000000000000000 R09: 0000000000000001
>> <4> [185.667800] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888453dbf380
>> <4> [185.667803] R13: ffff8884999ee800 R14: ffff888453dbf3e8 R15: 0000000000000040
>> <4> [185.667806] FS:  00007f419e875e40(0000) GS:ffff88849fe00000(0000) knlGS:0000000000000000
>> <4> [185.667808] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> <4> [185.667811] CR2: 00007f419e873000 CR3: 0000000458bd2004 CR4: 0000000000760ef0
>> <4> [185.667814] PKRU: 55555554
>> <4> [185.667816] Call Trace:
>> <4> [185.667912]  ? i915_gem_userptr_get_pages+0x1c6/0x290 [i915]
>> <4> [185.667918]  ? mark_held_locks+0x49/0x70
>> <4> [185.667998]  ? i915_gem_userptr_get_pages+0x1c6/0x290 [i915]
>> <4> [185.668073]  ? i915_gem_userptr_get_pages+0x1c6/0x290 [i915]
>>
>> and then panicked, across a range of systems.
>> -Chris
>>

btw, the panic seems to indicate an additional, pre-existing problem:
i915_gem_userptr_get_pages(), in this case at least, is not able to
recover from a get_user_pages/pin_user_pages failure.


thanks,
-- 
John Hubbard
NVIDIA