Message-ID: <ZpE-1nBtaxuSAiqD@x1n>
Date: Fri, 12 Jul 2024 10:33:58 -0400
From: Peter Xu <peterx@...hat.com>
To: David Wang <00107082@....com>
Cc: akpm@...ux-foundation.org, david@...hat.com,
	linux-kernel-mentees@...ts.linuxfoundation.org,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	peili.dev@...il.com, skhan@...uxfoundation.org,
	syzbot+35a4414f6e247f515443@...kaller.appspotmail.com,
	syzkaller-bugs@...glegroups.com
Subject: Re: [PATCH] mm: Fix mmap_assert_locked() in follow_pte()

On Fri, Jul 12, 2024 at 09:19:31PM +0800, David Wang wrote:
> Hi,
> 
> > Ah yes, I had one rfc patch for that, I temporarily put that aside as it
> > seemed nobody cared except myself.. it's here:
> > 
> > https://lore.kernel.org/all/20240523223745.395337-2-peterx@redhat.com
> > 
> > I didn't know it can already cause real trouble.  It looks like that patch
> > should fix this.
> > 
> > Thanks,
> > 
> > -- 
> > Peter Xu
> 
> Just adding another user scenario concerning this kernel warning.
> Ever since 6.10-rc1, when I suspend my system via `systemctl suspend`, the nvidia gpu driver triggers a warning:
> 
>              	 Call Trace:
>              	  <TASK>
>              	  ? __warn+0x7c/0x120
>              	  ? follow_pte+0x15b/0x170
>              	  ? report_bug+0x18d/0x1c0
>              	  ? handle_bug+0x3c/0x80
>              	  ? exc_invalid_op+0x13/0x60
>              	  ? asm_exc_invalid_op+0x16/0x20
>              	  ? follow_pte+0x15b/0x170
>              	  follow_phys+0x3a/0xf0
>              	  untrack_pfn+0x53/0x120
>              	  unmap_single_vma+0xa6/0xe0
>              	  zap_page_range_single+0xe4/0x190
>              	  ? _nv002569kms+0x17b/0x210 [nvidia_modeset]
>              	  ? srso_return_thunk+0x5/0x5f
>              	  ? kfree+0x257/0x290
>              	  unmap_mapping_range+0x10d/0x130
>              	  nv_revoke_gpu_mappings_locked+0x43/0x70 [nvidia]
>              	  nv_set_system_power_state+0x1c9/0x470 [nvidia]
>              	  nv_procfs_write_suspend+0xd3/0x140 [nvidia]
>              	  proc_reg_write+0x58/0xa0
>              	  ? srso_return_thunk+0x5/0x5f
>              	  vfs_write+0xf6/0x440
>              	  ? __count_memcg_events+0x73/0x110
>              	  ? srso_return_thunk+0x5/0x5f
>              	  ? count_memcg_events.constprop.0+0x1a/0x30
>              	  ? srso_return_thunk+0x5/0x5f
>              	  ? handle_mm_fault+0xa9/0x2d0
>              	  ? srso_return_thunk+0x5/0x5f
>              	  ? preempt_count_add+0x47/0xa0
>              	  ksys_write+0x63/0xe0
>              	  do_syscall_64+0x4b/0x110
>              	  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>              	 RIP: 0033:0x7f34a3914240
>              	 Code: 40 00 48 8b 15 c1 9b 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 23 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
>              	 RSP: 002b:00007ffca2aa2688 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
>              	 RAX: ffffffffffffffda RBX: 0000000000000008 RCX: 00007f34a3914240
>              	 RDX: 0000000000000008 RSI: 000055a02968ed80 RDI: 0000000000000001
>              	 RBP: 000055a02968ed80 R08: 00007f34a39eecd0 R09: 00007f34a39eecd0
>              	 R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000008
>              	 R13: 00007f34a39ef760 R14: 0000000000000008 R15: 00007f34a39ea9e0
>              	  </TASK>
>              	 ---[ end trace 0000000000000000 ]---
>              	 PM: suspend entry (deep)
> 
> Considering the out-of-tree nature of the nvidia gpu driver, and that nobody had reported this kernel warning before with in-tree drivers,
> I had almost convinced myself that the nvidia driver might need a "big" rework to live with those "PTE" changes.
> So glad to see this thread of discussion/issue/fix now. I have been patching my system manually ever since 6.10-rc1,
> and I hope things get fixed soon...

Yep, this is a similar file truncation path.  I'll repost my previous RFC
patch separately soon.

Thanks,

-- 
Peter Xu