lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d72d044c-0c24-4899-ac9c-3e1a24e9b85f@redhat.com>
Date: Mon, 22 Jul 2024 15:23:49 +0200
From: David Hildenbrand <david@...hat.com>
To: Hillf Danton <hdanton@...a.com>
Cc: syzbot <syzbot+ec4b7d82bb051330f15a@...kaller.appspotmail.com>,
 hughd@...gle.com, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 syzkaller-bugs@...glegroups.com, Matthew Wilcox <willy@...radead.org>,
 Vivek Kasireddy <vivek.kasireddy@...el.com>
Subject: Re: [syzbot] [mm?] BUG: Bad page map (8)

On 20.07.24 07:02, Hillf Danton wrote:
> On Fri, 19 Jul 2024 13:21:30 +0200 David Hildenbrand <david@...hat.com>
>> On 19.07.24 00:51, syzbot wrote:
>>> Hello,
>>>
>>> syzbot found the following issue on:
>>>
>>> HEAD commit:    4d145e3f830b Merge tag 'i2c-for-6.10-rc8' of git://git.ker..
>>> git tree:       upstream
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=11321495980000
>>> kernel config:  https://syzkaller.appspot.com/x/.config?x=6b5a15443200e31
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=ec4b7d82bb051330f15a
>>> compiler:       aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>>> userspace arch: arm64
>>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=113e054e980000
>>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1366ab85980000
>>>
>>
>> The reproducer involves udmabuf. I suspect it has to do with it.
>>
>> But I'm curius, does the reproducer not trigger before 4d145e3f830b on
>> mainliny?
>>
>> Viveks changes are not upstream yet, but I can only speculate that we
>> have some issue similar to the one we had with hugetlb: udmabuf doing
>> things with memfd/shmem pages that it shouldn't do, because it doesn't
>> "own" these pages.
>>
>> "udmabuf: Use vmf_insert_pfn and VM_PFNMAP for handling mmap" might help.
> 
> 	cpu1				cpu2
> 	---				---
> 	evict()				find folio2 in page cache
> 	truncate_inode_folio()
> 	truncate_cleanup_folio();
> 	  // unmap folio2 from mmA
> 	  unmap_mapping_folio(folio2);
> 					mmap folio2 to mmB
> 	filemap_remove_folio(folio2);
> 
> 
> If the window exists for mapping folio to userspace while indoe is evicted,
> is this report false positive?

I think what happens here is that filemap_unaccount_folio() will force 
the mapcount to be logically 0 (value -1).

And if we then actually go ahead and unmap that folio from our udmabuf 
page tables, we will let it go negative (and also free up the refcount 
too early) resulting in all kinds of issues.

filemap_unaccount_folio() was written under the assumption that the 
mapcount will only get modified when we map something via the pagecache, 
not when some other code (udmabuf) looked up something from the 
pagecache and then maps it to user space itself.

"Fortunately", the issue only exists with CONFIG_DEBUG_VM.

The right fix is probably to stop udmabuf from touching the mapcount 
(use a PFNMAP as that patch does). Another fix would be removing that 
debugging code from filemap_unaccount_folio().

I do see value in part of that debugging code. The refcount+mapcount 
modifications, not so much. But the "BUG: Bad page cache in process ..." 
message sounds helpful.

-- 
Cheers,

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ