[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d72d044c-0c24-4899-ac9c-3e1a24e9b85f@redhat.com>
Date: Mon, 22 Jul 2024 15:23:49 +0200
From: David Hildenbrand <david@...hat.com>
To: Hillf Danton <hdanton@...a.com>
Cc: syzbot <syzbot+ec4b7d82bb051330f15a@...kaller.appspotmail.com>,
hughd@...gle.com, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
syzkaller-bugs@...glegroups.com, Matthew Wilcox <willy@...radead.org>,
Vivek Kasireddy <vivek.kasireddy@...el.com>
Subject: Re: [syzbot] [mm?] BUG: Bad page map (8)
On 20.07.24 07:02, Hillf Danton wrote:
> On Fri, 19 Jul 2024 13:21:30 +0200 David Hildenbrand <david@...hat.com>
>> On 19.07.24 00:51, syzbot wrote:
>>> Hello,
>>>
>>> syzbot found the following issue on:
>>>
>>> HEAD commit: 4d145e3f830b Merge tag 'i2c-for-6.10-rc8' of git://git.ker..
>>> git tree: upstream
>>> console output: https://syzkaller.appspot.com/x/log.txt?x=11321495980000
>>> kernel config: https://syzkaller.appspot.com/x/.config?x=6b5a15443200e31
>>> dashboard link: https://syzkaller.appspot.com/bug?extid=ec4b7d82bb051330f15a
>>> compiler: aarch64-linux-gnu-gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
>>> userspace arch: arm64
>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=113e054e980000
>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1366ab85980000
>>>
>>
>> The reproducer involves udmabuf. I suspect it has to do with it.
>>
>> But I'm curius, does the reproducer not trigger before 4d145e3f830b on
>> mainliny?
>>
>> Viveks changes are not upstream yet, but I can only speculate that we
>> have some issue similar to the one we had with hugetlb: udmabuf doing
>> things with memfd/shmem pages that it shouldn't do, because it doesn't
>> "own" these pages.
>>
>> "udmabuf: Use vmf_insert_pfn and VM_PFNMAP for handling mmap" might help.
>
> cpu1 cpu2
> --- ---
> evict() find folio2 in page cache
> truncate_inode_folio()
> truncate_cleanup_folio();
> // unmap folio2 from mmA
> unmap_mapping_folio(folio2);
> mmap folio2 to mmB
> filemap_remove_folio(folio2);
>
>
> If the window exists for mapping folio to userspace while indoe is evicted,
> is this report false positive?
I think what happens here is that filemap_unaccount_folio() will force
the mapcount to be logically 0 (value -1).
And if we then actually go ahead and unmap that folio from our udmabuf
page tables, we will let it go negative (and also free up the refcount
too early) resulting in all kinds of issues.
filemap_unaccount_folio() was written under the assumption that the
mapcount will only get modified when we map something via the pagecache,
not when some other code (udmabuf) looked up something from the
pagecache and then maps it to user space itself.
"Fortunately", the issue only exists with CONFIG_DEBUG_VM.
The right fix is probably to stop udmabuf from touching the mapcount
(use a PFNMAP as that patch does). Another fix would be removing that
debugging code from filemap_unaccount_folio().
I do see value in part of that debugging code. The refcount+mapcount
modifications, not so much. But the "BUG: Bad page cache in process ..."
message sounds helpful.
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists