linux-kernel - Re: [syzbot] [mm?] WARNING in __folio_rmap_sanity

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <793c5cee-d5fc-4eb1-86a2-39e05686233d@redhat.com>
Date: Fri, 5 Jan 2024 17:05:23 +0100
From: David Hildenbrand <david@...hat.com>
To: Ryan Roberts <ryan.roberts@....com>, Yin Fengwei <fengwei.yin@...el.com>,
 syzbot <syzbot+50ef73537bbc393a25bb@...kaller.appspotmail.com>,
 akpm@...ux-foundation.org, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 syzkaller-bugs@...glegroups.com, Matthew Wilcox <willy@...radead.org>
Subject: Re: [syzbot] [mm?] WARNING in __folio_rmap_sanity_checks

>>
>> Think about it: we are adding/removing a page from rmap handling that can
>> *never* be looked up using the rmap because there is no rmap for these pages,
>> and folio->index is just completely unexpressive. VM_MIXEDMAP doesn't have any
>> linearity constraints.
> 
> I guess I was assuming treating it the same way as anon folios. But I guess that
> would be VeryBad (TM) because these aren't anon pages and we don't want to swap,
> etc? OK got it.

Yes, there can't and never will be an rmap walk. No rmap chains are involved.
There is no rmap concept for these folios.

That's the ugly part about all this :)

> 
>>
>> Logically, it doesn't make any sense to involve rmap code although it currently
>> might work. validate_page_before_insert() blocks off most pages where the
>> order-0 mapcount would be used for other purposes and everything would blow up.
>>
>> Looking at vm_insert_page(), this interface is only for pages the caller
>> allocated. Maybe we should just not do any rmap accounting when
>> mapping/unmapping these pages: not involve any rmap code, including mapcounts?
>>
>> vm_normal_page() works on these mappings, so we'd also have to skip rmap code
>> when unmapping these pages etc. Maybe that's the whole reason we have the rmap
>> handling here: to not special-case the unmap path.
> 
> Right. I guess it depends what vm_insert_page() is spec'ed to expect; is the bug
> in the implementation or is the caller providing the wrong type of folio? I
> guess there will be many callers providing non-rmappable folios (inc out of tree?).

I would assume that *all* are providing non-pagecache pages. And if they would
be pagecache pages, likely it would be a BUG because the rmap does not work with
VM_MIXEDMAP and we'd be breaking folio unmapping etc. (we had such a BUG recently
with udmabuf that rightfully switched to VM_PFNMAP)

As vm_insert_page() documents:

"This allows drivers to insert individual pages they've allocated into a user vma."

> 
>>
>> Alternatively, we can:
>>
>> (1) Require the caller to make sure large folios are rmappable. We
>>      already require allocations to be compound. Should be easy to add.
> 
> I'm not sure this is practical given vm_insert_page() is exported?

We could fixup all in-tree users. out-of-tree have to fixup their stuff themselves.

> 
>> (2) Allow non-rmappable folios in rmap code just for mapcount tracking.
>>      Confusing but possible.
>>
>>>>
>>>> I do have another question: why do we just check the large folio
>>>> rmappable? Does that mean order0 folio is always rmappable?
>>>>
>>
>> We didn't really have a check for that I believe. We simply reject all pages in
>> vm_insert_page() that are problematic because the pagecount is overloaded.
>>
>>>> I ask this because vm_insert_pages() is called in net/ipv4/tcp.c
>>>> and drivers call vm_insert_page. I suppose they all need be VM_PFNMAP.

Just to comment on that: I think they don't want VM_PFNMAP for some reason.
For example, because GUP has to work on these pages.


To summarize my thoughts: Using "rmap" code for these folios doesn't make
any sense. mapping/unmapping these folios shouldn't call into rmap code.

If we want to adjust the mapcount/counters, we should mimic what rmap
map/unmap does separately, special-cases for VM_MIXEDMAP mappings.

For the time being, it's probably easiest to do the following:

 From 9ff752e9b82991b55813dbf1088cf5b573577bdd Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@...hat.com>
Date: Fri, 5 Jan 2024 16:57:29 +0100
Subject: [PATCH] mm/rmap: silence VM_WARN_ON_FOLIO() in
  __folio_rmap_sanity_checks()

Unfortunately, vm_insert_page() and friends and up passing
driver-allocated folios into folio_add_file_rmap_pte() using
insert_page_into_pte_locked().

While these driver-allocated folios can be compound pages (large
folios), they are not proper "rmappable" folios.

In these VM_MIXEDMAP VMAs, there isn't really the concept of a reverse
mapping, so long-term, we should clean that up and not call into rmap
code.

For the time being, document how we can end up in rmap code with large
folios that are not marked rmappable.

Reported-by: syzbot+50ef73537bbc393a25bb@...kaller.appspotmail.com
Fixes: 68f0320824fa ("mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()")
Signed-off-by: David Hildenbrand <david@...hat.com>
---
  include/linux/rmap.h | 11 +++++++++--
  1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index fd6fe16fa3583..b7944a833668a 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -199,8 +199,15 @@ static inline void __folio_rmap_sanity_checks(struct folio *folio,
  {
  	/* hugetlb folios are handled separately. */
  	VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
-	VM_WARN_ON_FOLIO(folio_test_large(folio) &&
-			 !folio_test_large_rmappable(folio), folio);
+
+	/*
+	 * TODO: we get driver-allocated folios that have nothing to do with
+	 * the rmap using vm_insert_page(); therefore, we cannot assume that
+	 * folio_test_large_rmappable() holds for large folios. We should
+	 * handle any desired mapcount+stats accounting for these folios in
+	 * VM_MIXEDMAP VMAs separately, and then sanity-check here that
+	 * we really only get rmappable folios.
+	 */
  
  	VM_WARN_ON_ONCE(nr_pages <= 0);
  	VM_WARN_ON_FOLIO(page_folio(page) != folio, folio);
-- 
2.43.0


-- 
Cheers,

David / dhildenb