linux-kernel - Re: [syzbot] [mm?] WARNING in __folio_rmap_sanity

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date: Fri, 5 Jan 2024 09:56:29 +0100
From: David Hildenbrand <david@...hat.com>
To: Ryan Roberts <ryan.roberts@....com>, Yin Fengwei <fengwei.yin@...el.com>,
 syzbot <syzbot+50ef73537bbc393a25bb@...kaller.appspotmail.com>,
 akpm@...ux-foundation.org, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 syzkaller-bugs@...glegroups.com, Matthew Wilcox <willy@...radead.org>
Subject: Re: [syzbot] [mm?] WARNING in __folio_rmap_sanity_checks

>>>>> If I am not wrong, that triggers:
>>>>>
>>>>> VM_WARN_ON_FOLIO(folio_test_large(folio) &&
>>>>>             !folio_test_large_rmappable(folio), folio);
>>>>>
>>>>> So we are trying to rmap a large folio that did not go through
>>>>> folio_prep_large_rmappable().
> 
> Would someone mind explaining the rules to me for this? As far as I can see,
> folio_prep_large_rmappable() just inits the _deferred_list and sets a flag so we
> remember to deinit the list on destruction. Why can't we just init that list for
> all folios order-2 or greater? Then everything is rmappable?

I think we much rather want to look into moving all mapcount-related 
stuff into folio_prep_large_rmappable(). It doesn't make any sense to 
initialize that for any compound pages, especially the ones that will 
never get mapped to user space.

> 
>>>>>
>>>>> net/packet/af_packet.c calls vm_insert_page() on some pages/folios stoed
>>>>> in the "struct packet_ring_buffer". No idea where that comes from, but I
>>>>> suspect it's simply some compound allocation.
>>>> Looks like:
>>>>     alloc_pg_vec
>>>>       alloc_one_pg_vec_page
>>>>            gfp_t gfp_flags = GFP_KERNEL | __GFP_COMP |
>>>>                              __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY;
>>>>
>>>>            buffer = (char *) __get_free_pages(gfp_flags, order);
>>>> So you are right here... :).
>>>
>>> Hm, but I wonder if this something that's supposed to work or is this one of
>>> the cases where we should actually use a VM_PFN mapping?
>>>
>>> It's not a pagecache(file/shmem) page after all.
>>>
>>> We could relax that check and document why we expect something that is not
>>> marked rmappable. But it fells wrong. I suspect this should be a VM_PFNMAP
>>> instead (like recent udmabuf changes).
>>
>> VM_PFNMAP looks correct.
> 
> And why is making the folio rmappable and mapping it the normal way not the
> right solution here? Because the folio could be order-1? Or something more profound?
> 

Think about it: we are adding/removing a page from rmap handling that 
can *never* be looked up using the rmap because there is no rmap for 
these pages, and folio->index is just completely unexpressive. 
VM_MIXEDMAP doesn't have any linearity constraints.

Logically, it doesn't make any sense to involve rmap code although it 
currently might work. validate_page_before_insert() blocks off most 
pages where the order-0 mapcount would be used for other purposes and 
everything would blow up.

Looking at vm_insert_page(), this interface is only for pages the caller 
allocated. Maybe we should just not do any rmap accounting when 
mapping/unmapping these pages: not involve any rmap code, including 
mapcounts?

vm_normal_page() works on these mappings, so we'd also have to skip rmap 
code when unmapping these pages etc. Maybe that's the whole reason we 
have the rmap handling here: to not special-case the unmap path.

Alternatively, we can:

(1) Require the caller to make sure large folios are rmappable. We
     already require allocations to be compound. Should be easy to add.
(2) Allow non-rmappable folios in rmap code just for mapcount tracking.
     Confusing but possible.

>>
>> I do have another question: why do we just check the large folio
>> rmappable? Does that mean order0 folio is always rmappable?
>>

We didn't really have a check for that I believe. We simply reject all 
pages in vm_insert_page() that are problematic because the pagecount is 
overloaded.

>> I ask this because vm_insert_pages() is called in net/ipv4/tcp.c
>> and drivers call vm_insert_page. I suppose they all need be VM_PFNMAP.

Right, similar problem.

-- 
Cheers,

David / dhildenb