linux-kernel - Re: [PATCH v2] mm/swapfile: unuse_pte can map random data if swap read fails

Message-ID: <Yl7gY6G8/To1yHOe@xz-m1.local>
Date:   Tue, 19 Apr 2022 12:16:35 -0400
From:   Peter Xu <peterx@...hat.com>
To:     David Hildenbrand <david@...hat.com>
Cc:     Alistair Popple <apopple@...dia.com>,
        Miaohe Lin <linmiaohe@...wei.com>, akpm@...ux-foundation.org,
        willy@...radead.org, vbabka@...e.cz, dhowells@...hat.com,
        neilb@...e.de, surenb@...gle.com, minchan@...nel.org,
        sfr@...b.auug.org.au, rcampbell@...dia.com,
        naoya.horiguchi@....com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm/swapfile: unuse_pte can map random data if swap
 read fails

On Tue, Apr 19, 2022 at 01:14:29PM +0200, David Hildenbrand wrote:
> On 19.04.22 10:08, Alistair Popple wrote:
> > David Hildenbrand <david@...hat.com> writes:
> > 
> >> On 19.04.22 09:29, Miaohe Lin wrote:
> >>> On 2022/4/19 11:51, Alistair Popple wrote:
> >>>> Miaohe Lin <linmiaohe@...wei.com> writes:
> >>>>
> >>>>> There is a bug in unuse_pte(): when swap page happens to be unreadable,
> >>>>> page filled with random data is mapped into user address space. In case
> >>>>> of error, a special swap entry indicating swap read fails is set to the
> >>>>> page table. So the swapcache page can be freed and the user won't end up
> >>>>> with a permanently mounted swap because a sector is bad. And if the page
> >>>>> is accessed later, the user process will be killed so that corrupted data
> >>>>> is never consumed. On the other hand, if the page is never accessed, the
> >>>>> user won't even notice it.
> >>>>
> >>>> Hi Miaohe,
> >>>> It seems we're not actually using the pfn that gets stored in the special swap
> >>>> entry here. Is my understanding correct? If so I think it would be better to use
> >>>
> >>> Yes, you're right. The pfn is not used now. What we need here is a special swap entry
> >>> to do the right things. I think we can change to store some debugging information instead
> >>> of pfn if needed in the future.
> >>>
> >>>> the new PTE markers Peter introduced[1] rather than adding another swap entry
> >>>> type.
> >>>
> >>> IIUC, we should not reuse that swap entry here. From definition:
> >>>
> >>> PTE markers
> >>> `========='
> >>> ...
> >>> PTE marker is a new type of swap entry that is only applicable to file
> >>> backed memories like shmem and hugetlbfs.  It's used to persist some
> >>> pte-level information even if the original present ptes in pgtable are
> >>> zapped.
> >>>
> >>> It's designed for file backed memories while the swapin error entry is for anonymous
> >>> memories. And there are some differences in processing. So it's not a good idea
> >>> to reuse pte markers. Or am I missing something?
> >>
> >> I tend to agree. As raised in my other reply, maybe we can simply reuse
> >> hwpoison entries and update the documentation of them accordingly.
> > 
> > Unless I've missed something I don't think PTE markers should be restricted
> > solely to file backed memory. It's true that the only user of them at the moment
> > is UFFD-WP for file backed memory, but PTE markers are just a special swap entry
> > same as what is added here.
> 
> There is a difference.
> 
> What we want here is "there used to be something mapped but it's not
> readable anymore. Please fail hard when userspace tries accessing
> this.". Just like with hwpoison entries.
> 
> What a pte marker expresses is that "there is nothing mapped here right
> now but we have additional metadata available here. For file-backed
> memory, it translates to: If we ever touch this page, look up in the
> pagecache what to map here."
> 
> In the anonymous memory world, this would map to "populate the zeropage
> or a fresh anonymous page on access." and keep the metadata around.

So far it's defined like that, but it does not necessarily need to be.  IMHO
a PTE marker could work here for the anonymous use case as Alistair stated.
Say, it's fairly simple to not go into anonymous page handling at all if we
see this pte marker with the new bit set.  It's indeed just tailored for
such a use case where we don't need to store special data like a pfn.
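
To illustrate the idea, here is a minimal userspace sketch, not kernel code.
Every name in it (sketch_entry_t, SKETCH_MARKER_SWAPIN_ERROR, the 58-bit
payload layout, the fault outcomes) is hypothetical and only assumed for the
example.  It shows a marker-type entry that carries flag bits instead of a
pfn, and a fault path that bails out before any anonymous page handling when
the new bit is set:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: entry type in the top bits, payload in the low 58. */
#define SKETCH_TYPE_SHIFT   58
#define SKETCH_PAYLOAD_MASK ((UINT64_C(1) << SKETCH_TYPE_SHIFT) - 1)

enum sketch_type { SKETCH_SWAP = 0, SKETCH_MARKER = 1 };

/* Marker payload is a bitmask, not a pfn or swap offset. */
#define SKETCH_MARKER_UFFD_WP      (UINT64_C(1) << 0)
#define SKETCH_MARKER_SWAPIN_ERROR (UINT64_C(1) << 1)  /* the "new bit" */

enum sketch_fault {
    SKETCH_FAULT_SWAPIN,      /* ordinary swap entry: read the page back */
    SKETCH_FAULT_FILE_LOOKUP, /* existing marker use: consult the pagecache */
    SKETCH_FAULT_SIGBUS       /* swap read failed earlier: fail hard */
};

typedef struct { uint64_t val; } sketch_entry_t;

static sketch_entry_t sketch_make(enum sketch_type type, uint64_t payload)
{
    /* The payload must fit below the type bits. */
    assert((payload & ~SKETCH_PAYLOAD_MASK) == 0);
    return (sketch_entry_t){ ((uint64_t)type << SKETCH_TYPE_SHIFT) | payload };
}

static enum sketch_type sketch_type_of(sketch_entry_t e)
{
    return (enum sketch_type)(e.val >> SKETCH_TYPE_SHIFT);
}

static uint64_t sketch_payload(sketch_entry_t e)
{
    return e.val & SKETCH_PAYLOAD_MASK;
}

static enum sketch_fault sketch_handle_fault(sketch_entry_t e)
{
    if (sketch_type_of(e) == SKETCH_MARKER) {
        /* Check the error bit first so no page is ever populated here. */
        if (sketch_payload(e) & SKETCH_MARKER_SWAPIN_ERROR)
            return SKETCH_FAULT_SIGBUS;
        return SKETCH_FAULT_FILE_LOOKUP;
    }
    return SKETCH_FAULT_SWAPIN;
}
```

Since the payload is only a bitmask, nothing needs to be reserved in the pfn
space for this variant.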

A hwpoison entry looks good to me too, but as discussed we may need to
reserve pfn=0 or -1 or anything we're sure is an invalid value, and then
we'll also need to cover the rest of the hwpoison-related code (carefully,
as rightfully pointed out by Miaohe on the difference in the VM_FAULT_*
fields being returned) so as not to wrongly treat the "swp device read
error" as a general MCE.
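
For comparison, a similarly hedged userspace sketch of the hwpoison-reuse
option.  Again all names are hypothetical, and the choice of an all-ones
sentinel is only one of the candidates mentioned above.  The point it tries to
show is that every consumer of the entry has to check the reserved value
before treating the pfn as real:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reserved pfn meaning "swap device read error". */
#define SKETCH_PFN_SWAPIN_ERROR UINT64_MAX

enum sketch_result {
    SKETCH_RESULT_HWPOISON, /* real memory error on a known page */
    SKETCH_RESULT_SIGBUS    /* swap read error, no page behind it */
};

typedef struct { uint64_t pfn; } sketch_poison_entry_t;

static bool sketch_is_swapin_error(sketch_poison_entry_t e)
{
    return e.pfn == SKETCH_PFN_SWAPIN_ERROR;
}

/* The fault path must pick a different result for the sentinel. */
static enum sketch_result sketch_poison_fault(sketch_poison_entry_t e)
{
    return sketch_is_swapin_error(e) ? SKETCH_RESULT_SIGBUS
                                     : SKETCH_RESULT_HWPOISON;
}

/*
 * Any other path that converts the entry to a page has to refuse the
 * sentinel too; returns false when there is no real pfn to hand out.
 */
static bool sketch_poison_pfn(sketch_poison_entry_t e, uint64_t *pfn)
{
    assert(pfn != NULL);
    if (sketch_is_swapin_error(e))
        return false;
    *pfn = e.pfn;
    return true;
}
```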

From that POV it seems pte markers would be slightly cleaner, though we'll
need to touch up the existing pte marker code paths to start accepting
anonymous vmas.  No strong opinion on this.

Btw, is there an error dumped into dmesg when the read error happens (e.g.,
would the block IO path trigger some warning already)?  I'm wondering whether
we should report it to the user somehow so that the user would know even
earlier than when the bad page is accessed; then the user could potentially
do something useful.

-- 
Peter Xu