linux-kernel - Is _PAGE_PROTNONE set only for user mappings?

Message-ID: <YntHrTX12TGp35aF@hyeyoo>
Date:   Wed, 11 May 2022 14:20:45 +0900
From:   Hyeonggon Yoo <42.hyeyoo@...il.com>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     Tom Lendacky <thomas.lendacky@....com>,
        Rick Edgecombe <rick.p.edgecombe@...el.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Andy Lutomirski <luto@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Tianyu Lan <Tianyu.Lan@...rosoft.com>,
        "Aneesh Kumar K.V" <aneesh.kumar@...ux.ibm.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org, vbabka@...e.cz,
        akpm@...ux-foundation.org, mgorman@...hsingularity.net,
        willy@...radead.org
Subject: Is _PAGE_PROTNONE set only for user mappings?

On Tue, May 10, 2022 at 07:39:30AM -0700, Dave Hansen wrote:
> On 5/10/22 06:35, Tom Lendacky wrote:
> > I'm wondering if adding a specific helper that takes a boolean to
> > indicate whether to set the global flag would be best. I'll let some of
> > the MM maintainers comment about that.
> 
> First of all, I'm not positive that _PAGE_BIT_PROTNONE is ever used for
> kernel mappings.  This would all get a lot easier if we decided that
> _PAGE_BIT_PROTNONE is only for userspace mappings and we don't have to
> worry about it when _PAGE_USER is clear.

After a quick skim of the code, it seems the place that actually sets
_PAGE_PROTNONE is mm/mmap.c's protection_map:

> /* description of effects of mapping type and prot in current implementation.
>  * this is due to the limited x86 page protection hardware.  The expected
>  * behavior is in parens:
>  *
>  * map_type     prot
>  *              PROT_NONE       PROT_READ       PROT_WRITE      PROT_EXEC
>  * MAP_SHARED   r: (no) no      r: (yes) yes    r: (no) yes     r: (no) yes
>  *              w: (no) no      w: (no) no      w: (yes) yes    w: (no) no
>  *              x: (no) no      x: (no) yes     x: (no) yes     x: (yes) yes
>  *              
>  * MAP_PRIVATE  r: (no) no      r: (yes) yes    r: (no) yes     r: (no) yes
>  *              w: (no) no      w: (no) no      w: (copy) copy  w: (no) no
>  *              x: (no) no      x: (no) yes     x: (no) yes     x: (yes) yes
>  *
>  */
> pgprot_t protection_map[16] = { 
>        __P000, __P001, __P010, __P011, __P100, __P101, __P110, __P111,
>        __S000, __S001, __S010, __S011, __S100, __S101, __S110, __S111
> };

Here __P000 and __S000 are both PAGE_NONE (_PAGE_ACCESSED | _PAGE_PROTNONE).

And protection_map is accessed via:
> pgprot_t vm_get_page_prot(unsigned long vm_flags)
> {
>        pgprot_t ret = __pgprot(pgprot_val(protection_map[vm_flags &
>                                (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]) |
>                        pgprot_val(arch_vm_get_page_prot(vm_flags)));
>
>        return arch_filter_pgprot(ret);
> }
> EXPORT_SYMBOL(vm_get_page_prot);

So I guess it is only set for process VMAs, provided no caller is abusing
vm_get_page_prot() for kernel mappings.
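
To make the lookup concrete, here is a minimal user-space sketch of how
vm_get_page_prot() picks a protection_map slot. The VM_* values mirror
include/linux/mm.h; prot_index() and selects_page_none() are hypothetical
helper names made up for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Same values as include/linux/mm.h, reproduced for a user-space model. */
#define VM_READ   0x1UL
#define VM_WRITE  0x2UL
#define VM_EXEC   0x4UL
#define VM_SHARED 0x8UL

/* Index into the 16-entry table, computed as vm_get_page_prot() does. */
static unsigned int prot_index(unsigned long vm_flags)
{
	unsigned int idx = (unsigned int)(vm_flags &
			(VM_READ | VM_WRITE | VM_EXEC | VM_SHARED));

	assert(idx < 16);	/* the mask keeps only the low four bits */
	return idx;
}

/*
 * Hypothetical helper: true when the slot is __P000 (index 0) or
 * __S000 (index 8), i.e. the two entries that are PAGE_NONE.
 */
static bool selects_page_none(unsigned long vm_flags)
{
	unsigned int idx = prot_index(vm_flags);

	return idx == 0 || idx == 8;
}
```

Only VMAs with none of VM_READ/VM_WRITE/VM_EXEC set land on those two
slots, which is what makes this path look specific to PROT_NONE VMAs.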

But a quick guess is not enough to be sure, so let's Cc the people
working on mm.

If the kernel never uses _PAGE_PROTNONE for kernel mappings, then it should
be fine for __change_page_attr() not to clear _PAGE_GLOBAL up front when the
address is not a user address, because nothing will mistake _PAGE_GLOBAL for
_PAGE_PROTNONE on a kernel address. Right?
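
For reference, the confusion in question comes from _PAGE_BIT_PROTNONE
aliasing _PAGE_BIT_GLOBAL on x86. Below is a small user-space model of that
aliasing, assuming the usual x86 bit positions; looks_protnone() follows the
test that pte_protnone() performs, and clear_global_if_not_present() is a
hypothetical name for the "clear _PAGE_GLOBAL when not present" step, not
the kernel's actual function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 bit positions; PROTNONE reuses the GLOBAL bit. */
#define _PAGE_BIT_PRESENT  0
#define _PAGE_BIT_ACCESSED 5
#define _PAGE_BIT_GLOBAL   8
#define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL

#define _PAGE_PRESENT  (UINT64_C(1) << _PAGE_BIT_PRESENT)
#define _PAGE_ACCESSED (UINT64_C(1) << _PAGE_BIT_ACCESSED)
#define _PAGE_GLOBAL   (UINT64_C(1) << _PAGE_BIT_GLOBAL)
#define _PAGE_PROTNONE (UINT64_C(1) << _PAGE_BIT_PROTNONE)

static_assert(_PAGE_PROTNONE == _PAGE_GLOBAL, "PROTNONE aliases GLOBAL");

/* Same test as pte_protnone(): PROTNONE bit set and PRESENT bit clear. */
static bool looks_protnone(uint64_t flags)
{
	return (flags & (_PAGE_PROTNONE | _PAGE_PRESENT)) == _PAGE_PROTNONE;
}

/*
 * Hypothetical model of the step under discussion: drop _PAGE_GLOBAL from
 * a non-present entry so it cannot be read back as PROT_NONE.
 */
static uint64_t clear_global_if_not_present(uint64_t flags)
{
	if (!(flags & _PAGE_PRESENT))
		flags &= ~_PAGE_GLOBAL;
	return flags;
}
```

A non-present entry that keeps _PAGE_GLOBAL satisfies looks_protnone();
whether that matters for a kernel address is exactly the open question.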

> 
> Second, the number of places that do these
> __set_pages_p()/__set_pages_np() pairs is pretty limited.  Some of them
> are *quite* unambiguous over whether they are dealing with the direct map:
> 
> > int set_direct_map_invalid_noflush(struct page *page)
> > {
> >         return __set_pages_np(page, 1);
> > }
> > 
> > int set_direct_map_default_noflush(struct page *page)
> > {
> >         return __set_pages_p(page, 1);
> > }
> 
> which would make it patently obvious whether __set_pages_p() should
> restore the global bit.  That would have been a problem in the "old" PTI
> days where _some_ of the direct map was exposed to Meltdown.  I don't
> think we have any of those mappings left, though.  They're all aliases
> like text and cpu_entry_area.
>
> It would be nice if someone could look into unraveling
> _PAGE_BIT_PROTNONE.  We could even probably move it to another bit for
> kernel mappings if we actually need it (I'm not convinced we do).

-- 
Thanks,
Hyeonggon