linux-kernel - Re: [PATCH] um: asm/page.h: zero out a pte's high value in set_pte

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87bn845ufn.fsf@gmail.com>
Date:	Fri, 29 Jan 2016 15:31:08 +0100
From:	Nicolai Stange <nicstange@...il.com>
To:	Richard Weinberger <richard@....at>
Cc:	Nicolai Stange <nicstange@...il.com>,
	Dan Williams <dan.j.williams@...el.com>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Jeff Dike <jdike@...toit.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	user-mode-linux-devel@...ts.sourceforge.net,
	user-mode-linux-user@...ts.sourceforge.net,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] um: asm/page.h: zero out a pte's high value in set_pte_val()

Richard Weinberger <richard@....at> writes:

> Am 29.01.2016 um 02:32 schrieb Nicolai Stange:
>> Richard Weinberger <richard@....at> writes:
>> 
>>> Am 29.01.2016 um 00:56 schrieb Nicolai Stange:
>>>> Commit 16da306849d0 ("um: kill pfn_t")
>>>> introduced a compile warning for defconfig:
>>>>
>>>>   arch/um/kernel/skas/mmu.c:38:206: warning: right shift count >= width of type
>>>>                                     [-Wshift-count-overflow]
>>>>
>>>> Aforementioned patch changes the definition of the phys_to_pfn() macro from
>>>>
>>>>   ((pfn_t) ((p) >> PAGE_SHIFT))
>>>>
>>>> to
>>>>
>>>>   ((p) >> PAGE_SHIFT)
>>>>
>>>> This effectively changes the phys_to_pfn() expansion's type from
>>>> unsigned long long to unsigned long.
>>>>
>>>> Through the callchain init_stub_pte()->mk_pte(), the expansion of
>>>> phys_to_pfn() is (indirectly) fed into the 'phys' argument of the
>>>> pte_set_val(pte, phys, prot) macro, eventually leading to
>>>>
>>>>   (pte).pte_high = (phys) >> 32;
>>>>
>>>> This results in the warning from above.
>>>>
>>>> Since UML only deals with 32 bit addresses, the upper 32 bits from 'phys'
>>>> used to be zero anyway.
>>>>
>>>> Zero out the pte value's high part in pte_set_val() in order to get rid
>>>> of the offending shift.
>>>>
>>>> Fixes: 16da306849d0 ("um: kill pfn_t")
>>>> Signed-off-by: Nicolai Stange <nicstange@...il.com>
>>>> ---
>>>>  arch/um/include/asm/page.h | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/arch/um/include/asm/page.h b/arch/um/include/asm/page.h
>>>> index e13d41c..61e235f 100644
>>>> --- a/arch/um/include/asm/page.h
>>>> +++ b/arch/um/include/asm/page.h
>>>> @@ -46,8 +46,8 @@ typedef struct { unsigned long pgd; } pgd_t;
>>>>  			      smp_wmb(); \
>>>>  			      (to).pte_low = (from).pte_low; })
>>>>  #define pte_is_zero(pte) (!((pte).pte_low & ~_PAGE_NEWPAGE) && !(pte).pte_high)
>>>> -#define pte_set_val(pte, phys, prot) \
>>>> -	({ (pte).pte_high = (phys) >> 32; \
>>>> +#define pte_set_val(pte, phys, prot)				\
>>>> +	({ (pte).pte_high = 0;					\
>>>>  	   (pte).pte_low = (phys) | pgprot_val(prot); })
>>>
>>> I think we can completely kill .pte_high.
>>>
>> 
>> I did a quick test with ->pte_high purged and this doesn't introduce any
>> new warnings.
>> 
>> Booting w/o a rootfs works up to mount_root.
>> 
>> 
>> Note that an implication of getting rid of ->pte_high would be that the
>> type of pte_val() would get changed from unsigned long long to unsigned
>> long. However, outside of arch/um, pte_val() is only used here:
>> - drivers/gpu/drm/drm_vm.c
>> - include/trace/events/xen.h
>> - mm/gup.c
>> - mm/memory.c
>> All these uses and the ones in arch/um itself look compatible with this
>> change (if relevant at all for UML).
>> 
>> 
>> I'll post a follow up patch for this tomorrow.
>> 
>> Question 1: now that ->pte_high will be gone, do you want to have
>>             ->pte_low renamed to e.g. ->pte_val?
>
> So, with a freshly booted brain the story looks a bit different.
> All this code needs a cleanup and we need to check what other archs do
> before we change pte_val(). Are you ready for some research? :)

So this is what arch/x86 does:

1.) typedef a pteval_t to a type matching the underlying hardware's
    native PTE size.
    Examples:
    - x86: arch/x86/include/asm/pgtable-2level_types.h -- unsigned long
    - x86(PAE): arch/x86/include/asm/pgtable-3level_types.h -- u64
    - x86_64: arch/x86/include/asm/pgtable_64_types.h -- unsigned long

2.) pte_t is typedefed to either a struct or union like this:

      typedef struct { pteval_t pte; } pte_t;

    In the case of a union (x86 and x86 w/ PAE), an additional member
    'pte_low' is introduced, aliasing the low half of ->pte.

Now, all three x86-arch cases define typedef a pgprotval_t matching their
respective pteval_t type and have a common then

  typedef struct pgprot { pgprotval_t pgprot; } pgprot_t;

Basically, mk_pte(page, pgprot) shifts the page's physical address to
some architecturally defined point (PAGE_SHIFT) within pteval_t and ors
the architecturally defined protection flags (_PAGE_*) in.

Of course, the protection flags are defined such that hardware
eventually finds them at the expected place within the final PTE
(c.f. arch/x86/include/asm/pgtable_types.h).


Summarizing:
The content of pteval_t is completely architecture dependent.  The only
semantics on pte values defined for out-of-arch users, e.g. mm/gup.c
seems to be equality on a pte_val(pte).


Finally, the page protection flags defined for UML do not have any bit
at a position greater than 9 assigned to them
(c.g. arch/um/include/asm/pgtable.h). (If that had been the case, we had
been in trouble already because protection flags are only or'ed into
->pte_low).


Thus, under the assumption that with UML, physical addresses are always
32 bits, I would say that it is safe to change pte_t.


Proposal:
Introduce pteval_t and pgprotval_t like x86 does and do

  typedef struct { pteval_t pte; } pte_t;
  typedef struct pgprot { pgprotval_t pgprot; } pgprot_t;

Change the pte macros accordingly.

What about pgd_t and pmd_t?


>> Question 2: what is the smp_wmb() in pte_copy() paired with/good for?
>
> AFACT to make sure that a write to pte_high is complete before we write pte_low.
> 100% copy&pasted from arch/i386 15 years ago. ;-)
>
> Thanks,
> //richard