linux-kernel - Re: [PATCH v3 1/3] arm64, vmcoreinfo : Append 'PTRS_PER

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <66da4098-b221-408b-50ca-f3790b943965@redhat.com>
Date:   Wed, 3 Apr 2019 23:24:12 +0530
From:   Bhupesh Sharma <bhsharma@...hat.com>
To:     James Morse <james.morse@....com>
Cc:     linux-kernel@...r.kernel.org, bhupesh.linux@...il.com,
        Mark Rutland <mark.rutland@....com>,
        Will Deacon <will.deacon@....com>,
        Dave Anderson <anderson@...hat.com>,
        Kazuhito Hagio <k-hagio@...jp.nec.com>,
        kexec@...ts.infradead.org, linux-arm-kernel@...ts.infradead.org,
        Steve Capper <Steve.Capper@....com>,
        Dave Anderson <anderson@...hat.com>,
        "kexec@...ts.infradead.org" <kexec@...ts.infradead.org>
Subject: Re: [PATCH v3 1/3] arm64, vmcoreinfo : Append 'PTRS_PER_PGD' to
 vmcoreinfo

Hi James,

On 04/02/2019 10:56 PM, James Morse wrote:
> Hi Bhupesh,
> 
> On 28/03/2019 11:42, Bhupesh Sharma wrote:
>> On 03/26/2019 10:06 PM, James Morse wrote:
>>> On 20/03/2019 05:09, Bhupesh Sharma wrote:
>>>> With ARMv8.2-LVA architecture extension availability, arm64 hardware
>>>> which supports this extension can support a virtual address-space upto
>>>> 52-bits.
>>>>
>>>> Since at the moment we enable the support of this extension in kernel
>>>> via CONFIG flags, e.g.
>>>>    - User-space 52-bit LVA via CONFIG_ARM64_USER_VA_BITS_52
>>>>
>>>> so, there is no clear mechanism in the user-space right now to
>>>> determine these CONFIG flag values and hence determine the maximum
>>>> virtual address space supported by the underlying kernel.
>>>>
>>>> User-space tools like 'makedumpfile' therefore are broken currently
>>>> as they have no proper method to calculate the 'PTRS_PER_PGD' value
>>>> which is required to perform a page table walk to determine the
>>>> physical address of a corresponding virtual address found in
>>>> kcore/vmcoreinfo.
>>>>
>>>> If one appends 'PTRS_PER_PGD' number to vmcoreinfo for arm64,
>>>> it can be used in user-space to determine the maximum virtual address
>>>> supported by underlying kernel.
>>>
>>> I don't think this really solves the problem, it feels fragile.
>>>
>>> I can see how vmcoreinfo tells you VA_BITS==48, PAGE_SIZE==64K and PTRS_PER_PGD=1024.
>>> You can use this to work out that the top level page table size isn't consistent with a
>>> 48bit VA, so 52bit VA must be in use...
>>>
>>> But wasn't your problem walking the kernel page tables? In particular the offset that we
>>> apply because the tables were based on a 48bit VA shifted up in swapper_pg_dir.
>>>
>>> Where does the TTBR1_EL1 offset come from with this property? I assume makedumpfile
>>> hard-codes it when it sees 52bit is in use ... somewhere.
>>> We haven't solved the problem!
> 
>> But isn't the TTBR1_EL1 offset already appended by the kernel via e842dfb5a2d3 ("arm64:
>> mm: Offset TTBR1 to allow 52-bit PTRS_PER_PGD")
>> in case of kernel configuration where 52-bit userspace VAs are possible.
> 
>> Accordingly we have the following assembler helper in 'arch/arm64/include/asm/assembler.h':
>>
>>         .macro  offset_ttbr1, ttbr
>> #ifdef CONFIG_ARM64_52BIT_VA
>>         orr     \ttbr, \ttbr, #TTBR1_BADDR_4852_OFFSET
>> #endif
>>         .endm
>>
>> where:
>> #ifdef CONFIG_ARM64_52BIT_VA
>> /* Must be at least 64-byte aligned to prevent corruption of the TTBR */
>> #define TTBR1_BADDR_4852_OFFSET        (((UL(1) << (52 - PGDIR_SHIFT)) - \
>>                                  (UL(1) << (48 - PGDIR_SHIFT))) * 8)
>> #endif
> 
> Sure, and all this would work today, because there is only one weird combination. But once
> we support another combination of 52bit-va, you'd either need another value, or to start
> using PTRS_PER_PGD as a flag for v5.1_FUNNY_BEHAVIOUR_ONE.

I completed my user-space experimentation with 52-bit kernel VA changes 
from Steve today and have shared a detailed review on his patchset (See 
<http://lists.infradead.org/pipermail/kexec/2019-April/022750.html>).

But first let me share some opinion on how we are adding the 52-bit 
address space changes for arm64 in the kernel.

I think we have ended up adding just a bit _too many_ CONFIG and MACRO 
values for the increased address space changes. For e.g. after the 
52-bit kernel VA changes we have at-least 4 macros which explain the VA 
address range with CONFIG_ARM64_USER_KERNEL_VA_BITS_52=y:

VA_BITS = 52,
VA_BITS_ACTUAL = vabits_actual = 48,
VA_BITS_MIN = min (48, VA_BITS) = 48.
PTRS_PER_PGD = 64 (48-bit) or 1024 (52-bit)

Of these, VA_BITS, VA_BITS_ACTUAL and PTRS_PER_PGD are definitely of 
interest in the userspace as they define:

1.
  /*
     * VMEMMAP_SIZE - allows the whole linear region to be covered by
     *                a struct page array
     */
    #define VMEMMAP_SIZE (UL(1) << (VA_BITS - PAGE_SHIFT - 1 + 
STRUCT_PAGE_MAX_SHIFT))

2. #define __is_lm_address(addr)    (!((addr) & BIT(VA_BITS_ACTUAL - 1)))

We have discussed the usage of PTRS_PER_PGD in userspace already at 
quite some length, so I will focus on the other two below (VA_BITS and 
VA_BITS_ACUAL).

Both are critical for determining VMEMMAP_SIZE and whether a virtual 
address lies in the linear map range respectively.

I don't see any standard mechanism other than the following to achieve a 
working user-space with these changes:
- a sysfs node (may be a 
'/sys/devices/system/cpu/addressing-capabilities' node?) or HWCAP 
capability export for user-space utilities which perform a live analysis 
and use the above variables.
- exporting these variables in vmcoreinfo (for analysis of crash dump).

VA_BITS is already exported in vmcoreinfo, whereas I have proposed 
exporting PTRS_PER_PGD to vmcoreinfo via this patch.

For 52-bit kernel VA changes, VA_BITS_ACTUAL will also be needed in 
vmcoreinfo (See 
<http://lists.infradead.org/pipermail/kexec/2019-April/022750.html> for 
details).

>> Note that the above computation holds true both for PTRS_PER_PGD = 64 (48-bit kernel with
>> 48-bit User VA) and 1024 (48-bit with 52-bit User VA) cases. And these are the
>> configurations for which we are trying to fix the user-space regressions reported (on
>> arm64) recently.
> 
> ... and revisit it when there is another combination?

See above.

>>> Today __cpu_setup() sets T0SZ and T1SZ differently for 52bit VA, but in the future it
>>> could set them the same, or different the other-way-round.
>>>
>>> Will makedumpfile using this value keep working once T1SZ is 52bit VA too? In this case
>>> there would be no ttbr offset.
>>>
>>> If you need another vmcoreinfo flag once that happens, we've done something wrong here.
>>
>> I am currently experimenting with Steve's patches for 52-bit kernel VA
>> (<https://lwn.net/Articles/780093/>) and will comment more on the same when I am able to
>> get the user-space utilities like makedumpfile and kexec-tools to work with the same on
>> both ARMv8 Fast Simulator model and older CPUs which don't support ARMv8.2 extensions.
> 
>> However, I think we should not hold up fixes for regressions already reported, because the
>> 52-bit kernel VA changes probably still need some more rework.
> 
> Chucking things into vmcoreinfo isn't free: we need to keep them there forever, otherwise
> yesterdays version of the tools breaks. Can we take the time to get this right for the
> cases we know about?

Sure, but exporting variable(s) in vmcoreinfo in directly related to the 
information variable(s) we add in the kernel side without which the 
user-space would break.

I have added the requirements for 52-bit kernel VA above (i.e we need an 
additional VA_BITS_ACTUAL variable export'ed rather than any tinkering 
with already proposed PTRS_PER_PGD).

May be this is a good time to also talk about minimizing the kernel 
interfaces we are proposing to hold and indicate normal (48-bit) and 
extended (52-bit) address spaces on arm64.

Ideally, we would want to simplify it further to be on similar lines as x86:
CONFIG_X86_5LEVEL=y
vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
			pgtable_l5_enabled());

which seems much cleaner..

I am open to any suggestions on the same.

> Yes the kernel code is going to move around, this is why the information we expose via
> vmcoreinfo needs to be thought through: something we would always need, regardless of how
> the kernel implements it.
> 
> 
>>> (Not to mention what happens if the TTBR1_EL1 uses 52bit va, but TTBR0_EL1 doesn't)
>>
>> I am wondering if there are any real users of the above combination.
> 
> Heh! Is there any hardware that supports this?
> 
> Pointer-auth changes all this again, as we may prefer to use the bits for pointer-auth in
> one TTB or the other. PTRS_PER_PGD may show the 52bit value in this case, but neither TTBR
> is mapping 52bits of VA.
> 
> 
>> So far, I have generally come across discussions where the following variations of the
>> address spaces have been proposed/requested:
>> - 48bit kernel VA + 48-bit User VA,
>> - 48-bit kernel VA + 52-bit User VA,
> 
> + 52bit kernel, because there is excessive quantities of memory, and the kernel maps it
> all, but 48-bit user, because it never maps all the memory, and we prefer the bits for
> pointer-auth.
> 
>> - 52-bit kernel VA + 52-bit User VA.
> 
> And...  all four may happen with the same built image. I don't see how you can tell these
> cases apart with the one (build-time-constant!) PTRS_PER_PGD value.
> 
> I'm sure some of these cases are hypothetical, but by considering it all now, we can avoid
> three more kernel:vmcoreinfo updates, and three more fix-user-space-to-use-the-new-value.

Agree.

> I think you probably do need PTRS_PER_PGD, as this is the one value the mm is using to
> generate page tables. I'm pretty sure you also need T0SZ and T1SZ to know if that's
> actually in use, or the kernel is bodging round it with an offset.

Sure, I am open to suggestions (as I realize that we need an additional 
VA_BITS_ACTUAL variable export'ed for 52-bit kernel VA changes).

Also how do we standardize reading T0SZ and T1SZ in user-space. Do you 
propose I make an enhancement in the cpu-feature-registers interface 
(see [1]) or the HWCAPS interface (see [2]) for the same?

[1]. 
https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt
[2]. https://www.kernel.org/doc/Documentation/arm64/elf_hwcaps.txt

Thanks,
Bhupesh