[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <380b137b-a611-5c8d-3890-8021084f87fe@redhat.com>
Date: Sat, 4 May 2019 18:23:56 +0530
From: Bhupesh Sharma <bhsharma@...hat.com>
To: James Morse <james.morse@....com>
Cc: linux-kernel@...r.kernel.org, bhupesh.linux@...il.com,
Mark Rutland <mark.rutland@....com>,
Will Deacon <will.deacon@....com>,
Dave Anderson <anderson@...hat.com>,
Kazuhito Hagio <k-hagio@...jp.nec.com>,
kexec@...ts.infradead.org, linux-arm-kernel@...ts.infradead.org,
Steve Capper <Steve.Capper@....com>
Subject: Re: [PATCH v3 1/3] arm64, vmcoreinfo : Append 'PTRS_PER_PGD' to
vmcoreinfo
On 04/03/2019 11:24 PM, Bhupesh Sharma wrote:
> Hi James,
>
> On 04/02/2019 10:56 PM, James Morse wrote:
>> Hi Bhupesh,
>>
>> On 28/03/2019 11:42, Bhupesh Sharma wrote:
>>> On 03/26/2019 10:06 PM, James Morse wrote:
>>>> On 20/03/2019 05:09, Bhupesh Sharma wrote:
>>>>> With ARMv8.2-LVA architecture extension availability, arm64 hardware
>>>>> which supports this extension can support a virtual address-space upto
>>>>> 52-bits.
>>>>>
>>>>> Since at the moment we enable the support of this extension in kernel
>>>>> via CONFIG flags, e.g.
>>>>> - User-space 52-bit LVA via CONFIG_ARM64_USER_VA_BITS_52
>>>>>
>>>>> so, there is no clear mechanism in the user-space right now to
>>>>> determine these CONFIG flag values and hence determine the maximum
>>>>> virtual address space supported by the underlying kernel.
>>>>>
>>>>> User-space tools like 'makedumpfile' therefore are broken currently
>>>>> as they have no proper method to calculate the 'PTRS_PER_PGD' value
>>>>> which is required to perform a page table walk to determine the
>>>>> physical address of a corresponding virtual address found in
>>>>> kcore/vmcoreinfo.
>>>>>
>>>>> If one appends 'PTRS_PER_PGD' number to vmcoreinfo for arm64,
>>>>> it can be used in user-space to determine the maximum virtual address
>>>>> supported by underlying kernel.
>>>>
>>>> I don't think this really solves the problem, it feels fragile.
>>>>
>>>> I can see how vmcoreinfo tells you VA_BITS==48, PAGE_SIZE==64K and
>>>> PTRS_PER_PGD=1024.
>>>> You can use this to work out that the top level page table size
>>>> isn't consistent with a
>>>> 48bit VA, so 52bit VA must be in use...
>>>>
>>>> But wasn't your problem walking the kernel page tables? In
>>>> particular the offset that we
>>>> apply because the tables were based on a 48bit VA shifted up in
>>>> swapper_pg_dir.
>>>>
>>>> Where does the TTBR1_EL1 offset come from with this property? I
>>>> assume makedumpfile
>>>> hard-codes it when it sees 52bit is in use ... somewhere.
>>>> We haven't solved the problem!
>>
>>> But isn't the TTBR1_EL1 offset already appended by the kernel via
>>> e842dfb5a2d3 ("arm64:
>>> mm: Offset TTBR1 to allow 52-bit PTRS_PER_PGD")
>>> in case of kernel configuration where 52-bit userspace VAs are possible.
>>
>>> Accordingly we have the following assembler helper in
>>> 'arch/arm64/include/asm/assembler.h':
>>>
>>> .macro offset_ttbr1, ttbr
>>> #ifdef CONFIG_ARM64_52BIT_VA
>>> orr \ttbr, \ttbr, #TTBR1_BADDR_4852_OFFSET
>>> #endif
>>> .endm
>>>
>>> where:
>>> #ifdef CONFIG_ARM64_52BIT_VA
>>> /* Must be at least 64-byte aligned to prevent corruption of the TTBR */
>>> #define TTBR1_BADDR_4852_OFFSET (((UL(1) << (52 -
>>> PGDIR_SHIFT)) - \
>>> (UL(1) << (48 - PGDIR_SHIFT))) * 8)
>>> #endif
>>
>> Sure, and all this would work today, because there is only one weird
>> combination. But once
>> we support another combination of 52bit-va, you'd either need another
>> value, or to start
>> using PTRS_PER_PGD as a flag for v5.1_FUNNY_BEHAVIOUR_ONE.
>
> I completed my user-space experimentation with 52-bit kernel VA changes
> from Steve today and have shared a detailed review on his patchset (See
> <http://lists.infradead.org/pipermail/kexec/2019-April/022750.html>).
>
> But first let me share some opinion on how we are adding the 52-bit
> address space changes for arm64 in the kernel.
>
> I think we have ended up adding just a bit _too many_ CONFIG and MACRO
> values for the increased address space changes. For e.g. after the
> 52-bit kernel VA changes we have at-least 4 macros which explain the VA
> address range with CONFIG_ARM64_USER_KERNEL_VA_BITS_52=y:
>
> VA_BITS = 52,
> VA_BITS_ACTUAL = vabits_actual = 48,
> VA_BITS_MIN = min (48, VA_BITS) = 48.
> PTRS_PER_PGD = 64 (48-bit) or 1024 (52-bit)
>
> Of these, VA_BITS, VA_BITS_ACTUAL and PTRS_PER_PGD are definitely of
> interest in the userspace as they define:
>
> 1.
> /*
> * VMEMMAP_SIZE - allows the whole linear region to be covered by
> * a struct page array
> */
> #define VMEMMAP_SIZE (UL(1) << (VA_BITS - PAGE_SHIFT - 1 +
> STRUCT_PAGE_MAX_SHIFT))
>
> 2. #define __is_lm_address(addr) (!((addr) & BIT(VA_BITS_ACTUAL - 1)))
>
> We have discussed the usage of PTRS_PER_PGD in userspace already at
> quite some length, so I will focus on the other two below (VA_BITS and
> VA_BITS_ACUAL).
>
> Both are critical for determining VMEMMAP_SIZE and whether a virtual
> address lies in the linear map range respectively.
>
> I don't see any standard mechanism other than the following to achieve a
> working user-space with these changes:
> - a sysfs node (may be a
> '/sys/devices/system/cpu/addressing-capabilities' node?) or HWCAP
> capability export for user-space utilities which perform a live analysis
> and use the above variables.
> - exporting these variables in vmcoreinfo (for analysis of crash dump).
>
> VA_BITS is already exported in vmcoreinfo, whereas I have proposed
> exporting PTRS_PER_PGD to vmcoreinfo via this patch.
>
> For 52-bit kernel VA changes, VA_BITS_ACTUAL will also be needed in
> vmcoreinfo (See
> <http://lists.infradead.org/pipermail/kexec/2019-April/022750.html> for
> details).
>
>>> Note that the above computation holds true both for PTRS_PER_PGD = 64
>>> (48-bit kernel with
>>> 48-bit User VA) and 1024 (48-bit with 52-bit User VA) cases. And
>>> these are the
>>> configurations for which we are trying to fix the user-space
>>> regressions reported (on
>>> arm64) recently.
>>
>> ... and revisit it when there is another combination?
>
> See above.
>
>>>> Today __cpu_setup() sets T0SZ and T1SZ differently for 52bit VA, but
>>>> in the future it
>>>> could set them the same, or different the other-way-round.
>>>>
>>>> Will makedumpfile using this value keep working once T1SZ is 52bit
>>>> VA too? In this case
>>>> there would be no ttbr offset.
>>>>
>>>> If you need another vmcoreinfo flag once that happens, we've done
>>>> something wrong here.
>>>
>>> I am currently experimenting with Steve's patches for 52-bit kernel VA
>>> (<https://lwn.net/Articles/780093/>) and will comment more on the
>>> same when I am able to
>>> get the user-space utilities like makedumpfile and kexec-tools to
>>> work with the same on
>>> both ARMv8 Fast Simulator model and older CPUs which don't support
>>> ARMv8.2 extensions.
>>
>>> However, I think we should not hold up fixes for regressions already
>>> reported, because the
>>> 52-bit kernel VA changes probably still need some more rework.
>>
>> Chucking things into vmcoreinfo isn't free: we need to keep them there
>> forever, otherwise
>> yesterdays version of the tools breaks. Can we take the time to get
>> this right for the
>> cases we know about?
>
> Sure, but exporting variable(s) in vmcoreinfo in directly related to the
> information variable(s) we add in the kernel side without which the
> user-space would break.
>
> I have added the requirements for 52-bit kernel VA above (i.e we need an
> additional VA_BITS_ACTUAL variable export'ed rather than any tinkering
> with already proposed PTRS_PER_PGD).
>
> May be this is a good time to also talk about minimizing the kernel
> interfaces we are proposing to hold and indicate normal (48-bit) and
> extended (52-bit) address spaces on arm64.
>
> Ideally, we would want to simplify it further to be on similar lines as
> x86:
> CONFIG_X86_5LEVEL=y
> vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
> pgtable_l5_enabled());
>
> which seems much cleaner..
>
> I am open to any suggestions on the same.
>
>> Yes the kernel code is going to move around, this is why the
>> information we expose via
>> vmcoreinfo needs to be thought through: something we would always
>> need, regardless of how
>> the kernel implements it.
>>
>>
>>>> (Not to mention what happens if the TTBR1_EL1 uses 52bit va, but
>>>> TTBR0_EL1 doesn't)
>>>
>>> I am wondering if there are any real users of the above combination.
>>
>> Heh! Is there any hardware that supports this?
>>
>> Pointer-auth changes all this again, as we may prefer to use the bits
>> for pointer-auth in
>> one TTB or the other. PTRS_PER_PGD may show the 52bit value in this
>> case, but neither TTBR
>> is mapping 52bits of VA.
>>
>>
>>> So far, I have generally come across discussions where the following
>>> variations of the
>>> address spaces have been proposed/requested:
>>> - 48bit kernel VA + 48-bit User VA,
>>> - 48-bit kernel VA + 52-bit User VA,
>>
>> + 52bit kernel, because there is excessive quantities of memory, and
>> the kernel maps it
>> all, but 48-bit user, because it never maps all the memory, and we
>> prefer the bits for
>> pointer-auth.
>>
>>> - 52-bit kernel VA + 52-bit User VA.
>>
>> And... all four may happen with the same built image. I don't see how
>> you can tell these
>> cases apart with the one (build-time-constant!) PTRS_PER_PGD value.
>>
>> I'm sure some of these cases are hypothetical, but by considering it
>> all now, we can avoid
>> three more kernel:vmcoreinfo updates, and three more
>> fix-user-space-to-use-the-new-value.
>
> Agree.
>
>> I think you probably do need PTRS_PER_PGD, as this is the one value
>> the mm is using to
>> generate page tables. I'm pretty sure you also need T0SZ and T1SZ to
>> know if that's
>> actually in use, or the kernel is bodging round it with an offset.
>
> Sure, I am open to suggestions (as I realize that we need an additional
> VA_BITS_ACTUAL variable export'ed for 52-bit kernel VA changes).
>
> Also how do we standardize reading T0SZ and T1SZ in user-space. Do you
> propose I make an enhancement in the cpu-feature-registers interface
> (see [1]) or the HWCAPS interface (see [2]) for the same?
>
> [1].
> https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt
> [2]. https://www.kernel.org/doc/Documentation/arm64/elf_hwcaps.txt
>
> Thanks,
> Bhupesh
Ping.
Hi James, Steve,
Any comments on the above points? At the moment we have to carry these
fixes in the distribution kernels and I would like to have these fixed
in upstream kernel itself.
Thanks,
Bhupesh
Powered by blists - more mailing lists