Date:   Mon, 4 Jul 2022 22:34:07 +0800
From:   "guanghui.fgh" <guanghuifeng@...ux.alibaba.com>
To:     Will Deacon <will@...nel.org>
Cc:     baolin.wang@...ux.alibaba.com, catalin.marinas@....com,
        akpm@...ux-foundation.org, david@...hat.com, jianyong.wu@....com,
        james.morse@....com, quic_qiancai@...cinc.com,
        christophe.leroy@...roup.eu, jonathan@...ek.ca,
        mark.rutland@....com, thunder.leizhen@...wei.com,
        anshuman.khandual@....com, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org, rppt@...nel.org,
        geert+renesas@...der.be, ardb@...nel.org, linux-mm@...ck.org,
        yaohongbo@...ux.alibaba.com, alikernel-developer@...ux.alibaba.com
Subject: Re: [PATCH v4] arm64: mm: fix linear mem mapping access performance
 degradation

Thanks.

On 2022/7/4 22:23, Will Deacon wrote:
> On Mon, Jul 04, 2022 at 10:11:27PM +0800, guanghui.fgh wrote:
>> On 2022/7/4 21:15, Will Deacon wrote:
>>> On Mon, Jul 04, 2022 at 08:05:59PM +0800, guanghui.fgh wrote:
>>>>>> 1.Quoted messages from arch/arm64/mm/init.c
>>>>>>
>>>>>> "Memory reservation for crash kernel either done early or deferred
>>>>>> depending on DMA memory zones configs (ZONE_DMA) --
>>>>>>
>>>>>> In absence of ZONE_DMA configs arm64_dma_phys_limit initialized
>>>>>> here instead of max_zone_phys().  This lets early reservation of
>>>>>> crash kernel memory which has a dependency on arm64_dma_phys_limit.
>>>>>> Reserving memory early for crash kernel allows linear creation of block
>>>>>> mappings (greater than page-granularity) for all the memory bank rangs.
>>>>>> In this scheme a comparatively quicker boot is observed.
>>>>>>
>>>>>> If ZONE_DMA configs are defined, crash kernel memory reservation
>>>>>> is delayed until DMA zone memory range size initialization performed in
>>>>>> zone_sizes_init().  The defer is necessary to steer clear of DMA zone
>>>>>> memory range to avoid overlap allocation.
>>>>>>
>>>>>> [[[
>>>>>> So crash kernel memory boundaries are not known when mapping all bank memory
>>>>>> ranges, which otherwise means not possible to exclude crash kernel range
>>>>>> from creating block mappings so page-granularity mappings are created for
>>>>>> the entire memory range.
>>>>>> ]]]"
>>>>>>
>>>>>> Namely, the init order is: memblock init ---> linear mem mapping (4k mapping for
>>>>>> crashkernel, which requires page-granularity changes) ---> zone DMA
>>>>>> limit ---> reserve crashkernel.
>>>>>> So when ZONE_DMA is enabled and crashkernel is used, the linear mem mapping uses 4k
>>>>>> mappings.
>>>>>
>>>>> Yes, I understand that is how things work today but I'm saying that we may
>>>>> as well leave the crashkernel mapped (at block granularity) if
>>>>> !can_set_direct_map() and then I think your patch becomes a lot simpler.
>>>>
>>>> But page-granularity mappings are necessary for the crash kernel memory range
>>>> for shrinking its size via the /sys/kernel/kexec_crash_size interface (quoted from
>>>> arch/arm64/mm/init.c).
>>>> So this patch splits the block/section mapping into 4k page-granularity mappings for
>>>> crashkernel mem.
>>>
>>> Why? I don't see why the mapping granularity is relevant at all if we
>>> always leave the whole thing mapped.
>>>
>> There is another reason.
>>
>> When loading the crashkernel finishes, do_kexec_load will use
>> arch_kexec_protect_crashkres to invalidate all the page table entries for crashkernel
>> mem (protecting crashkernel mem from access).
>>
>> arch_kexec_protect_crashkres--->set_memory_valid--->...--->apply_to_pmd_range
>>
>> In apply_to_pmd_range there is a check: BUG_ON(pud_huge(*pud)). So
>> if the crashkernel uses block/section mappings, this check will trigger an error.
>>
>> Namely, we need to use non-block/section mappings for crashkernel mem
>> before shrinking.
> 
> Well, yes, but we can change arch_kexec_[un]protect_crashkres() not to do
> that if we're leaving the thing mapped, no?
> 
> Will

I think we should use arch_kexec_[un]protect_crashkres for crashkernel mem.

Because once the crashkernel mem page table entries are invalidated, there is
no chance to read/write the crashkernel mem by mistake.

If we don't use arch_kexec_[un]protect_crashkres to invalidate the crashkernel
mem page table entries, there may be some write operations to this mem by
mistake, which may cause crashkernel boot errors and vmcore saving errors.

Can we change arch_kexec_[un]protect_crashkres to support
block/section mappings? (But we also need to remap when shrinking.)
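
To make the question above concrete, here is a toy userspace model of the
idea (not kernel code). It assumes a 4k granule, so one block covers 512
pages. Every name in it (toy_pmd, toy_set_valid, toy_split_block,
toy_page_accessible) is hypothetical and exists only for illustration. A
whole-block range flips the valid state on the block entry itself, and a
partial range, as in the shrinking case, first splits the block into
per-page entries. A real implementation would presumably also have to deal
with break-before-make and TLB invalidation, which this sketch ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 4k pages, so one 2M block covers 512 pages. */
#define PTRS_PER_BLOCK 512

/* Toy entry: either one block mapping or 512 per-page valid bits. */
struct toy_pmd {
	bool is_block;                    /* true: mapped as a single block */
	bool block_valid;                 /* valid bit of the block entry */
	bool page_valid[PTRS_PER_BLOCK];  /* per-page valid bits after a split */
};

/* Split a block into page entries that inherit the block's valid state. */
static void toy_split_block(struct toy_pmd *pmd)
{
	assert(pmd != NULL);
	if (!pmd->is_block)
		return;
	for (size_t i = 0; i < PTRS_PER_BLOCK; i++)
		pmd->page_valid[i] = pmd->block_valid;
	pmd->is_block = false;
}

/*
 * Set the valid state for pages [first, first + count).
 * A range covering the whole block is handled on the block entry;
 * any partial range splits the block first.
 * Returns 0 on success, -1 if the range does not fit in the block.
 */
static int toy_set_valid(struct toy_pmd *pmd, size_t first, size_t count,
			 bool valid)
{
	assert(pmd != NULL);
	if (first > PTRS_PER_BLOCK || count > PTRS_PER_BLOCK - first)
		return -1;

	if (pmd->is_block && first == 0 && count == PTRS_PER_BLOCK) {
		pmd->block_valid = valid;   /* no split needed */
		return 0;
	}

	toy_split_block(pmd);
	for (size_t i = first; i < first + count; i++)
		pmd->page_valid[i] = valid;
	return 0;
}

/* Would an access to page idx be allowed in this toy model? */
static bool toy_page_accessible(const struct toy_pmd *pmd, size_t idx)
{
	assert(pmd != NULL && idx < PTRS_PER_BLOCK);
	return pmd->is_block ? pmd->block_valid : pmd->page_valid[idx];
}
```

In this model, protecting the whole region keeps the block mapping intact,
while invalidating only the tail of the block (as a shrink would) forces
the split into page-granularity entries.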

Thanks.
