[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f1f7f447-1b7b-9c42-f735-906d96b6077b@linux.alibaba.com>
Date: Tue, 5 Jul 2022 10:44:43 +0800
From: "guanghui.fgh" <guanghuifeng@...ux.alibaba.com>
To: Will Deacon <will@...nel.org>
Cc: baolin.wang@...ux.alibaba.com, catalin.marinas@....com,
akpm@...ux-foundation.org, david@...hat.com, jianyong.wu@....com,
james.morse@....com, quic_qiancai@...cinc.com,
christophe.leroy@...roup.eu, jonathan@...ek.ca,
mark.rutland@....com, thunder.leizhen@...wei.com,
anshuman.khandual@....com, linux-arm-kernel@...ts.infradead.org,
linux-kernel@...r.kernel.org, rppt@...nel.org,
geert+renesas@...der.be, ardb@...nel.org, linux-mm@...ck.org,
yaohongbo@...ux.alibaba.com, alikernel-developer@...ux.alibaba.com
Subject: Re: [PATCH v4] arm64: mm: fix linear mem mapping access performance
degradation
在 2022/7/5 0:38, Will Deacon 写道:
> On Mon, Jul 04, 2022 at 10:34:07PM +0800, guanghui.fgh wrote:
>> Thanks.
>>
>> 在 2022/7/4 22:23, Will Deacon 写道:
>>> On Mon, Jul 04, 2022 at 10:11:27PM +0800, guanghui.fgh wrote:
>>>> 在 2022/7/4 21:15, Will Deacon 写道:
>>>>> On Mon, Jul 04, 2022 at 08:05:59PM +0800, guanghui.fgh wrote:
>>>>>>>> 1.Quoted messages from arch/arm64/mm/init.c
>>>>>>>>
>>>>>>>> "Memory reservation for crash kernel either done early or deferred
>>>>>>>> depending on DMA memory zones configs (ZONE_DMA) --
>>>>>>>>
>>>>>>>> In absence of ZONE_DMA configs arm64_dma_phys_limit initialized
>>>>>>>> here instead of max_zone_phys(). This lets early reservation of
>>>>>>>> crash kernel memory which has a dependency on arm64_dma_phys_limit.
>>>>>>>> Reserving memory early for crash kernel allows linear creation of block
>>>>>>>> mappings (greater than page-granularity) for all the memory bank rangs.
>>>>>>>> In this scheme a comparatively quicker boot is observed.
>>>>>>>>
>>>>>>>> If ZONE_DMA configs are defined, crash kernel memory reservation
>>>>>>>> is delayed until DMA zone memory range size initialization performed in
>>>>>>>> zone_sizes_init(). The defer is necessary to steer clear of DMA zone
>>>>>>>> memory range to avoid overlap allocation.
>>>>>>>>
>>>>>>>> [[[
>>>>>>>> So crash kernel memory boundaries are not known when mapping all bank memory
>>>>>>>> ranges, which otherwise means not possible to exclude crash kernel range
>>>>>>>> from creating block mappings so page-granularity mappings are created for
>>>>>>>> the entire memory range.
>>>>>>>> ]]]"
>>>>>>>>
>>>>>>>> Namely, the init order: memblock init--->linear mem mapping(4k mapping for
>>>>>>>> crashkernel, requirinig page-granularity changing))--->zone dma
>>>>>>>> limit--->reserve crashkernel.
>>>>>>>> So when enable ZONE DMA and using crashkernel, the mem mapping using 4k
>>>>>>>> mapping.
>>>>>>>
>>>>>>> Yes, I understand that is how things work today but I'm saying that we may
>>>>>>> as well leave the crashkernel mapped (at block granularity) if
>>>>>>> !can_set_direct_map() and then I think your patch becomes a lot simpler.
>>>>>>
>>>>>> But Page-granularity mapppings are necessary for crash kernel memory range
>>>>>> for shrinking its size via /sys/kernel/kexec_crash_size interfac(Quoted from
>>>>>> arch/arm64/mm/init.c).
>>>>>> So this patch split block/section mapping to 4k page-granularity mapping for
>>>>>> crashkernel mem.
>>>>>
>>>>> Why? I don't see why the mapping granularity is relevant at all if we
>>>>> always leave the whole thing mapped.
>>>>>
>>>> There is another reason.
>>>>
>>>> When loading crashkernel finish, the do_kexec_load will use
>>>> arch_kexec_protect_crashkres to invalid all the pagetable for crashkernel
>>>> mem(protect crashkernel mem from access).
>>>>
>>>> arch_kexec_protect_crashkres--->set_memory_valid--->...--->apply_to_pmd_range
>>>>
>>>> In the apply_to_pmd_range, there is a judement: BUG_ON(pud_huge(*pud)). And
>>>> if the crashkernel use block/section mapping, there will be some error.
>>>>
>>>> Namely, it's need to use non block/section mapping for crashkernel mem
>>>> before shringking.
>>>
>>> Well, yes, but we can change arch_kexec_[un]protect_crashkres() not to do
>>> that if we're leaving the thing mapped, no?
>>>
>> I think we should use arch_kexec_[un]protect_crashkres for crashkernel mem.
>>
>> Because when invalid crashkernel mem pagetable, there is no chance to rd/wr
>> the crashkernel mem by mistake.
>>
>> If we don't use arch_kexec_[un]protect_crashkres to invalid crashkernel mem
>> pagetable, there maybe some write operations to these mem by mistake which
>> may cause crashkernel boot error and vmcore saving error.
>
> I don't really buy this line of reasoning. The entire main kernel is
> writable, so why do we care about protecting the crashkernel so much? The
> _code_ to launch the crash kernel is writable! If you care about preventing
> writes to memory which should not be writable, then you should use
> rodata=full.
>
> Will
Thanks.
1.I think the normal kernel mem can be writeable or readonly
2.But the normal kernel should't access(read/write) crashkernel mem
after the crashkernel is loaded to the mem.(despite the normal kernel
write/read access protect)
So invalid pagetable for crashkernel mem is needed.
With this method, it can guarantee the usability of the crashkernel when
occuring emergency and generating vmcore.
Powered by blists - more mailing lists