linux-kernel - Re: [PATCH V6 09/22] LoongArch: Add boot and setup routines

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAhV-H6z3H3QbzvG6=fgVJF1z2qEvKVGnyqb--bkqomH3jTXJQ@mail.gmail.com>
Date:   Fri, 4 Mar 2022 20:43:03 +0800
From:   Huacai Chen <chenhuacai@...il.com>
To:     Mike Rapoport <rppt@...nel.org>
Cc:     Huacai Chen <chenhuacai@...ngson.cn>,
        Arnd Bergmann <arnd@...db.de>,
        Andy Lutomirski <luto@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        David Airlie <airlied@...ux.ie>,
        Jonathan Corbet <corbet@....net>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-arch <linux-arch@...r.kernel.org>,
        "open list:DOCUMENTATION" <linux-doc@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Xuefeng Li <lixuefeng@...ngson.cn>,
        Yanteng Si <siyanteng@...ngson.cn>,
        Jiaxun Yang <jiaxun.yang@...goat.com>,
        Ard Biesheuvel <ardb@...nel.org>,
        linux-efi <linux-efi@...r.kernel.org>
Subject: Re: [PATCH V6 09/22] LoongArch: Add boot and setup routines

Hi, Mike,

On Fri, Mar 4, 2022 at 6:49 PM Mike Rapoport <rppt@...nel.org> wrote:
>
> Hi,
>
> On Thu, Mar 03, 2022 at 10:47:53PM +0800, Huacai Chen wrote:
> > Hi, Mike,
> >
> > On Thu, Mar 3, 2022 at 7:41 PM Mike Rapoport <rppt@...nel.org> wrote:
> > >
> > > On Sat, Feb 26, 2022 at 07:03:25PM +0800, Huacai Chen wrote:
> > > > This patch adds basic boot, setup and reset routines for LoongArch.
> > > > LoongArch uses UEFI-based firmware. The firmware uses ACPI and DMI/
> > > > SMBIOS to pass configuration information to the Linux kernel (in elf
> > > > format).
> > > >
> > > > Now the boot information passed to kernel is like this:
> > > > 1, kernel get 3 register values (a0, a1 and a2) from bootloader.
> > > > 2, a0 is "argc", a1 is "argv", so "kernel cmdline" comes from a0/a1.
> > > > 3, a2 is "environ", which is a pointer to "struct bootparamsinterface".
> > > > 4, "struct bootparamsinterface" include a "systemtable" pointer, whose
> > > >    type is "efi_system_table_t". Most configuration information, include
> > > >    ACPI tables and SMBIOS tables, come from here.
> > > >
> > > > Cc: Ard Biesheuvel <ardb@...nel.org>
> > > > Cc: linux-efi@...r.kernel.org
> > > > Signed-off-by: Huacai Chen <chenhuacai@...ngson.cn>
> > > > ---
> > >
> > > > +void __init arch_reserve_mem_area(acpi_physical_address addr, size_t size)
> > > > +{
> > > > +     memblock_mark_nomap(addr, size);
> > > > +}
> > >
> > > Is there any problem if the memory ranges used by ACPI will be mapped into
> > > the kernel page tables?
> > >
> > > If not, consider dropping this function.
> >
> > This API is mostly used for ACPI upgrading. ACPI upgrading alloc a
> > normal memory block and then is used as ACPI memory, and this memory
> > block will not be used by the page allocator. Other architectures,
> > such as ARM64, do the same thing here.
>
> ARM64 had quite a lot of issues with NOMAP memory, so I'd recommend to
> avoid using memblock_mark_nomap() unless it is required by MMU constraints
> on loongarch.
>
> I'm not familiar with loongarch MMU details, so I can only give some
> background for NOMAP for you to decide.
>
> Marking memory region NOMAP is required when this region cannot be a part
> of the kernel linear mapping because MMU does not allow aliased mappings
> with different caching modes. E.g. in ARM64 case, ACPI memory that should
> be mapped uncached cannot be mapped as cached in the kernel linear map.
>
> If the memory block should not be used by the page allocator, it should be
> memblock_reserve()'ed rather than marked NOMAP.
Thank you for telling me the background, we will use memblock_reserve() instead.

>
> > > > diff --git a/arch/loongarch/kernel/mem.c b/arch/loongarch/kernel/mem.c
> > > > new file mode 100644
> > > > index 000000000000..361d108a2b82
> > > > --- /dev/null
> > > > +++ b/arch/loongarch/kernel/mem.c
> > > > @@ -0,0 +1,89 @@
> > > > +// SPDX-License-Identifier: GPL-2.0-or-later
> > > > +/*
> > > > + * Copyright (C) 2020-2022 Loongson Technology Corporation Limited
> > > > + */
> > > > +#include <linux/fs.h>
> > > > +#include <linux/mm.h>
> > > > +#include <linux/memblock.h>
> > > > +
> > > > +#include <asm/bootinfo.h>
> > > > +#include <asm/loongson.h>
> > > > +#include <asm/sections.h>
> > > > +
> > > > +void __init early_memblock_init(void)
> > > > +{
> > > > +     int i;
> > > > +     u32 mem_type;
> > > > +     u64 mem_start, mem_end, mem_size;
> > > > +
> > > > +     /* Parse memory information */
> > > > +     for (i = 0; i < loongson_mem_map->map_count; i++) {
> > > > +             mem_type = loongson_mem_map->map[i].mem_type;
> > > > +             mem_start = loongson_mem_map->map[i].mem_start;
> > > > +             mem_size = loongson_mem_map->map[i].mem_size;
> > > > +             mem_end = mem_start + mem_size;
> > > > +
> > > > +             switch (mem_type) {
> > > > +             case ADDRESS_TYPE_SYSRAM:
> > > > +                     memblock_add(mem_start, mem_size);
> > > > +                     if (max_low_pfn < (mem_end >> PAGE_SHIFT))
> > > > +                             max_low_pfn = mem_end >> PAGE_SHIFT;
> > > > +                     break;
> > > > +             }
> > > > +     }
> > > > +     memblock_set_current_limit(PFN_PHYS(max_low_pfn));
> > > > +}
> > > > +
> > > > +void __init fw_init_memory(void)
> > > > +{
> > > > +     int i;
> > > > +     u32 mem_type;
> > > > +     u64 mem_start, mem_end, mem_size;
> > > > +     unsigned long start_pfn, end_pfn;
> > > > +     static unsigned long num_physpages;
> > > > +
> > > > +     /* Parse memory information */
> > > > +     for (i = 0; i < loongson_mem_map->map_count; i++) {
> > > > +             mem_type = loongson_mem_map->map[i].mem_type;
> > > > +             mem_start = loongson_mem_map->map[i].mem_start;
> > > > +             mem_size = loongson_mem_map->map[i].mem_size;
> > > > +             mem_end = mem_start + mem_size;
> > >
> > > I think this loop can be merged with loop in early_memblock_init() then ...
> > >
> > > > +
> > > > +             switch (mem_type) {
> > > > +             case ADDRESS_TYPE_SYSRAM:
> > > > +                     mem_start = PFN_ALIGN(mem_start);
> > > > +                     mem_end = PFN_ALIGN(mem_end - PAGE_SIZE + 1);
> > > > +                     num_physpages += (mem_size >> PAGE_SHIFT);
> > > > +                     memblock_set_node(mem_start, mem_size, &memblock.memory, 0);
> > >
> > > this will become memblock_add_node()
> > >
> > > > +                     break;
> > > > +             case ADDRESS_TYPE_ACPI:
> > > > +                     mem_start = PFN_ALIGN(mem_start);
> > > > +                     mem_end = PFN_ALIGN(mem_end - PAGE_SIZE + 1);
> > > > +                     num_physpages += (mem_size >> PAGE_SHIFT);
> > > > +                     memblock_add(mem_start, mem_size);
> > > > +                     memblock_set_node(mem_start, mem_size, &memblock.memory, 0);
> > >
> > > as well as this.
> > early_memblock_init() only adds the "usable" memory (SYSRAM) for early
> > use and without numa node information. Other types of memory are
> > handled later by fw_init_memory()/fw_init_numa_memory(), depending on
> > whether CONFIG_NUMA is enabled. So, in
> > fw_init_memory()/fw_init_numa_memory() we only need to call
> > memblock_set_node() to add the node information for SYSRAM type.
>
> There are two potential issues here with doing memblock_add() and
> memblock_set_node() and memblock_reserve() separately with a couple of
> functions called in between.
>
> First, and most important is that you must to memblock_reserve() all the
> memory used by the firmware, like ADDRESS_TYPE_ACPI, ADDRESS_TYPE_RESERVED,
> kernel image, initrd etc *before* any call to memblock_alloc*()
> functions. If you add memory to memblock before reserving firmware regions,
> a call to memblock_alloc*() may allocate the used memory and all kinds of
> errors may happen because of that.
>
> Second, presuming you use SRAT for NUMA information, if you set nodes in
> memblock after there were memory allocations from memblock you may impair
> the ability to hot-remove memory banks.
>
> So ideally, the physical memory detection and registration should follow
> something like:
>
> * memblock_reserve() the memory used by firmware, kernel and initrd
> * detect NUMA topology
> * add memory regions along with their node ids to memblock.
>
> s390::setup_arch() is a good example of doing early reservations:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/s390/kernel/setup.c#n988
I have a fast reading of S390, and I think we can do some adjust:
1, call memblock_set_node(0, ULONG_MAX, &memblock.memory, 0) in
early_memblock_init().
2, move memblock_reserve(PHYS_OFFSET, 0x200000) and
memblock_reserve(__pa_symbol(&_text), __pa_symbol(&_end) -
__pa_symbol(&_text)) to early_memblock_init().
3, Reserve initrd memory in the first place.
It is nearly the same as the S390, then.

>
> > > > diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c
> > > > new file mode 100644
> > > > index 000000000000..8dfe1d9b55f7
> > > > --- /dev/null
> > > > +++ b/arch/loongarch/kernel/setup.c
> > > > +
> > > > +static int usermem __initdata;
> > > > +
> > > > +static int __init early_parse_mem(char *p)
> > > > +{
> > > > +     phys_addr_t start, size;
> > > > +
> > > > +     /*
> > > > +      * If a user specifies memory size, we
> > > > +      * blow away any automatically generated
> > > > +      * size.
> > > > +      */
> > > > +     if (usermem == 0) {
> > > > +             usermem = 1;
> > > > +             memblock_remove(memblock_start_of_DRAM(),
> > > > +                     memblock_end_of_DRAM() - memblock_start_of_DRAM());
> > > > +     }
> > > > +     start = 0;
> > > > +     size = memparse(p, &p);
> > > > +     if (*p == '@')
> > > > +             start = memparse(p + 1, &p);
> > > > +
> > > > +     memblock_add(start, size);
> > > > +
> > > > +     return 0;
> > > > +}
> > > > +early_param("mem", early_parse_mem);
> > > > +
> > > > +static int __init early_parse_memmap(char *p)
> > > > +{
> > > > +     char *oldp;
> > > > +     u64 start_at, mem_size;
> > > > +
> > > > +     if (!p)
> > > > +             return -EINVAL;
> > > > +
> > > > +     if (!strncmp(p, "exactmap", 8)) {
> > > > +             pr_err("\"memmap=exactmap\" invalid on LoongArch\n");
> > > > +             return 0;
> > > > +     }
> > > > +
> > > > +     oldp = p;
> > > > +     mem_size = memparse(p, &p);
> > > > +     if (p == oldp)
> > > > +             return -EINVAL;
> > > > +
> > > > +     if (*p == '@') {
> > > > +             start_at = memparse(p+1, &p);
> > > > +             memblock_add(start_at, mem_size);
> > > > +     } else if (*p == '#') {
> > > > +             pr_err("\"memmap=nn#ss\" (force ACPI data) invalid on LoongArch\n");
> > > > +             return -EINVAL;
> > > > +     } else if (*p == '$') {
> > > > +             start_at = memparse(p+1, &p);
> > > > +             memblock_add(start_at, mem_size);
> > > > +             memblock_reserve(start_at, mem_size);
> > > > +     } else {
> > > > +             pr_err("\"memmap\" invalid format!\n");
> > > > +             return -EINVAL;
> > > > +     }
> > > > +
> > > > +     if (*p == '\0') {
> > > > +             usermem = 1;
> > > > +             return 0;
> > > > +     } else
> > > > +             return -EINVAL;
> > > > +}
> > > > +early_param("memmap", early_parse_memmap);
> > >
> > > The memmap= processing is a hack indented to workaround bugs in firmware
> > > related to the memory detection. Please don't copy if over unless there is
> > > really strong reason.
> >
> > Hmmm, I have read the documents, most archs only support mem=limit,
> > but MIPS support mem=limit@...e. memmap not only supports
> > memmap=limit@...e, but also a lot of advanced syntax. LoongArch needs
> > both limit and limit@...e syntax. So can we make our code to support
> > only mem=limit and memmap=limit@...e, and remove all other syntax
> > here?
>
> The documentation describes what was there historically and both these
> options tend not to play well with complex memory layouts.
>
> If you must have them it's better to use x86 as an example rather than
> MIPS, just take into the account that on x86 memory always starts from 0,
> so they never needed to have a different base.
>
> For what use-cases LoongArch needs options?
The use-case of limit@...e syntax is kdump, because our kernel is not
relocatable. I'll use X86 as an example.

Huacai

>
> --
> Sincerely yours,
> Mike.