linux-kernel - Re: [PATCH v2 2/5] LoongArch: Add kexec

Message-ID: <a15ad5bd-f54d-466c-8bdd-6f5b5936abee@linux.dev>
Date: Fri, 22 Aug 2025 10:56:18 +0800
From: Youling Tang <youling.tang@...ux.dev>
To: Yao Zi <ziyao@...root.org>, Huacai Chen <chenhuacai@...nel.org>
Cc: WANG Xuerui <kernel@...0n.name>, Baoquan He <bhe@...hat.com>,
 kexec@...ts.infradead.org, loongarch@...ts.linux.dev,
 linux-kernel@...r.kernel.org, Youling Tang <tangyouling@...inos.cn>
Subject: Re: [PATCH v2 2/5] LoongArch: Add kexec_file support

On 2025/8/20 17:13, Youling Tang wrote:

> Hi, Yao
>
> On 2025/8/20 14:50, Yao Zi wrote:
>
>> On Wed, Aug 20, 2025 at 01:56:57PM +0800, Youling Tang wrote:
>>> From: Youling Tang <tangyouling@...inos.cn>
>>>
>>> This patch adds support for kexec_file on LoongArch.
>>>
>>> efi_kexec_load() has two parts:
>>> - the first part loads the kernel image (vmlinuz.efi or vmlinux.efi)
>>> - the second part loads other segments (eg: initrd, cmdline)
>>>
>>> This initrd will be passed to the second kernel via the command line
>>> 'initrd=start,size'.
>>>
>>> Currently, pez(vmlinuz.efi) and pei(vmlinux.efi) format images are supported,
>>> but ELF format is not supported.
>>>
>>> Signed-off-by: Youling Tang <tangyouling@...inos.cn>
>>> ---
>>>   arch/loongarch/Kconfig                     |   9 ++
>>>   arch/loongarch/include/asm/image.h         |  17 +++
>>>   arch/loongarch/include/asm/kexec.h         |  12 +++
>>>   arch/loongarch/kernel/Makefile             |   1 +
>>>   arch/loongarch/kernel/kexec_efi.c          | 111 +++++++++++++++++++
>>>   arch/loongarch/kernel/machine_kexec.c      |  33 ++++--
>>>   arch/loongarch/kernel/machine_kexec_file.c | 117 +++++++++++++++++++++
>>>   7 files changed, 289 insertions(+), 11 deletions(-)
>>>   create mode 100644 arch/loongarch/kernel/kexec_efi.c
>>>   create mode 100644 arch/loongarch/kernel/machine_kexec_file.c
>> ...
>>
>>> diff --git a/arch/loongarch/include/asm/image.h b/arch/loongarch/include/asm/image.h
>>> index 1f090736e71d..655d5836c4e8 100644
>>> --- a/arch/loongarch/include/asm/image.h
>>> +++ b/arch/loongarch/include/asm/image.h
>>> @@ -36,5 +36,22 @@ struct loongarch_image_header {
>>>       uint32_t pe_header;
>>>   };
>>> +static const uint8_t loongarch_image_pe_sig[2] = {'M', 'Z'};
>>> +
>>> +/**
>>> + * loongarch_header_check_pe_sig - Helper to check the loongarch image header.
>>> + *
>>> + * Returns non-zero if 'MZ' signature is found.
>>> + */
>>> +
>>> +static inline int loongarch_header_check_pe_sig(const struct loongarch_image_header *h)
>>> +{
>>> +    if (!h)
>>> +        return 0;
>>> +
>>> +    return (h->pe_sig[0] == loongarch_image_pe_sig[0]
>>> +        && h->pe_sig[1] == loongarch_image_pe_sig[1]);
>>> +}
>> This check is still too weak and hasn't improved compared to v1.
>>
>>> This could be simplified with a memcmp(). Also, this check isn't
>>> strict enough: PE files for any architectures, and even legacy MS-DOS
>>> COM executables all start with "MZ".
>> I've pointed this out in my previous reply[1].
> Previously, I had considered adding a specific LoongArch magic
> number (such as "Loongson") in the loongarch_image_header, but
> this is incompatible with older versions of the kernel, so it
> remains the same without further checks.
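One way to tighten the 'MZ' check discussed above without adding a new
magic number might be to follow the PE header offset and also verify
the "PE\0\0" signature and the COFF Machine field. The following is
only a minimal userspace sketch, not the patch's code: the function
and macro names are hypothetical, and it assumes the standard PE
layout (header offset stored at 0x3c) and the PE/COFF machine value
0x6264 for LoongArch64.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumptions from the PE/COFF format, not taken from the patch. */
#define PE_LFANEW_OFFSET        0x3c    /* offset of the PE header pointer */
#define PE_MACHINE_LOONGARCH64  0x6264  /* COFF Machine value for LoongArch64 */

/* Read little-endian values without relying on host endianness. */
static uint16_t rd_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: returns 1 if buf looks like a LoongArch64 PE
 * image, 0 otherwise. Every access is bounds-checked against len.
 */
static int looks_like_loongarch64_pe(const uint8_t *buf, size_t len)
{
	uint32_t pe_off;

	if (!buf || len < PE_LFANEW_OFFSET + 4)
		return 0;
	if (memcmp(buf, "MZ", 2) != 0)
		return 0;

	pe_off = rd_le32(buf + PE_LFANEW_OFFSET);
	/* Need the 4-byte signature plus the 2-byte Machine field. */
	if (pe_off > len || len - pe_off < 6)
		return 0;
	if (memcmp(buf + pe_off, "PE\0\0", 4) != 0)
		return 0;

	return rd_le16(buf + pe_off + 4) == PE_MACHINE_LOONGARCH64;
}
```

Such a check would reject PE files built for other architectures and
plain MS-DOS executables, but it still cannot tell whether the kernel
inside is relocatable.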
>>
>>>   #endif /* __ASSEMBLY__ */
>>>   #endif /* __ASM_IMAGE_H */
>> ...
>>
>>> diff --git a/arch/loongarch/kernel/kexec_efi.c b/arch/loongarch/kernel/kexec_efi.c
>>> new file mode 100644
>>> index 000000000000..7741f1139a12
>>> --- /dev/null
>>> +++ b/arch/loongarch/kernel/kexec_efi.c
>> ...
>>
>>> +static void *efi_kexec_load(struct kimage *image,
>>> +                char *kernel, unsigned long kernel_len,
>>> +                char *initrd, unsigned long initrd_len,
>>> +                char *cmdline, unsigned long cmdline_len)
>>> +{
>>> +    struct loongarch_image_header *h;
>>> +    struct kexec_buf kbuf;
>>> +    unsigned long text_offset, kernel_segment_number;
>>> +    struct kexec_segment *kernel_segment;
>>> +    int ret;
>>> +
>>> +    h = (struct loongarch_image_header *)kernel;
>>> +    if (!h->image_size)
>>> +        return ERR_PTR(-EINVAL);
>>> +
>>> +    /* Load the kernel */
>>> +    kbuf.image = image;
>>> +    kbuf.buf_max = ULONG_MAX;
>>> +    kbuf.top_down = false;
>>> +
>>> +    kbuf.buffer = kernel;
>>> +    kbuf.bufsz = kernel_len;
>>> +    kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
>>> +    kbuf.memsz = le64_to_cpu(h->image_size);
>>> +    text_offset = le64_to_cpu(h->text_offset);
>>> +    kbuf.buf_min = text_offset;
>>> +    kbuf.buf_align = SZ_2M;
>>> +
>>> +    kernel_segment_number = image->nr_segments;
>>> +
>>> +    /*
>>> +     * The location of the kernel segment may make it impossible to satisfy
>>> +     * the other segment requirements, so we try repeatedly to find a
>>> +     * location that will work.
>>> +     */
>>> +    while ((ret = kexec_add_buffer(&kbuf)) == 0) {
>>> +        /* Try to load additional data */
>>> +        kernel_segment = &image->segment[kernel_segment_number];
>>> +        ret = load_other_segments(image, kernel_segment->mem,
>>> +                      kernel_segment->memsz, initrd,
>>> +                      initrd_len, cmdline, cmdline_len);
>>> +        if (!ret)
>>> +            break;
>>> +
>>> +        /*
>>> +         * We couldn't find space for the other segments; erase the
>>> +         * kernel segment and try the next available hole.
>>> +         */
>>> +        image->nr_segments -= 1;
>>> +        kbuf.buf_min = kernel_segment->mem + kernel_segment->memsz;
>>> +        kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
>>> +    }
>>> +
>>> +    if (ret) {
>>> +        pr_err("Could not find any suitable kernel location!");
>>> +        return ERR_PTR(ret);
>>> +    }
>>> +
>>> +    kernel_segment = &image->segment[kernel_segment_number];
>>> +
>>> +    /* Make sure the second kernel jumps to the correct "kernel_entry". */
>>> +    image->start = kernel_segment->mem + h->kernel_entry - text_offset;
>> And this still assumes the loaded, secondary kernel is relocatable,
>> with neither extra check nor comment explaining its limitation.
>>
>> Please see my previous reply[2] that explains why loading a
>> non-relocatable kernel with kexec_file API is reasonable.
> Without the RELOCATABLE option, the LoongArch kernel is not
> position-independent (non-PIE): it contains instructions such as
> la.abs that prevent it from being relocated to an arbitrary memory
> address for execution. As a result, features like kdump and
> kexec_file depend on the RELOCATABLE option.
>
> Strictly speaking, we need to add additional checks: if the kernel is
> non-relocatable, the loading operation should fail directly. For a
> running kernel, we can easily determine this by calling
> kallsyms_lookup_name("relocate_kernel"). However, for a kernel
> that is being loaded but has not yet started execution, it is difficult
> to easily determine whether the currently loaded kernel has the
> RELOCATABLE configuration option enabled.
>
> For ELF format images, we can determine whether the loaded image
> contains the ".la_abs" section in the following way:
> static struct mem_shdr *laabs_section(const struct mem_ehdr *ehdr)
> {
>         struct mem_shdr *shdr, *shdr_end;
>         unsigned char *strtab;
>
>         strtab = (unsigned char *)ehdr->e_shdr[ehdr->e_shstrndx].sh_data;
>         shdr_end = &ehdr->e_shdr[ehdr->e_shnum];
>         for (shdr = ehdr->e_shdr; shdr != shdr_end; shdr++) {
>                 if (shdr->sh_size &&
>                         strcmp((char *)&strtab[shdr->sh_name], ".la_abs") == 0) {
>                         return shdr;
>                 }
>         }
>
>         return NULL;
> }
I attempted to parse the PE header to obtain the section information
and found only two sections, '.text' and '.data'. So, unlike with the
ELF format, we cannot check whether a '.la_abs' section exists.

The reason is that vmlinux.efi is generated by converting the ELF
vmlinux into a raw binary with 'objcopy -O binary'
(arch/loongarch/boot/Makefile), so the remaining sections are merged
into the '.text' and '.data' sections.
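For reference, the kind of PE section-table lookup described above can
be sketched roughly as follows. This is a userspace illustration, not
the code I actually ran; the helper names are hypothetical, and the
header sizes (20-byte COFF header, 40-byte section headers, 8-byte
names) are the standard PE/COFF values.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard PE/COFF layout constants (assumed, not from the patch). */
#define PE_LFANEW_OFFSET   0x3c  /* offset of the PE header pointer */
#define COFF_HDR_SIZE      20    /* COFF file header after "PE\0\0" */
#define SECTION_HDR_SIZE   40    /* one section table entry */
#define SECTION_NAME_SIZE  8     /* NUL-padded, not always terminated */

static uint16_t pe_rd16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t pe_rd32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: returns 1 if a section called name exists,
 * 0 if it does not, and -1 if the buffer is not a well-formed PE image.
 * Names longer than 8 bytes (stored in a string table) are not handled.
 */
static int pe_has_section(const uint8_t *buf, size_t len, const char *name)
{
	char want[SECTION_NAME_SIZE] = { 0 };
	size_t name_len = strlen(name);
	size_t pe_off, tbl, i;
	uint16_t nsec, opt_size;

	if (name_len > SECTION_NAME_SIZE)
		return 0;
	memcpy(want, name, name_len);

	if (len < PE_LFANEW_OFFSET + 4 || memcmp(buf, "MZ", 2) != 0)
		return -1;
	pe_off = pe_rd32(buf + PE_LFANEW_OFFSET);
	if (pe_off > len || len - pe_off < 4 + COFF_HDR_SIZE)
		return -1;
	if (memcmp(buf + pe_off, "PE\0\0", 4) != 0)
		return -1;

	nsec = pe_rd16(buf + pe_off + 4 + 2);       /* NumberOfSections */
	opt_size = pe_rd16(buf + pe_off + 4 + 16);  /* SizeOfOptionalHeader */
	tbl = pe_off + 4 + COFF_HDR_SIZE + opt_size;
	if (tbl > len || (len - tbl) / SECTION_HDR_SIZE < nsec)
		return -1;

	for (i = 0; i < nsec; i++) {
		if (memcmp(buf + tbl + i * SECTION_HDR_SIZE, want,
			   SECTION_NAME_SIZE) == 0)
			return 1;
	}
	return 0;
}
```

On an image whose section table lists only '.text' and '.data',
pe_has_section(buf, len, ".la_abs") returns 0 whether or not the
kernel was built with RELOCATABLE, which is why this approach cannot
answer the question for PE images.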

Youling.
>
> Thanks,
> Youling.
>>
>>> +    kexec_dprintk("Loaded kernel at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
>>> +              kernel_segment->mem, kbuf.bufsz,
>>> +              kernel_segment->memsz);
>>> +
>>> +    return NULL;
>>> +}
>>> +
>>> +const struct kexec_file_ops kexec_efi_ops = {
>>> +    .probe = efi_kexec_probe,
>>> +    .load = efi_kexec_load,
>>> +};
>> Thanks,
>> Yao Zi
>>
>> [1]: https://lore.kernel.org/all/aJojDiHWi8cgvA2W@pie/
>> [2]: https://lore.kernel.org/all/aJwFa8x5BQMouB1y@pie/