lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 27 May 2021 07:37:05 -0700
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Christoph Hellwig <hch@...radead.org>,
        Arnaldo Carvalho de Melo <acme@...hat.com>,
        Michal Suchanek <msuchanek@...e.de>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, Yonghong Song <yhs@...com>,
        John Fastabend <john.fastabend@...il.com>,
        KP Singh <kpsingh@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Jiri Olsa <jolsa@...nel.org>,
        Hritik Vijay <hritikxx8@...il.com>,
        Linux-BPF <bpf@...r.kernel.org>,
        Linux-Net <netdev@...r.kernel.org>, Linux-MM <linux-mm@...ck.org>
Subject: Re: [PATCH v2] mm/page_alloc: Work around a pahole limitation with
 zero-sized struct pagesets

On Thu, May 27, 2021 at 5:02 AM Mel Gorman <mgorman@...hsingularity.net> wrote:
>
> This patch replaces
> mm-page_alloc-convert-per-cpu-list-protection-to-local_lock-fix.patch in
> Andrew's tree.
>
> Michal Suchanek reported the following problem with linux-next
>
>   [    0.000000] Linux version 5.13.0-rc2-next-20210519-1.g3455ff8-vanilla (geeko@...ldhost) (gcc (SUSE Linux) 10.3.0, GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.36.1.20210326-3) #1 SMP Wed May 19 10:05:10 UTC 2021 (3455ff8)
>   [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.13.0-rc2-next-20210519-1.g3455ff8-vanilla root=UUID=ec42c33e-a2c2-4c61-afcc-93e9527 8f687 plymouth.enable=0 resume=/dev/disk/by-uuid/f1fe4560-a801-4faf-a638-834c407027c7 mitigations=auto earlyprintk initcall_debug nomodeset earlycon ignore_loglevel console=ttyS0,115200
> ...
>   [   26.093364] calling  tracing_set_default_clock+0x0/0x62 @ 1
>   [   26.098937] initcall tracing_set_default_clock+0x0/0x62 returned 0 after 0 usecs
>   [   26.106330] calling  acpi_gpio_handle_deferred_request_irqs+0x0/0x7c @ 1
>   [   26.113033] initcall acpi_gpio_handle_deferred_request_irqs+0x0/0x7c returned 0 after 3 usecs
>   [   26.121559] calling  clk_disable_unused+0x0/0x102 @ 1
>   [   26.126620] initcall clk_disable_unused+0x0/0x102 returned 0 after 0 usecs
>   [   26.133491] calling  regulator_init_complete+0x0/0x25 @ 1
>   [   26.138890] initcall regulator_init_complete+0x0/0x25 returned 0 after 0 usecs
>   [   26.147816] Freeing unused decrypted memory: 2036K
>   [   26.153682] Freeing unused kernel image (initmem) memory: 2308K
>   [   26.165776] Write protecting the kernel read-only data: 26624k
>   [   26.173067] Freeing unused kernel image (text/rodata gap) memory: 2036K
>   [   26.180416] Freeing unused kernel image (rodata/data gap) memory: 1184K
>   [   26.187031] Run /init as init process
>   [   26.190693]   with arguments:
>   [   26.193661]     /init
>   [   26.195933]   with environment:
>   [   26.199079]     HOME=/
>   [   26.201444]     TERM=linux
>   [   26.204152]     BOOT_IMAGE=/boot/vmlinuz-5.13.0-rc2-next-20210519-1.g3455ff8-vanilla
>   [   26.254154] BPF:      type_id=35503 offset=178440 size=4
>   [   26.259125] BPF:
>   [   26.261054] BPF:Invalid offset
>   [   26.264119] BPF:
>   [   26.264119]
>   [   26.267437] failed to validate module [efivarfs] BTF: -22
>
> Andrii Nakryiko bisected the problem to the commit "mm/page_alloc: convert
> per-cpu list protection to local_lock" currently staged in mmotm. In his
> own words
>
>   The immediate problem is two different definitions of numa_node per-cpu
>   variable. They both are at the same offset within .data..percpu ELF
>   section, they both have the same name, but one of them is marked as
>   static and another as global. And one is int variable, while another
>   is struct pagesets. I'll look some more tomorrow, but adding Jiri and
>   Arnaldo for visibility.
>
>   [110907] DATASEC '.data..percpu' size=178904 vlen=303
>   ...
>         type_id=27753 offset=163976 size=4 (VAR 'numa_node')
>         type_id=27754 offset=163976 size=4 (VAR 'numa_node')
>
>   [27753] VAR 'numa_node' type_id=27556, linkage=static
>   [27754] VAR 'numa_node' type_id=20, linkage=global
>
>   [20] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
>
>   [27556] STRUCT 'pagesets' size=0 vlen=1
>         'lock' type_id=507 bits_offset=0
>
>   [506] STRUCT '(anon)' size=0 vlen=0
>   [507] TYPEDEF 'local_lock_t' type_id=506
>
> The patch in question introduces a zero-sized per-cpu struct and while
> this is not wrong, versions of pahole prior to 1.22 (unreleased) get
> confused during BTF generation with two separate variables occupying the
> same address.
>
> This patch checks for older versions of pahole and only allows
> DEBUG_INFO_BTF_MODULES if pahole supports zero-sized per-cpu structures.
> DEBUG_INFO_BTF is still allowed as a KVM boot test passed with pahole

Unfortunately this won't work. The problem is that vmlinux BTF is
corrupted, which results in module BTFs to be rejected as well, as
they depend on it.

But vmlinux BTF corruption makes BPF subsystem completely unusable. So
even though kernel boots, nothing BPF-related works. So we'd need to
add dependency for DEBUG_INFO_BTF on pahole 1.22+.

> v1.19.  While pahole 1.22 does not exist yet, it is assumed that Hritik's
> fix that allows DEBUG_INFO_BTF_MODULES to work will be included in that
> release.
>
> Reported-by: Michal Suchanek <msuchanek@...e.de>
> Reported-by: Hritik Vijay <hritikxx8@...il.com>
> Debugged-by: Andrii Nakryiko <andrii.nakryiko@...il.com>
> Signed-off-by: Mel Gorman <mgorman@...hsingularity.net>
> ---
>  lib/Kconfig.debug | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 678c13967580..51b355cbe6d7 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -313,9 +313,12 @@ config DEBUG_INFO_BTF
>  config PAHOLE_HAS_SPLIT_BTF
>         def_bool $(success, test `$(PAHOLE) --version | sed -E 's/v([0-9]+)\.([0-9]+)/\1\2/'` -ge "119")
>
> +config PAHOLE_HAS_ZEROSIZE_PERCPU_SUPPORT
> +       def_bool $(success, test `$(PAHOLE) --version | sed -E 's/v([0-9]+)\.([0-9]+)/\1\2/'` -ge "122")
> +
>  config DEBUG_INFO_BTF_MODULES
>         def_bool y
> -       depends on DEBUG_INFO_BTF && MODULES && PAHOLE_HAS_SPLIT_BTF
> +       depends on DEBUG_INFO_BTF && MODULES && PAHOLE_HAS_SPLIT_BTF && PAHOLE_HAS_ZEROSIZE_PERCPU_SUPPORT
>         help
>           Generate compact split BTF type information for kernel modules.
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ