lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20241213092603.13399-1-laura.nao@collabora.com>
Date: Fri, 13 Dec 2024 10:26:03 +0100
From: Laura Nao <laura.nao@...labora.com>
To: stephen.s.brennan@...cle.com
Cc: alan.maguire@...cle.com,
	bpf@...r.kernel.org,
	chrome-platform@...ts.linux.dev,
	kernel@...labora.com,
	laura.nao@...labora.com,
	linux-kernel@...r.kernel.org,
	olsajiri@...il.com,
	regressions@...ts.linux.dev
Subject: Re: [REGRESSION] module BTF validation failure (Error -22) on

On 12/12/24 22:49, Stephen Brennan wrote:
> Jiri Olsa <olsajiri@...il.com> writes:
>> On Wed, Dec 11, 2024 at 10:10:24PM +0100, Jiri Olsa wrote:
>>> On Tue, Dec 10, 2024 at 02:55:01PM +0100, Laura Nao wrote:
>>>> Hi Jiri,
>>>>
>>>> Thanks for the feedback!
>>>>
>>>> On 12/6/24 13:35, Jiri Olsa wrote:
>>>>> On Fri, Nov 15, 2024 at 06:17:12PM +0100, Laura Nao wrote:
>>>>>> On 11/13/24 10:37, Laura Nao wrote:
>>>>>>>
>>>>>>> Currently, KernelCI only retains the bzImage, not the vmlinux
>>>>>>> binary. The
>>>>>>> bzImage can be downloaded from the same link mentioned above by
>>>>>>> selecting
>>>>>>> 'kernel' from the dropdown menu (modules can also be downloaded
>>>>>>> the
>>>>>>> same
>>>>>>> way). I’ll try to replicate the build on my end and share the
>>>>>>> vmlinux
>>>>>>> with DWARF data stripped for convenience.
>>>>>>>
>>>>>>
>>>>>> I managed to reproduce the issue locally and I've uploaded the
>>>>>> vmlinux[1]
>>>>>> (stripped of DWARF data) and vmlinux.raw[2] files, as well as one
>>>>>> of
>>>>>> the
>>>>>> modules[3] and its btf data[4] extracted with:
>>>>>>
>>>>>> bpftool -B vmlinux btf dump file cros_kbd_led_backlight.ko >
>>>>>> cros_kbd_led_backlight.ko.raw
>>>>>>
>>>>>> Looking again at the logs[5], I've noticed the following is
>>>>>> reported:
>>>>>>
>>>>>> [    0.415885] BPF: 	 type_id=115803 offset=177920 size=1152
>>>>>> [    0.416029] BPF:
>>>>>> [    0.416083] BPF: Invalid offset
>>>>>> [    0.416165] BPF:
>>>>>>
>>>>>> There are two different definitions of rcu_data in
>>>>>> '.data..percpu',
>>>>>> one
>>>>>> is a struct and the other is an integer:
>>>>>>
>>>>>> type_id=115801 offset=177920 size=1152 (VAR 'rcu_data')
>>>>>> type_id=115803 offset=177920 size=1152 (VAR 'rcu_data')
>>>>>>
>>>>>> [115801] VAR 'rcu_data' type_id=115572, linkage=static
>>>>>> [115803] VAR 'rcu_data' type_id=1, linkage=static
>>>>>>
>>>>>> [115572] STRUCT 'rcu_data' size=1152 vlen=69
>>>>>> [1] INT 'long unsigned int' size=8 bits_offset=0 nr_bits=64
>>>>>> encoding=(none)
>>>>>>
>>>>>> I assume that's not expected, correct?
>>>>>
>>>>> yes, that seems wrong.. but I can't reproduce with your config
>>>>> together with pahole 1.24 .. could you try with latest one?
>>>>
>>>> I just tested next-20241210 with the latest pahole version (1.28
>>>> from
>>>> the master branch[1]), and the issue does not occur with this
>>>> version
>>>> (I can see only one instance of rcu_data in the BTF data, as
>>>> expected).
>>>>
>>>> I can confirm that the same kernel revision still exhibits the
>>>> issue
>>>> with pahole 1.24.
>>>>
>>>> If helpful, I can also test versions between 1.24 and 1.28 to
>>>> identify
>>>> which ones work.
>>>
>>> I managed to reproduce finally with gcc-12, but had to use pahole
>>> 1.25,
>>> 1.24 failed with unknown attribute
>>>
>>> 	[95096] VAR 'rcu_data' type_id=94868, linkage=static
>>> 	[95098] VAR 'rcu_data' type_id=4, linkage=static
>>> 	type_id=95096 offset=177088 size=1152 (VAR 'rcu_data')
>>> 	type_id=95098 offset=177088 size=1152 (VAR 'rcu_data')
>>
>> so for me the difference seems to be using gcc-12 and this commit in
>> linux tree:
>>    dabddd687c9e percpu: cast percpu pointer in PERCPU_PTR() via
>>    unsigned long
>>
>> which adds extra __pcpu_ptr variable into dwarf, and it has the same
>> address as the per cpu variable and that confuses pahole
>>
>> it ends up with adding per cpu variable twice.. one with real type
>> (type_id=94868) and the other with unsigned long type (type_id=4)
>>
>> however this got fixed in pahole 1.28 commit:
>>    47dcb534e253 btf_encoder: Stop indexing symbols for VARs
>>
>> which filters out __pcpu_ptr variable completely, adding Stephen to
>> the loop
> 
> Thanks for sharing this. Your analysis is spot-on, but I can fill in
> the
> details a bit. I just grabbed 6.13-rc2 and built it with gcc 11 and
> pahole 1.27, and observed the same issue:
> 
>    $ bpftool btf dump file vmlinux | grep "VAR 'rcu_data"
>    [4045] VAR 'rcu_data' type_id=3962, linkage=static
>    [4047] VAR 'rcu_data' type_id=1, linkage=static
>            type_id=4045 offset=196608 size=520 (VAR 'rcu_data')
>            type_id=4047 offset=196608 size=520 (VAR 'rcu_data')
> 
> In pahole 1.27, the (simplified) process for generating variables for
> BTF is:
> 
> 1. Look through the ELF symbol table, and find all symbols whose
> addresses are within the percpu section, and add them to a list.
> 
> 2. Look through the DWARF: for each tag of type DW_TAG_variable,
> determine if the variable is "global". If so, and if the address
> matches
> one of the symbols found in Step 1, continue.
> 
> 3. Except for one special case, pahole doesn't check whether the DWARF
> variable's name matches the symbol name. It simply emits a variable
> using the name of the symbol from Step 1, and the type information
> from
> Step 2.
> 
> The result of this process, in this case, is:
> 
> 1. kernel/rcu/tree.c contains a declaration of "rcu_data". This
> results
> in an ELF symbol in vmlinux of the same name. Great!
> 
>    $ eu-readelf -s vmlinux | grep '\brcu_data\b'
>    12319: 0000000000030000    520 OBJECT  LOCAL  DEFAULT       21
>    rcu_data
> 
> 
> 2. A DWARF entry is emitted for "rcu_data" which has a matching
> location
> (DW_AT_location has value DW_OP_addr 0x30000, matching the ELF
> symbol).
> So far so good - pahole emits a BTF variable with the expected type.
> 
>    $ llvm-dwarfdump --name=rcu_data
>    ...
>    0x01af03f1: DW_TAG_variable
>                  DW_AT_name        ("rcu_data")
>                  DW_AT_decl_file
>                  ("/home/stepbren/repos/linux-upstream/kernel/rcu/tree.c")
>                  DW_AT_decl_line   (80)
>                  DW_AT_decl_column (8)
>                  DW_AT_type        (0x01aefb38 "rcu_data")
>                  DW_AT_alignment   (0x40)
>                  DW_AT_location    (DW_OP_addr 0x30000)
> 
> 3. In kernel/rcu/tree.c, we also have the following declaration at
> line
> 5227 which uses per_cpu_ptr() on &rcu_data:
> 
> 5222 void rcutree_migrate_callbacks(int cpu)
> 5223 {
> 5224 	unsigned long flags;
> 5225 	struct rcu_data *my_rdp;
> 5226 	struct rcu_node *my_rnp;
> 5227 	struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
>                                 ^^^^^^^^^^^
> 
> With the new changes in dabddd687c9e ("percpu: cast percpu pointer in
> PERCPU_PTR() via unsigned long"), this expands to a lexical block
> which
> contains a variable named "__pcpu_ptr", of type unsigned long. The
> compiler emits the following DW_TAG_variable in the DWARF:
> 
> 0x01b05d20:         DW_TAG_variable
>                        DW_AT_name        ("__pcpu_ptr")
>                        DW_AT_decl_file
>                        ("/home/stepbren/repos/linux-upstream/kernel/rcu/tree.c")
>                        DW_AT_decl_line   (5227)
>                        DW_AT_decl_column (25)
>                        DW_AT_type        (0x01adb52e "long unsigned
>                        int")
>                        DW_AT_location    (DW_OP_addr 0x30000,
>                        DW_OP_stack_value)
> 
> Since the DW_AT_location has a DW_OP_addr - pahole understands this to
> mean that the variable is located in global memory, and thus has
> VSCOPE_GLOBAL. But of course, the actual "scope" of this variable is
> not
> global, it is limited to the lexical block, which is completely hidden
> away by the macro. But pahole 1.27 does not consider this, and since
> the
> address matches the "rcu_data" symbol, it emits a variable of type
> "long
> unsigned int" under the name "rcu_data" -- despite the fact that the
> DWARF info has a name of "__pcpu_ptr".
> 
> The changes I made in 1.28 address this (unintentionally) by:
> 
> 1. Requiring global variables be both "in the global scope" (i.e. in
> the
> CU-level, rather than any function or other lexical block.
> 2. Requiring global variables have global memory (some of them could
> be
> register variables, despite having global scope -- e.g.
> current_stack_pointer).
> 3. No longer using the ELF symbol table, and instead using the DWARF
> names for variables.
> 
> With #1, we would filter this variable. And with #3, even if the
> variable were not filtered, we would output (a bunch of) variables
> with
> the correct __pcpu_ptr variable name, which is unhelpful but at least
> helps us understand where these things come from.
> 
> Rebuilding with GCC 14, we can see that the "__pcpu_ptr" variable no
> longer has a DW_AT_location:
> 
> 0x01afa82f:         DW_TAG_variable
>                        DW_AT_name        ("__pcpu_ptr")
>                        DW_AT_decl_file
>                        ("/home/stepbren/repos/linux-upstream/kernel/rcu/tree.c")
>                        DW_AT_decl_line   (5227)
>                        DW_AT_decl_column (25)
>                        DW_AT_type        (0x01ad0267 "long unsigned
>                        int")
> 
> This is the reason that pahole 1.27 now recognizes it as
> VSCOPE_OPTIMIZED. Without a memory location pahole can't do anything
> to
> match it against the "rcu_data" variable so nothing is emitted, and we
> don't get the issue.
> 
> I'm not sure if this adds at all to the discussion, since the overall
> answer is the same, an upgrade of pahole and/or gcc. (Pahole would be
> recommended; GCC just changed the generated DWARF and I could imagine
> other situations popping up elsewhere).
>

Thank you for the help with debugging and for the detailed explanation!  
                                                                         
We'll proceed with updating pahole to v1.28 in the KernelCI build        
environment.                                                             
                                                                         
Best,                                                                    
                                                                         
Laura                                                                    
                                                                         
#regzbot resolve: fixed by changes in pahole 1.28
 
> 
> Thanks,
> Stephen
> 
>> with gcc-14 the __pcpu_ptr variable has VSCOPE_OPTIMIZED scope, so it
>> won't
>> get into btf even without above pahole fix
>>
>> I suggest gcc/pahole upgrade ;-)
>>
>> thanks,
>> jirka


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ