Message-ID: <f08678b1-260f-4200-889b-a4ec016fc7e1@suse.com>
Date: Thu, 1 Aug 2024 13:22:21 +0200
From: Petr Pavlu <petr.pavlu@...e.com>
To: Sami Tolvanen <samitolvanen@...gle.com>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Masahiro Yamada <masahiroy@...nel.org>, Luis Chamberlain
 <mcgrof@...nel.org>, Miguel Ojeda <ojeda@...nel.org>,
 Matthew Maurer <mmaurer@...gle.com>, Alex Gaynor <alex.gaynor@...il.com>,
 Wedson Almeida Filho <wedsonaf@...il.com>, Gary Guo <gary@...yguo.net>,
 linux-kbuild@...r.kernel.org, linux-kernel@...r.kernel.org,
 linux-modules@...r.kernel.org, rust-for-linux@...r.kernel.org
Subject: Re: [PATCH 00/15] Implement MODVERSIONS for Rust

On 7/26/24 23:05, Sami Tolvanen wrote:
> On Mon, Jul 22, 2024 at 8:20 AM Petr Pavlu <petr.pavlu@...e.com> wrote:
>>
>> From my perspective, I'm okay if gendwarfksyms doesn't provide
>> functionality to compare a new object file with its reference symtypes
>> file.
>>
>> As mentioned, genksyms has this functionality but I actually think the
>> way it works is not ideal. Its design is to operate on one compilation
>> unit at a time. This has the advantage that a comparison of each file
>> is performed in parallel during the build, simply because of the make
>> job system. On the other hand, it has two problems.
>>
>> The first one is that genksyms doesn't provide a comparison of the
>> kernel as a whole. This means that the tool gives rather scattered and
>> duplicated output about changed structs in the build log. Ideally, one
>> would like to see a single compact report about what changed at the end
>> of the build.
> 
> Sure, that makes sense. Android uses STG for this, which might be
> useful to other folks too:
> 
> https://android.googlesource.com/platform/external/stg/
> https://android.googlesource.com/platform/external/stg/+/refs/heads/main/doc/stgdiff.md#output-formats

STG is an interesting tool. I played with it a bit last year. To be
frank, I was surprised to see a new tool being proposed by Google to
generate modversion CRCs from DWARF instead of potentially extending
your STG project for this purpose. I'm not sure if it is something that
you folks have considered and evaluated.

>> A few months ago, I also started working on a tool inspired by this
>> script. The goal is to have similar functionality but hopefully with
>> a much faster implementation. Hence, this tool is written in a compiled
>> language (Rust at the moment) and should also become multi-threaded. I'm
>> hoping to find some time to make progress on it and make the code
>> public. It could later be added to the upstream kernel to replace the
>> comparison functionality implemented by genksyms, if there is interest.
>>
>> So as mentioned, I'm fine if gendwarfksyms doesn't have this
>> functionality. However, for distributions that rely on the symtypes
>> format, I'd be interested in having gendwarfksyms output its dump data
>> in this format as well.
> 
> We can definitely tweak the output format, but I'm not sure if making
> it fully compatible with the genksyms symtypes format is feasible,
> especially for Rust code. I also intentionally decided to use DWARF
> tag names in the output instead of shorthands like s# etc. to make it
> a bit more readable.

Sure, it might be necessary to extend the symtypes format a bit, for
example, by allowing spaces in type names. What other problems do you
see?

The example I showed preserves the DWARF tag names in type descriptions.
Cross-references and the target type names use the s# prefix as they
need to be distinguished from other tokens.
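
To make the point about distinguishing tokens concrete, here is a minimal
sketch of how a reader of such a dump could split a record into its key and
description and pick out the cross-references. Everything in it is an
assumption for illustration: the quoting rule (a name with spaces wrapped in
single quotes after the prefix), the function names split_record and
cross_refs, and the restriction that a quoted name contains no single quote
itself. It is not the genksyms or gendwarfksyms parser.

```python
import re

# Hypothetical token shape: one prefix character, '#', then either a
# single-quoted name (may contain spaces) or a bare name without spaces.
_REF = re.compile(r"\b\w#(?:'[^']*'|[^\s']+)")


def split_record(line):
    """Split one dump line into (key, description).

    The key is either a prefixed type reference such as S#'some name'
    or a bare symbol name that ends at the first whitespace.
    """
    line = line.strip()
    m = _REF.match(line)
    key = m.group(0) if m else line.split(None, 1)[0]
    return key, line[len(key):].strip()


def cross_refs(description):
    """Return every prefixed type reference found in a description."""
    return _REF.findall(description)


if __name__ == "__main__":
    sym = ("_some_mangled_func_name subprogram _some_mangled_func_name ( ) "
           "-> S#'core::result::Result<(), core::fmt::Error>'")
    key, desc = split_record(sym)
    print(key)
    print(cross_refs(desc))
```

Without the prefix, a type name containing spaces or punctuation could not
be told apart from the surrounding description tokens, which is why the
sketch keys everything on the prefixed form.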

>> For example, instead of producing:
>>
>> gendwarfksyms: process_exported_symbols: _some_mangled_func_name (@ XYZ)
>> subprogram(
>>    [formal parameters...]
>> )
>> -> structure_type core::result::Result<(), core::fmt::Error> {
>>    [a description of the structure...]
>> };
>>
>> .. the output could be something like this:
>>
>> S#'core::result::Result<(), core::fmt::Error>' structure_type core::result::Result<(), core::fmt::Error> { [a description of the structure...] }
>> _some_mangled_func_name subprogram _some_mangled_func_name ( [formal parameters...] ) -> S#'core::result::Result<(), core::fmt::Error>'
> 
> This wouldn't be enough to make the output format compatible with
> symtypes though. genksyms basically produces a simple key-value pair
> database while gendwarfksyms currently outputs the fully expanded type
> string for each symbol. If you need the tool to produce a type
> database, it might also be worth discussing if we should use a bit
> less ad hoc format in that case.

What I think is needed is the ability to compare an updated kernel with
some previous reference and have an output that clearly and accurately
shows why CRCs of some symbols changed. The previous reference should be
possible to store in Git together with the kernel source. It means it
should be ideally some text format and limited in size. This is what
distributions that care about stable kABI do in some form currently.

This functionality would be needed if some distribution wants to
maintain stable Rust kABI (not sure if it is actually feasible), or if
the idea is for gendwarfksyms to be a general tool that could replace
genksyms. I assume for the sake of argument that this is the case.

Gendwarfksyms could implement this functionality on its own, or as
discussed, I believe it could provide a symtypes-like dump and a second
tool could then read this format and compare it with a reference.
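
As a rough illustration of what such a second tool would do, the sketch
below compares two key-value type databases and reports, for each exported
symbol, which changed type records it reaches through cross-references.
The dictionary representation, the record contents, and the names
reachable and explain_changes are all hypothetical; this is not how
genksyms or any existing tool implements the comparison.

```python
import re

# Hypothetical cross-reference token: prefix character, '#', then a
# single-quoted name or a bare name without spaces.
_REF = re.compile(r"\b\w#(?:'[^']*'|[^\s']+)")


def reachable(db, start):
    """Return all keys reachable from start by following cross-references."""
    seen, todo = set(), [start]
    while todo:
        key = todo.pop()
        if key in seen:
            continue  # already visited; also guards against reference cycles
        seen.add(key)
        todo.extend(_REF.findall(db.get(key, "")))
    return seen


def explain_changes(ref, new):
    """Map each affected symbol in new to the changed records it depends on."""
    changed = {k for k in ref.keys() | new.keys() if ref.get(k) != new.get(k)}
    report = {}
    for key in new:
        if _REF.fullmatch(key):
            continue  # a type record, not an exported symbol
        causes = reachable(new, key) & changed
        if causes:
            report[key] = sorted(causes)
    return report


if __name__ == "__main__":
    ref = {
        "s#foo": "structure_type foo { int a ; }",
        "s#bar": "structure_type bar { s#foo f ; }",
        "use_bar": "subprogram use_bar ( pointer_type s#bar ) -> int",
        "other": "subprogram other ( ) -> int",
    }
    new = dict(ref)
    new["s#foo"] = "structure_type foo { int a ; int b ; }"
    print(explain_changes(ref, new))
```

Because every type is stored once under its own key, a change to one
structure shows up as a single differing record, and the per-symbol report
is derived from it rather than repeated in every expanded type string.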

From my point of view, the current --debug format is not suitable for
this purpose because its expanded and unstructured form means it is
bloated and hard to compare with a previous reference.

I'm also not quite yet sold on using separate DWARF tooling, such as
libabigail or STG, to actually understand why gendwarfksyms produced
a different CRC for some symbol. Using these tools makes sense in the
genksyms world, where genksyms operates on the source code level and
this additional tooling can only work on debug data.

With gendwarfksyms working directly with DWARF data, it doesn't seem
appealing to me to first run gendwarfksyms to produce CRCs, compare them
with their reference, and if they are different, use a second tool to
process the same DWARF data again and with some luck hopefully get an
actual answer why the CRCs changed. I'm worried that users might
encounter inaccurate answers if the two tools interpret the input data
differently.

> 
> One more thing to note about the current --debug output is that it
> directly correlates with the debugging information and thus may not
> contain all aliases. For example, the Rust compiler deduplicates
> identical function implementations (e.g. Deref::deref and
> DerefMut::deref_mut etc.), but only one of the symbol names appears in
> DWARF. We use symbol addresses to print out #SYMVERs also for the
> aliases, but they don't show up in the debugging output right now.

Thanks,
Petr
