Message-ID: <20200214222046.bkafub6dbtapgter@google.com>
Date: Fri, 14 Feb 2020 14:20:46 -0800
From: Fangrui Song <maskray@...gle.com>
To: Arvind Sankar <nivedita@...m.mit.edu>
Cc: Nick Desaulniers <ndesaulniers@...gle.com>, jpoimboe@...hat.com,
peterz@...radead.org, clang-built-linux@...glegroups.com,
Nathan Chancellor <natechancellor@...il.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] objtool: ignore .L prefixed local symbols
On 2020-02-14, Arvind Sankar wrote:
>On Fri, Feb 14, 2020 at 10:05:27AM -0800, Fangrui Song wrote:
>> I know little about objtool, but if it may be used by other
>> architectures, hope the following explanations don't appear to be too
>> off-topic:)
>>
>> On 2020-02-14, Arvind Sankar wrote:
>> >Can you describe what case the clang change is supposed to optimize?
>> >AFAICT, it kicks in when the symbol is known by the compiler to be local
>> >to the DSO and defined in the same translation unit.
>> >
>> >But then there are two cases:
>> >(a) we have call foo, where foo is defined in the same section as the
>> >call instruction. In this case the assembler should be able to fully
>> >resolve foo and not generate any relocation, regardless of whether foo
>> >is global or local.
>>
>> If foo is STB_GLOBAL or STB_WEAK, the assembler cannot fully resolve a
>> reference to foo in the same section, unless the assembler can assume
>> (the codegen tells it) the call to foo cannot be interposed by another
>> foo definition at runtime.
>
>I was testing with hidden/protected visibility, I see you want this for
>the no-semantic-interposition case. Actually a bit more testing shows
>some peculiarities even with hidden visibility. With the below, the call
>and lea create relocations in the object file, but the jmp doesn't. ld
>does avoid creating a plt for this though.
>
> .text
> .globl foo, bar
> .hidden foo
> bar:
> call foo
> leaq foo(%rip), %rax
> jmp foo
>
> foo: ret
Yes, GNU as is inconsistent here. While fixing
https://sourceware.org/ml/binutils/2020-02/msg00243.html , I noticed
that the rule is quite complex. There are definitely lots of places to
improve. clang 10 emits relocations consistently.

  call foo               # R_X86_64_PLT32
  leaq foo(%rip), %rax   # R_X86_64_PC32
  jmp foo                # R_X86_64_PLT32

We could teach the assembler not to emit relocations referencing STV_HIDDEN or
STV_INTERNAL symbols, but I favor the simpler rule: only relocations
referencing STB_LOCAL non-STT_GNU_IFUNC symbols defined in the same section are
resolved, and visibility is left to the linker.
If we ever teach GNU objcopy or llvm-objcopy an option to set
visibility, resolving such relocations in the assembler may rule out that use case.
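
A minimal sketch of that simpler rule in C, for illustration only. The struct
and function names are made up, and this is not how GNU as or LLVM MC is
actually structured; only the STB_*/STT_* values come from the ELF
specification and its GNU extensions.

```c
#include <stdbool.h>

/* Symbol binding and type values as defined by the ELF gABI / GNU extensions. */
#define STB_LOCAL 0
#define STB_GLOBAL 1
#define STT_FUNC 2
#define STT_GNU_IFUNC 10

/* Hypothetical, simplified view of a symbol as an assembler might see it. */
struct sym_view {
    unsigned char info; /* st_info: binding in the high 4 bits, type in the low 4 */
    int section;        /* index of the defining section, or -1 if undefined */
};

/*
 * True if a reference from section ref_section can be resolved by the
 * assembler itself (no relocation emitted) under the simple rule:
 * STB_LOCAL, not STT_GNU_IFUNC, and defined in the referencing section.
 * Visibility (st_other) is deliberately not consulted.
 */
bool resolves_at_assembly_time(struct sym_view s, int ref_section)
{
    unsigned bind = s.info >> 4;
    unsigned type = s.info & 0xf;

    return s.section >= 0 && s.section == ref_section &&
           bind == STB_LOCAL && type != STT_GNU_IFUNC;
}
```

Under this sketch a hidden STB_GLOBAL foo in the same section still gets a
relocation, which is what clang 10 emits for the example above.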
Unfortunately gcc>=5 x86 and GNU ld>=2.26 x86 are in a bad state
regarding STV_PROTECTED (https://reviews.llvm.org/D72197#1866384).
(Having retested it, I think I may add a special -no-integrated-as rule to
clang just to work around GNU ld x86>=2.26.)
>> >(b) we have call foo, where foo is defined in a different section from
>> >the call instruction. In this case the assembler must generate a
>> >relocation regardless of whether foo is global or local, and the linker
>> >should eliminate it.
>> >In what case does replacing call foo with call .Lfoo$local help?
>>
>> For -fPIC -fno-semantic-interposition, the assembly emitter can perform
>> the following optimization:
>>
>> void foo() {}
>> void bar() { foo(); }
>>
>> .globl foo, bar
>> foo:
>> .Lfoo$local:
>> ret
>> bar:
>> call foo --> call .Lfoo$local
>> ret
>>
>> call foo generates an R_X86_64_PLT32. In a -shared link, it creates an
>> unneeded PLT entry for foo.
>>
>> call .Lfoo$local generates an R_X86_64_PLT32. In a -shared link, .Lfoo$local is
>> non-preemptible => no PLT entry is created.
>>
>> For -fno-PIC and -fPIE, the final link is expected to be -no-pie or
>> -pie. This optimization does not save anything, because PLT entries will
>> not be generated. With clang's integrated assembler, it may increase the
>> number of STT_SECTION symbols (because .Lfoo$local will be turned to a
>> STT_SECTION relative relocation), but the size increase is very small.
>>
>>
>> I want to teach clang -fPIC to use -fno-semantic-interposition by
>> default. (It is currently an LLVM optimization, not realized in clang.)
>> clang traditionally makes various -fno-semantic-interposition
>> assumptions and can perform interprocedural optimizations even if the
>> strict ELF rule disallows them.
>
>FWIW, gcc with no-semantic-interposition also uses local aliases, but
>rather than using .L labels, it creates a local alias by
> .set foo.localalias, foo
>This makes the type of foo.localalias the same as foo, which I gather
>should placate objtool as it'll still see an STT_FUNC no matter whether
>it picks up foo.localalias or foo.
The GCC approach costs more bytes. foo.localalias is not prefixed by .L,
thus it wastes sizeof(Elf*_Sym) bytes for each such function.

     5: 0000000000401000     7 FUNC    LOCAL  DEFAULT    1 foo.localalias

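
To put a number on that cost, here is a rough sketch in C. The struct mirrors
the Elf64_Sym field layout from the ELF-64 specification; the struct name and
the helper function are made up for illustration, and the string-table bytes
for the alias names are not counted.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field order and widths as Elf64_Sym in the ELF-64 object file format. */
struct sketch_elf64_sym {
    uint32_t st_name;       /* offset of the name in the string table */
    unsigned char st_info;  /* binding and type */
    unsigned char st_other; /* visibility */
    uint16_t st_shndx;      /* defining section index */
    uint64_t st_value;      /* address or section offset */
    uint64_t st_size;       /* object size */
};

_Static_assert(sizeof(struct sketch_elf64_sym) == 24,
               "Elf64_Sym is 24 bytes");

/*
 * Symbol-table bytes spent on one retained local alias per function.
 * A .L-prefixed label is normally not kept in the symbol table, so the
 * corresponding cost for the .L approach is zero.
 */
size_t localalias_symtab_bytes(size_t nfuncs)
{
    return nfuncs * sizeof(struct sketch_elf64_sym);
}
```

So 1000 such functions cost 24000 bytes of .symtab on a 64-bit target, before
counting the names themselves.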
Call/jump relocations on ARM and MIPS treat STT_FUNC differently.
If we eventually use the clang optimization for ARM and MIPS, we
should probably consider changing `.Lfoo$local:` to `.set .Lfoo$local, foo`.
The assembler is quite complex. I need to look further into LLVM MC.
R_ARM_CALL/R_ARM_THM_CALL can be used against STT_NOTYPE symbols.
That disables interwork thunks (https://reviews.llvm.org/D73542).
If objtool is ever used on ARM and such disabling semantics are needed,
the rule should be loosened to allow STT_NOTYPE.
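
A loosened check could look roughly like the following. This is a hypothetical
sketch, not objtool's actual code: the function name and the allow_notype
switch are made up, and only the STT_* values come from the ELF specification.

```c
#include <stdbool.h>

/* Symbol type values as defined by the ELF gABI. */
#define STT_NOTYPE 0
#define STT_FUNC 2

/*
 * Decide whether a symbol is acceptable as a call destination.
 * STT_FUNC is always accepted; STT_NOTYPE is accepted only when the
 * caller opts in, e.g. for an architecture where calls against
 * STT_NOTYPE symbols are legitimate.
 */
bool call_target_type_ok(unsigned char st_info, bool allow_notype)
{
    unsigned type = st_info & 0xf; /* low 4 bits of st_info hold the type */

    if (type == STT_FUNC)
        return true;
    return allow_notype && type == STT_NOTYPE;
}
```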