[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200214222046.bkafub6dbtapgter@google.com>
Date:   Fri, 14 Feb 2020 14:20:46 -0800
From:   Fangrui Song <maskray@...gle.com>
To:     Arvind Sankar <nivedita@...m.mit.edu>
Cc:     Nick Desaulniers <ndesaulniers@...gle.com>, jpoimboe@...hat.com,
        peterz@...radead.org, clang-built-linux@...glegroups.com,
        Nathan Chancellor <natechancellor@...il.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] objtool: ignore .L prefixed local symbols
On 2020-02-14, Arvind Sankar wrote:
>On Fri, Feb 14, 2020 at 10:05:27AM -0800, Fangrui Song wrote:
>> I know little about objtool, but if it may be used by other
>> architectures, hope the following explanations don't appear to be too
>> off-topic:)
>>
>> On 2020-02-14, Arvind Sankar wrote:
>> >Can you describe what case the clang change is supposed to optimize?
>> >AFAICT, it kicks in when the symbol is known by the compiler to be local
>> >to the DSO and defined in the same translation unit.
>> >
>> >But then there are two cases:
>> >(a) we have call foo, where foo is defined in the same section as the
>> >call instruction. In this case the assembler should be able to fully
>> >resolve foo and not generate any relocation, regardless of whether foo
>> >is global or local.
>>
>> If foo is STB_GLOBAL or STB_WEAK, the assembler cannot fully resolve a
>> reference to foo in the same section, unless the assembler can assume
>> (the codegen tells it) the call to foo cannot be interposed by another
>> foo definition at runtime.
>
>I was testing with hidden/protected visibility, I see you want this for
>the no-semantic-interposition case. Actually a bit more testing shows
>some peculiarities even with hidden visibility. With the below, the call
>and lea create relocations in the object file, but the jmp doesn't. ld
>does avoid creating a plt for this though.
>
>	.text
>	.globl foo, bar
>	.hidden foo
>	bar:
>		call	foo
>		leaq	foo(%rip), %rax
>		jmp	foo
>
>	foo:	ret
Yes, GNU as is inconsistent here.  While fixing
https://sourceware.org/ml/binutils/2020-02/msg00243.html , I noticed
that the rule is quite complex. There are definitely lots of places to
improve.  clang 10 emits relocations consistently.
   call	foo              # R_X86_64_PLT32
   leaq	foo(%rip), %rax  # R_X86_64_PC32
   jmp	foo              # R_X86_64_PLT32
We can teach the assembler to not emit relocations referencing STV_HIDDEN or
STV_INTERNAL symbols, but I favor the simpler rule that only relocations
referencing STB_LOCAL non-STT_GNU_IFUNC symbols defined in the same section are resolved.
Leave the visibility jobs to the linker.
If we ever teach GNU objcopy or llvm-objcopt an option to set
visibility, resolving relocations may disallow such use cases.
Unfortunately gcc>=5 x86 and GNU ld>=2.26 x86 are in a bad status
regarding STV_PROTECTED (https://reviews.llvm.org/D72197#1866384).
(Now I retest it, I think I may add a special -no-integrated-as rule to
clang just to work around GNU ld x86>=2.26.)
>> >(b) we have call foo, where foo is defined in a different section from
>> >the call instruction. In this case the assembler must generate a
>> >relocation regardless of whether foo is global or local, and the linker
>> >should eliminate it.
>> >In what case does does replacing call foo with call .Lfoo$local help?
>>
>> For -fPIC -fno-semantic-interposition, the assembly emitter can perform
>> the following optimization:
>>
>>    void foo() {}
>>    void bar() { foo(); }
>>
>>    .globl foo, bar
>>    foo:
>>    .Lfoo$local:
>>      ret
>>    bar:
>>      call foo  --> call .Lfoo$local
>>      ret
>>
>> call foo generates an R_X86_64_PLT32. In a -shared link, it creates an
>> unneeded PLT entry for foo.
>>
>> call .Lfoo$local generates an R_X86_64_PLT32. In a -shared link, .Lfoo$local is
>> non-preemptible => no PLT entry is created.
>>
>> For -fno-PIC and -fPIE, the final link is expected to be -no-pie or
>> -pie. This optimization does not save anything, because PLT entries will
>> not be generated. With clang's integrated assembler, it may increase the
>> number of STT_SECTION symbols (because .Lfoo$local will be turned to a
>> STT_SECTION relative relocation), but the size increase is very small.
>>
>>
>> I want to teach clang -fPIC to use -fno-semantic-interposition by
>> default. (It is currently an LLVM optimization, not realized in clang.)
>> clang traditionally makes various -fno-semantic-interposition
>> assumptions and can perform interprocedural optimizations even if the
>> strict ELF rule disallows them.
>
>FWIW, gcc with no-semantic-interposition also uses local aliases, but
>rather than using .L labels, it creates a local alias by
>	.set foo.localalias, foo
>This makes the type of foo.localalias the same as foo, which I gather
>should placate objtool as it'll still see an STT_FUNC no matter whether
>it picks up foo.localalias or foo.
The GCC approach costs more bytes. foo.localalias is not prefixed by .L,
thus it wastes sizeof(Elf*_Sym) bytes for each such function.
      5: 0000000000401000     7 FUNC    LOCAL  DEFAULT    1 foo.localalias
Call/jump relocations on ARM and MIPS treat STT_FUNC differently.
If eventually we use the clang optimization for ARM and MIPS, we
probably should consider changing `.Lfoo$local:` to `.set .Lfoo$local, foo`
The assembler is quite complex. I need to investigate more into LLVM MC.
R_ARM_CALL/R_ARM_THM_CALL can be used against STT_NOTYPE symbols.
That disables interwork thunks (https://reviews.llvm.org/D73542).
If objtool is used by ARM and such disabling semantic is ever needed,
the rule should be loosened to allow STT_NOTYPE.
Powered by blists - more mailing lists
 
