Message-ID: <CALCETrXWE8Z206q_5RbGWW+gG8qZiXnGFoG5PjHiZksSmQVG8Q@mail.gmail.com>
Date: Fri, 16 Jun 2017 09:38:18 -0700
From: Andy Lutomirski <luto@...nel.org>
To: "H.J. Lu" <hjl.tools@...il.com>
Cc: Andy Lutomirski <luto@...nel.org>,
Dave Hansen <dave.hansen@...el.com>,
"Robert O'Callahan" <robert@...llahan.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
X86 ML <x86@...nel.org>
Subject: Re: xgetbv nondeterminism
On Fri, Jun 16, 2017 at 9:17 AM, H.J. Lu <hjl.tools@...il.com> wrote:
> On Fri, Jun 16, 2017 at 9:01 AM, Andy Lutomirski <luto@...nel.org> wrote:
>> On Thu, Jun 15, 2017 at 9:34 PM, H.J. Lu <hjl.tools@...il.com> wrote:
>>> On Thu, Jun 15, 2017 at 8:05 PM, Andy Lutomirski <luto@...nel.org> wrote:
>>>> On Thu, Jun 15, 2017 at 7:17 PM, H.J. Lu <hjl.tools@...il.com> wrote:
>>>>> On Thu, Jun 15, 2017 at 4:28 PM, Andy Lutomirski <luto@...nel.org> wrote:
>>>>>> On Thu, Jun 15, 2017 at 4:11 PM, H.J. Lu <hjl.tools@...il.com> wrote:
>>>>>>> It is used for lazy binding the first time when an external function is called.
>>>>>>>
>>>>>>
>>>>>> Maybe I'm just being dense, but why? What does ld.so need to do to
>>>>>> resolve a symbol and update the GOT that requires using extended
>>>>>> state?
>>>>>
>>>>> Since the first 8 vector registers are used to pass function parameters
>>>>> and ld.so uses vector registers, _dl_runtime_resolve needs to preserve
>>>>> the first 8 vector registers when transferring control to ld.so.
>>>>>
>>>>
>>>> Wouldn't it be faster and more future-proof to recompile the relevant
>>>> parts of ld.so to avoid using extended state?
>>>>
>>>
>>> Are you suggesting not to use vector in ld.so?
>>
>> Yes, exactly.
>>
>>> We used to do that
>>> several years ago, which leads to some subtle bugs, like
>>>
>>> https://sourceware.org/bugzilla/show_bug.cgi?id=15128
>>
>> I don't think x86_64 has the issue that ARM has there. The Linux
>> kernel, for example, has always been compiled to not use vector or
>> floating point registers on x86 (32 and 64), and it works fine. Linux
>> doesn't save extended regs on kernel entry and it doesn't restore them
>> on exit.
>>
>> I would suggest that ld.so be compiled without use of vector
>> registers, that the normal lazy binding path not try to save any extra
>> regs, and that ifuncs be called through a thunk that saves whatever
>> registers need saving, possibly just using XSAVEOPT. After all, ifunc
>> is used for only a tiny fraction of symbols.
>
> x86-64 was the only target which used FOREIGN_CALL macros
> in ld.so, FOREIGN_CALL macros were the cause of race condition
> in ld.so:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=11214
>
> Not to save and restore the first 8 vector registers means that
> FOREIGN_CALL macros have to be used. We don't want to
> do that on x86-64.
>
>
You're talking about this, right:

commit f3dcae82d54e5097e18e1d6ef4ff55c2ea4e621e
Author: H.J. Lu <hjl.tools@...il.com>
Date:   Tue Aug 25 04:33:54 2015 -0700

    Save and restore vector registers in x86-64 ld.so

It seems to me that the problem wasn't that the save/restore happened
only some of the time -- it was that the save and restore code used a
TLS variable to track its own state. Shouldn't it have been a stack
variable or even just implicit in the control flow?

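To make the TLS-versus-stack point concrete, here is a toy model. It is not glibc's code: fake_vec_reg, tls_slot, and both resolve_* functions are hypothetical names, and the recursive call only stands in for some re-entry into the resolver (for example from a signal handler) while an outer save is still live.

```c
#include <assert.h>

/* Hypothetical stand-in for the live vector register state. */
static int fake_vec_reg;

/* Variant A: a single save slot per thread, shared by every invocation. */
static _Thread_local int tls_slot;

static void resolve_with_tls(int depth)
{
    tls_slot = fake_vec_reg;         /* save into the shared slot */
    fake_vec_reg = -1;               /* the resolver clobbers the register */
    if (depth > 0)
        resolve_with_tls(depth - 1); /* simulated re-entry overwrites tls_slot */
    fake_vec_reg = tls_slot;         /* restore whatever the slot holds now */
}

/* Variant B: each invocation keeps its saved copy in its own stack frame. */
static void resolve_with_stack(int depth)
{
    int saved = fake_vec_reg;        /* save in this frame only */
    fake_vec_reg = -1;               /* the resolver clobbers the register */
    if (depth > 0)
        resolve_with_stack(depth - 1);
    fake_vec_reg = saved;            /* restore this frame's own copy */
}

/* The "caller" has 42 live in the register before the resolver runs. */
int run_tls_variant(void)
{
    fake_vec_reg = 42;
    resolve_with_tls(1);
    return fake_vec_reg;
}

int run_stack_variant(void)
{
    fake_vec_reg = 42;
    resolve_with_stack(1);
    return fake_vec_reg;
}

/* Self-check: the shared slot loses the caller's value, the stack copy does not. */
void check_variants(void)
{
    assert(run_tls_variant() == -1);
    assert(run_stack_variant() == 42);
}
```

In this model the nested invocation overwrites the shared slot, so the outer restore hands back the clobbered value. The stack variant has nothing shared to overwrite.
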
In any case, glibc is effectively doing a foreign call anyway, right?
It's doing the foreign call to itself on every lazy binding
resolution, though, which seems quite expensive. I'm saying that it
seems like it would be more sensible to do the complicated foreign
call logic only in the dangerous case, which is when lazy binding
calls an ifunc.

If I were to rewrite this, I would do it like this:

    void *call_runtime_ifunc(void (*ifunc)()); // or whatever the signature needs to be

call_runtime_ifunc would be implemented in asm (or maybe even C!) and
would use XSAVEOPT or similar to save the state to a buffer on the
stack. Then it would call the ifunc and restore the state. No TLS
needed, so there wouldn't be any races. In fact, it would work very
much like your current save/restore code, except that it wouldn't need
to be as highly optimized because it would be called much less
frequently. This should improve performance and could be quite a bit
simpler.
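
A rough sketch of what such a call_runtime_ifunc could look like in C follows. Everything beyond the function name is an assumption: the resolver signature, the helper xsave_area_size, and example_resolver are made up for illustration, and plain XSAVE is used instead of XSAVEOPT to keep the feature checks short. A real version would be written in asm or built so the compiler cannot touch vector registers before the save (GCC's -mgeneral-regs-only, for instance). On non-x86-64 targets the sketch just calls the resolver directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__)
#include <cpuid.h>
#endif

typedef void *(*ifunc_resolver_t)(void);   /* assumed signature */

#if defined(__x86_64__)
/* Bytes XSAVE needs for the features enabled in XCR0, or 0 if unusable. */
static size_t xsave_area_size(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
        return 0;                           /* CPUID.1:ECX.OSXSAVE is clear */
    if (__get_cpuid_max(0, NULL) < 0x0d)
        return 0;                           /* no XSAVE enumeration leaf */
    __cpuid_count(0x0d, 0, eax, ebx, ecx, edx);
    return ebx;                             /* size for currently enabled state */
}
#endif

void *call_runtime_ifunc(ifunc_resolver_t resolver)
{
#if defined(__x86_64__)
    size_t size = xsave_area_size();

    if (size >= 576) {                      /* legacy area (512) + header (64) */
        unsigned char raw[size + 64];       /* buffer lives on this stack frame */
        unsigned char *area =
            (unsigned char *)(((uintptr_t)raw + 63) & ~(uintptr_t)63);
        volatile unsigned char *hdr = area + 512;
        void *target;
        int i;

        /* XRSTOR faults on garbage in the header, so zero it with plain
           byte stores before saving. */
        for (i = 0; i < 64; i++)
            hdr[i] = 0;

        /* EDX:EAX = all ones requests every component enabled in XCR0. */
        __asm__ volatile("xsave64 (%0)"
                         : : "r"(area), "a"(0xffffffffu), "d"(0xffffffffu)
                         : "memory");
        target = resolver();                /* may clobber extended state */
        __asm__ volatile("xrstor64 (%0)"
                         : : "r"(area), "a"(0xffffffffu), "d"(0xffffffffu)
                         : "memory");
        return target;
    }
#endif
    return resolver();                      /* fallback: no save/restore */
}

/* Hypothetical resolver used only to exercise the wrapper. */
static int example_target;
void *example_resolver(void)
{
    return &example_target;
}
```

The save buffer and all bookkeeping live in the wrapper's own frame, so nested or concurrent calls cannot interfere with each other. The cost is paid only on the ifunc path rather than on every lazy binding.
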
As an aside, why is saving the first eight registers enough? I don't
think there's any particular guarantee that a call through the GOT
uses the psABI, is there? Compilers can and do produce custom calling
conventions, and ISTM that some day a compiler might do that between
DSOs. Or those DSOs might not be written in C in the first place.