linux-kernel - Re: selftests/x86/fsgsbase


Message-ID: <CALCETrUi7Ub2TbFy3Cvj+j4VXZeYULPY+mgL7OX7bz9L8GO9ew@mail.gmail.com>
Date:   Fri, 26 Jan 2018 10:59:36 -0800
From:   Andy Lutomirski <luto@...nel.org>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     Dan Rue <dan.rue@...aro.org>, Shuah Khan <shuah@...nel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Dmitry Safonov <dsafonov@...tuozzo.com>,
        Borislav Petkov <bp@...en8.de>,
        "open list:KERNEL SELFTEST FRAMEWORK" 
        <linux-kselftest@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: selftests/x86/fsgsbase_64 test problem

On Fri, Jan 26, 2018 at 8:22 AM, Andy Lutomirski <luto@...nel.org> wrote:
> On Fri, Jan 26, 2018 at 7:36 AM, Dan Rue <dan.rue@...aro.org> wrote:
>>
>> We've noticed that fsgsbase_64 can fail intermittently with the
>> following error:
>>
>>         [RUN]   ARCH_SET_GS(0x0) and clear gs, then schedule to 0x1
>>                 Before schedule, set selector to 0x1
>>                 other thread: ARCH_SET_GS(0x1) -- sel is 0x0
>>         [FAIL]  GS/BASE changed from 0x1/0x0 to 0x0/0x0
>>
>> This can be reliably reproduced by running fsgsbase_64 in a loop. i.e.
>>
>>     for i in $(seq 1 10000); do ./fsgsbase_64 || break; done
>>
>> This problem isn't new - I've reproduced it on latest mainline and every
>> release going back to v4.12 (I did not try earlier). This was tested on
>> a Supermicro board with a Xeon E3-1220 as well as an Intel Nuc with an
>> i3-5010U.
>>
>
> Hmm, I can reproduce it, too.  I'll look in a bit.

I'm triggering a different error, and I think what's going on is that
the kernel doesn't currently re-save GSBASE when a task switches out
and that task has saved gsbase != 0 and in-register GS == 0.  This is
arguably a bug, but it's not an infoleak, and fixing it could be a wee
bit expensive.  I'm not sure what, if anything, to do about this.  I
suppose I could add some gross perf hackery to the test to detect this
case and suppress the error.
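
To make the suspected case concrete, here is a toy model in Python.  It
is only a sketch of the hypothesis above, not kernel code: the names
(Task, Cpu, switch_out, switch_in) are made up for illustration, and it
assumes that loading a null GS selector zeroes the in-register base and
that switch-out trusts the saved base whenever the selector is 0.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical per-task state: the kernel's saved copies.
    saved_gs_sel: int = 0
    saved_gsbase: int = 0


@dataclass
class Cpu:
    # Hypothetical in-register state.
    gs_sel: int = 0
    gs_base: int = 0


def arch_set_gs(cpu, task, base):
    """Model ARCH_SET_GS: selector 0, base set, saved copy updated."""
    cpu.gs_sel = 0
    cpu.gs_base = base
    task.saved_gsbase = base


def load_gs_selector(cpu, sel):
    """Model a user-space write to %gs with a null selector (0..3).

    Assumption: the CPU zeroes the in-register base on such a load.
    The kernel is not involved, so the saved copy is not updated.
    """
    cpu.gs_sel = sel
    cpu.gs_base = 0


def switch_out(cpu, task):
    """Model switch-out under the hypothesis in this mail."""
    task.saved_gs_sel = cpu.gs_sel
    if cpu.gs_sel != 0:
        # Nonzero selector: the saved base is reset to 0.
        task.saved_gsbase = 0
    # Selector 0: the saved base is trusted as-is and the register is
    # not re-read, even if it no longer matches.


def switch_in(cpu, task):
    """Restore the selector and the saved base."""
    cpu.gs_sel = task.saved_gs_sel
    cpu.gs_base = task.saved_gsbase


def demo():
    cpu, task = Cpu(), Task()
    arch_set_gs(cpu, task, 0x1000)   # saved gsbase != 0
    load_gs_selector(cpu, 1)         # in-register base now 0
    load_gs_selector(cpu, 0)         # in-register GS == 0, base still 0
    before = (cpu.gs_sel, cpu.gs_base)
    switch_out(cpu, task)
    cpu.gs_sel, cpu.gs_base = 0, 0   # some other task runs
    switch_in(cpu, task)
    after = (cpu.gs_sel, cpu.gs_base)
    return before, after
```

In this model demo() returns a different GS/BASE pair before and after
the switch: the stale saved base comes back even though the register
held 0 when the task was switched out.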

I can also trigger the problem you're seeing, and I don't know what's
up.  It may be related to an old problem I've seen that causes signal
delivery to sometimes corrupt %gs.  It's deterministic, but it depends
in some odd way on register state.  I can currently reproduce that
issue 100% of the time, and I'm trying to see if I can figure out
what's happening.