Message-ID: <5543FCB6.8060003@redhat.com>
Date:	Fri, 01 May 2015 18:22:46 -0400
From:	Vladimir Makarov <vmakarov@...hat.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
CC:	Jakub Jelinek <jakub@...hat.com>,
	Richard Henderson <rth@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	"H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Borislav Petkov <bp@...en8.de>
Subject: Re: [PATCH] x86: Optimize variable_test_bit()



On 01/05/15 04:49 PM, Linus Torvalds wrote:
> On Fri, May 1, 2015 at 12:02 PM, Vladimir Makarov <vmakarov@...hat.com> wrote:
>>    GCC RA is a major reason to prohibit output operands for asm goto.
> Hmm.. Thinking some more about it, I think that what would actually
> work really well at least for the kernel is:
>
> (a) allow *memory* operands (ie "=m") as outputs and having them be
> meaningful even at any output labels (obviously with the caveat that
> the asm instructions that write to memory would have to happen before
> the branch ;)
>
> This covers the somewhat common case of having magic instructions that
> result in conditions that can't be tested at a C level. Things like
> "bit clear and test" on x86 (with or without the lock).
>
>   (b) allow other operands to be meaningful only for the fallthrough case.
>
>  From a register allocation standpoint, these should be the easy cases.
> (a) doesn't need any register allocation of the output (only on the
> input to set up the effective address of the memory location), and (b)
> would explicitly mean that an "asm goto" would leave any non-memory
> outputs undefined in any of the goto cases, so from a RA standpoint it
> ends up being equivalent to a non-goto asm..
Thanks for explaining what you need in the most common case.

A big part of the job of GCC RA (at least of the local register
allocators, i.e. the reload pass and LRA), besides assigning hard
registers to pseudos, is making transformations to satisfy insn
constraints.  If there are not enough hard registers, a pseudo can be
allocated to a stack slot, and if an insn using the pseudo needs a hard
register, a load and/or store has to be generated before/after the insn.
The problem for both the old RA (the reload pass) and the new one (LRA)
is that they were not designed to put new insns after an insn that
changes control flow.  Assigning hard registers is not in itself an
issue for the asm goto case.
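
To make the "store after the insn" concrete, here is a minimal sketch
using an ordinary (non-goto) asm.  The function name and the slot
parameter are hypothetical, and GCC or Clang is assumed.  The constraint
asks for a register while the destination lives in memory, so the
compiler has to emit a store after the asm statement:

```c
#include <stdint.h>

/* Hypothetical helper: *slot stands in for a value that lives in memory. */
uint32_t copy_through_register(uint32_t in, uint32_t *slot)
{
#if defined(__x86_64__) || defined(__i386__)
    /* "=r" demands a register, but the destination is memory, so the
       compiler must store the register to *slot after the asm. */
    __asm__ ("movl %1, %0" : "=r" (*slot) : "r" (in));
#else
    /* Portable fallback so the sketch builds on other targets. */
    *slot = in;
#endif
    return *slot;
}
```

With a plain asm that trailing store simply follows the statement.  If
the asm could also jump to a label, the store would be needed on a path
that starts in the middle of the asm, which is the case described above.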

If I understood you correctly, you assume that just permitting =m will
make GCC generate correct code.  Unfortunately, it is more
complicated.  The operand might not be a memory, or it might be a memory
that does not satisfy the memory constraint 'm'.  So insns moving a
memory satisfying 'm' into the output operand location might still be
necessary after the asm goto.
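
For context, a common way to write the "bit clear and test" case while
asm goto accepts no outputs is to pass the memory location as an
*input* and add a "memory" clobber.  The sketch below assumes x86-64
with GCC or Clang; the function name is hypothetical and this is not
the kernel's actual implementation:

```c
#include <stdbool.h>

/* Hypothetical sketch: atomically clear bit nr (0..63) of *addr and
   report whether it was set before. */
bool test_and_clear_bit_sketch(unsigned long nr, volatile unsigned long *addr)
{
#if defined(__x86_64__)
    /* No outputs: *addr is listed as an input and the "memory" clobber
       tells the compiler that memory was modified. */
    __asm__ goto ("lock; btrq %1, %0\n\t"
                  "jc %l[was_set]"
                  : /* no outputs */
                  : "m" (*addr), "r" (nr)
                  : "memory", "cc"
                  : was_set);
    return false;
was_set:
    return true;
#else
    /* Fallback for other targets, using the GCC/Clang atomic builtins. */
    unsigned long mask = 1UL << nr;
    unsigned long old = __atomic_fetch_and(addr, ~mask, __ATOMIC_SEQ_CST);
    return (old & mask) != 0;
#endif
}
```

The "memory" clobber is broader than a precise "=m" output would be,
which is part of why a real memory output is attractive.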

We could define asm goto semantics so that the user must provide
memory for such an output operand (e.g. a pointer dereference in your
case), and generate an error otherwise.  By the way, the same could be
done for an output *register* operand: to avoid the error, the user
would have to use a local register variable (a GCC extension) as the
operand.  But that might be a bad idea from a code performance point of
view.
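
For readers unfamiliar with the extension, a local register variable
pins an asm operand to a named hard register.  A minimal sketch with an
ordinary asm, assuming x86-64 with GCC or Clang (the function name and
the choice of rbx are only for illustration):

```c
#include <stdint.h>

/* Hypothetical example: the operand is forced into rbx. */
uint64_t add_one_pinned(uint64_t x)
{
#if defined(__x86_64__)
    /* GCC extension: v occupies rbx when used as an asm operand. */
    register uint64_t v __asm__("rbx") = x;
    __asm__ ("incq %0" : "+r" (v) : : "cc");
    return v;
#else
    /* Portable fallback so the sketch builds on other targets. */
    return x + 1;
#endif
}
```

Fixing the register takes the choice away from the allocator, which is
where the performance concern comes from.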

Unfortunately, the operand can be substituted by an equivalent value
during various transformations, so even if a user thinks it will be a
memory before RA, that might be wrong.  I believe there are some cases
where we can be sure that it will be a memory (e.g. dereferencing a
pointer which is a function argument and is not used anywhere else in
the function).  Still, it makes the asm goto semantics complicated,
imho.

We could prevent equivalence substitution for an output memory operand
of asm goto through all the optimizations, but that is probably an even
harder task than implementing output reloads in the *reload* pass (it is
a 28-year-old pass with so many changes during its life that practically
nobody understands it well now or can change it without introducing a
new bug).  As for LRA, as I wrote, implementing output reloads is a
doable task.

> Hmm?
>
> So as an example of something that the kernel does and which wants to
> have an output register is to do a load from user space that can
> fault. When it faults, we obviously simply don't *have* an actual
> result, and we return an error. But for the successful fallthrough
> case, we get a value in a register.
>
> I'd love to be able to write it as (this is simplified, and doesn't
> worry about all the different access sizes, or the "stac/clac"
> sequence to enable user accesses on modern Intel CPU's):
>
>          asm goto(
>              "1:"
>              "\tmovl %1,%0\n"
>              _ASM_EXTABLE(1b,%l[error])
>              : "=r" (val)
>              : "m" (*userptr)
>              : : error);
>
> where that "_ASM_EXTABLE()" is our magic macro for generating an
> exception entry for that instruction, so that if the load takes an
> exception, it will instead go to the "error" label.
>
> But if it goes to the error label, the "val" output register really
> doesn't contain anything, so we wouldn't even *want* gcc to try to do
> any register allocation for the "jump to label from assembly" case.
>
> So at least for one of the major cases that I'd like to use "asm goto"
> with an output, I actually don't *want* any register allocation for
> anything but the fallthrough case. And I suspect that's a
> not-too-uncommon pattern - it's probably often about error handling.
>
>
As I wrote already, if we implement output reloads after the control
flow insn, it does not matter what the operand constraint is (memory or
register).  Implementing it only for the fall-through case simplifies
the task, but not by much.  For LRA it is doable and I can do this; for
the reload pass it is very hard (requiring only memory operands might
simplify the implementation in reload, although I am not sure about it).

But maybe somebody will agree to do it for reload; sorry, just not me.
I cannot think about this without flinching.
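
As an illustration of what code has to do while asm goto has no
outputs: a condition produced inside the asm is handed back as an
ordinary output and tested again in C.  The sketch below is a
hypothetical stand-in (an overflow-checked add rather than a faulting
user-space load), assuming x86 with GCC or Clang:

```c
#include <limits.h>

/* Hypothetical sketch: returns 0 and stores a + b in *sum, or returns
   -1 on signed overflow and leaves *sum untouched. */
int checked_add_sketch(int a, int b, int *sum)
{
#if defined(__x86_64__) || defined(__i386__)
    int res = a;
    unsigned char overflowed;
    /* The overflow flag is copied into a byte register by seto, then
       tested again in C below. */
    __asm__ ("addl %2, %0\n\t"
             "seto %1"
             : "+r" (res), "=q" (overflowed)
             : "r" (b)
             : "cc");
    if (overflowed)
        return -1;
    *sum = res;
    return 0;
#else
    /* Portable fallback so the sketch builds on other targets. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return -1;
    *sum = a + b;
    return 0;
#endif
}
```

With outputs that are meaningful only on the fall-through path, the
error case could branch directly from the asm and the extra flag
register and C-level test would go away.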
