lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20200907125843.GC12237@willie-the-truck>
Date:   Mon, 7 Sep 2020 13:58:44 +0100
From:   Will Deacon <will@...nel.org>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     peterz@...radead.org, linux-kernel@...r.kernel.org,
        io-uring@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Jens Axboe <axboe@...nel.dk>,
        Thomas Gleixner <tglx@...utronix.de>,
        Catalin Marinas <catalin.marinas@....com>
Subject: Re: [PATCH 1/2] sched: Bring the PF_IO_WORKER and PF_WQ_WORKER bits
 closer together

On Wed, Aug 19, 2020 at 09:55:05PM +0200, Sebastian Andrzej Siewior wrote:
> The bits PF_IO_WORKER and PF_WQ_WORKER are tested together in
> sched_submit_work() which is considered to be a hot path.
> If the two bits cross the 8 or 16 bit boundary then most architecture
> require multiple load instructions in order to create the constant
> value. Also, such a value can not be encoded within the compare opcode.
> 
> By moving the bit definition within the same block, the compiler can
> create/use one immediate value.
> 
> For some reason gcc-10 on ARM64 requires both bits to be next to each
> other in order to issue "tst reg, val; bne label". Otherwise the result
> is "mov reg1, val; tst reg, reg1; bne label".
> 
> Move PF_VCPU out of the way so that PF_IO_WORKER can be next to
> PF_WQ_WORKER.
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> ---
> 
> Could someone from the ARM64 camp please verify if this a gcc "bug" or
> opcode/arch limitation? With PF_IO_WORKER as 1 (without the PF_VCPU
> swap) I get for ARM:
> 
> | tst     r2, #33 @ task_flags,
> | beq     .L998           @,
> 
> however ARM64 does here:
> | mov     w0, 33  // tmp117,
> | tst     w19, w0 // task_flags, tmp117
> | bne     .L453           //,
> 
> the extra mov operation. Moving PF_IO_WORKER next to PF_WQ_WORKER as
> this patch gives me:
> | tst     w19, 48 // task_flags,
> | bne     .L453           //,

Moving an immediate into a register really shouldn't be a performance
issue, so I don't think this is a problem. However, the reason GCC does
this is because of the slightly weird way in which immediates are encoded,
meaning that '33' can't be packed into the 'tst' alias. You can try to
decipher the "DecodeBitMasks()" pseudocode in the Arm ARM if you're
interested.

Will

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ