Date:	Fri, 27 Sep 2013 23:20:15 +0800
From:	Jiang Liu <liuj97@...il.com>
To:	Will Deacon <will.deacon@....com>
CC:	Catalin Marinas <catalin.marinas@....com>,
	Jiang Liu <jiang.liu@...wei.com>,
	"linux-arm-kernel@...ts.infradead.org" 
	<linux-arm-kernel@...ts.infradead.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [RFT PATCH v1 0/7] enable FPSIMD lazy save and restore for arm64

On 09/27/2013 07:23 PM, Will Deacon wrote:
> On Fri, Sep 27, 2013 at 11:50:46AM +0100, Catalin Marinas wrote:
>> On Fri, Sep 27, 2013 at 09:04:40AM +0100, Jiang Liu wrote:
>>> From: Jiang Liu <jiang.liu@...wei.com>
>>>
>>> This patchset enables FPSIMD lazy save and restore for ARM64. You can
>>> apply it against v3.12-rc2.
>>>
>>> We have done basic functional tests on the ARM fast model, but still
>>> lack detailed performance benchmarks on real hardware platforms. We
>>> would appreciate it if you could help test it on real hardware!
>>
>> That's my issue as well, I would like to see some benchmarks before
>> merging such patches.
> 
> Furthermore, with GCC's register allocator starting to use vector registers to
> optimise *integer* code instead of spilling to the stack, it's going to become
> more and more common for tasks to have live FP state at context switch. Lazy
> switching might simply introduce overhead in the form of additional trapping.
> 
> Will
> 
Hi Will,
	The patchset actually includes three optimizations.

The first one uses PF_USED_MATH to track whether the thread has
accessed FPSIMD registers since it was created. If the thread
hasn't accessed FPSIMD registers since its creation, we don't need
to save and restore FPSIMD context on thread context switching.
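
To make the first optimization concrete, here is a rough user-space
sketch, not code from the patchset. The struct layouts, the flag
value, the counters and the function names are all made up for
illustration; only the PF_USED_MATH name comes from the patches.

```c
#include <assert.h>
#include <string.h>

#define PF_USED_MATH 0x1u	/* hypothetical value, for this sketch only */

/* Stand-in for the 32 x 128-bit vector registers. */
struct fpsimd_state {
	unsigned char vregs[32][16];
};

struct task {
	unsigned int flags;
	struct fpsimd_state fp;		/* saved copy in memory */
};

static struct fpsimd_state hw_regs;	/* stand-in for the real registers */
static unsigned int nr_saves, nr_loads;	/* counters, to observe the effect */

/* Hypothetical hook: called the first time a task touches FPSIMD. */
static void task_first_fpsimd_use(struct task *t)
{
	t->flags |= PF_USED_MATH;
}

/* Save/restore only for tasks that have ever used FPSIMD. */
static void switch_fpsimd(struct task *prev, struct task *next)
{
	assert(prev && next);

	if (prev->flags & PF_USED_MATH) {
		memcpy(&prev->fp, &hw_regs, sizeof(hw_regs));
		nr_saves++;
	}
	if (next->flags & PF_USED_MATH) {
		memcpy(&hw_regs, &next->fp, sizeof(hw_regs));
		nr_loads++;
	}
}
```

In this model, switching between two tasks that never touched FPSIMD
costs neither a save nor a load.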

The second one uses a percpu variable to track the owner of the
FPSIMD hardware. When switching to a thread that is already the
owner of the FPSIMD hardware, we don't need to load FPSIMD registers
again. This is useful when context switching between a user thread
and kernel (idle) threads.
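
A simplified single-CPU model of the owner tracking could look like
the sketch below. It is only an illustration: the names are
hypothetical, a plain global stands in for the percpu variable, one
int stands in for the whole FPSIMD context, and it ignores thread
migration between CPUs, which real code would have to handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task {
	bool user;	/* false for kernel (idle) threads */
	int saved_fp;	/* stand-in for the saved FPSIMD context */
};

static struct task *fpsimd_owner = NULL;	/* would be a percpu variable */
static int hw_fp;				/* stand-in for the registers */
static unsigned int nr_loads;

static void switch_to(struct task *prev, struct task *next)
{
	assert(prev && next);

	/* Save the outgoing user thread's state if it owns the hardware. */
	if (prev->user && fpsimd_owner == prev)
		prev->saved_fp = hw_fp;

	/* Kernel threads don't use FPSIMD: leave the registers alone. */
	if (!next->user)
		return;

	/* The registers still hold next's state: skip the load. */
	if (fpsimd_owner == next)
		return;

	hw_fp = next->saved_fp;
	fpsimd_owner = next;
	nr_loads++;
}
```

With this model, a user -> idle -> same user sequence performs no
reload, while user -> other user -> user reloads as usual.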

The third one disables access to FPSIMD registers when switching to
a thread. When the thread tries to access FPSIMD registers for the
first time since it was switched in, an exception is raised and we
then load its FPSIMD context onto the hardware.
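
The trap-and-load flow of the third optimization can be modelled
roughly as follows. Again this is a user-space sketch with
hypothetical names: a boolean stands in for the hardware access
control (on arm64 that would be the FPEN field of CPACR_EL1), and a
function call stands in for the exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task {
	int saved_fp;	/* stand-in for the saved FPSIMD context */
};

static struct task *current_task = NULL;
static bool fpsimd_access_enabled;	/* stand-in for the access control bit */
static int hw_fp;			/* stand-in for the registers */
static unsigned int nr_traps;

/* Context switch: save if prev used the registers, then disable access. */
static void switch_to(struct task *prev, struct task *next)
{
	assert(next);

	if (prev && fpsimd_access_enabled)
		prev->saved_fp = hw_fp;

	current_task = next;
	fpsimd_access_enabled = false;
}

/* Models the exception handler: load the context and enable access. */
static void fpsimd_access_trap(void)
{
	hw_fp = current_task->saved_fp;
	fpsimd_access_enabled = true;
	nr_traps++;
}

/* Models a user-space FPSIMD instruction. */
static int task_read_fp(void)
{
	if (!fpsimd_access_enabled)
		fpsimd_access_trap();
	return hw_fp;
}
```

A thread that never touches FPSIMD during its time slice costs no
load and no save here, but a thread that does touch it pays for one
trap per time slice, which is the overhead Will is concerned about.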

The overhead (penalty) of the first and second optimizations is
relatively small, so we could always enable them. The overhead
of the third one is relatively high, and its benefit depends on
many factors, such as the workload and glibc. So we provide a
kernel boot option "eagerfpu" to enable/disable the third
optimization.

So what are your thoughts on the first and second optimizations?
Should we always enable them? I do need to run some benchmarks for
this, but I still lack hardware.

Thanks!
Gerry