lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 21 May 2015 21:26:50 +0800
From:	Huang Rui <ray.huang@....com>
To:	Borislav Petkov <bp@...e.de>
CC:	Len Brown <lenb@...nel.org>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	Thomas Gleixner <tglx@...utronix.de>, <x86@...nel.org>,
	<linux-kernel@...r.kernel.org>,
	Fengguang Wu <fengguang.wu@...el.com>,
	"Aaron Lu" <aaron.lu@...el.com>, Tony Li <tony.li@....com>
Subject: Re: [RFC PATCH 2/4] x86, mwaitt: introduce mwaitx idle with a
 configurable timer

On Tue, May 19, 2015 at 01:31:21PM +0200, Borislav Petkov wrote:
> On Tue, May 19, 2015 at 04:01:10PM +0800, Huang Rui wrote:
> > MWAITX/MWAIT does not let the cpu core go into C1 state on AMD processors.
> > The cpu core still consumes less power while waiting, and has faster exit
> > from waiting than "Halt". This patch implements an interface using the
> > kernel parameter "idle=" to configure mwaitx type and timer value.
> > 
> > If "idle=mwaitx", the timeout will be set as the maximum value
> > ((2^64 - 1) * TSC cycle).
> > If "idle=mwaitx,100", the timeout will be set as 100ns.
> > If the processor doesn't support MWAITX, then halt is used.
> 
> Ok, I see what you're trying here and I think this is not the optimal
> approach.
> 
> So let me explain how I see it, you correct me if I'm wrong:
> 
> So we want to do MWAITX so that we can save us idle entry/exit overhead
> with HLT. Because MWAITX is faster, reportedly.
> 
> Now, if we want to do that, we want to do it dynamically and adjust the
> MWAITX sleep interval depending on the system, usage pattern, system
> load and so on.

Yes, that's right. Maybe, even we can also find any other use case on
kernel other components.

> 
> And for that we would need an adaptive scheme which approximates each
> idle interval. Simply taking TSC before we enter idle and after we come
> out would give us each idle residency duration and we can do some simple
> math to approximate it.
> 
> Now, what would that bring us: faster wakeup times.
> 
> And here comes the 10^6 $ question: why are we doing all the fun?
> 

Do you mention the below codes:

/*
 * TSC loops (EBX input) = Timer(nsec) *
 * TSC freq(khz) / 1000000
 */
timeout = timeout * tsc_freq;
do_div(timeout, 1000000);

That's because the unit of tsc_freq is KHz and the unit of timeout is
nanosecond. Then I do div 10^6 to figure out the corresponding loops.

> I'm thinking we want to find a cutoff duration where for smaller
> durations it is worth to do MWAITX and have faster entry/exit times and
> for bigger durations we want to do HLT because it'll get into C1E and
> give us higher power savings.
> 
> We don't want to do MWAITX too long because that'll burn more power
> relatively to HLT but we don't want to do HLT for shorter periods
> because then entry/exit costs.
> 
> Am I on the right track at least?

Yes, that's totally right.

> 
> > Signed-off-by: Huang Rui <ray.huang@....com>
> > ---
> >  arch/x86/include/asm/mwait.h     |  2 +
> >  arch/x86/include/asm/processor.h |  2 +-
> >  arch/x86/kernel/process.c        | 79 ++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 82 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
> > index b91136f..c4e51e7 100644
> > --- a/arch/x86/include/asm/mwait.h
> > +++ b/arch/x86/include/asm/mwait.h
> > @@ -14,6 +14,8 @@
> >  #define CPUID5_ECX_INTERRUPT_BREAK	0x2
> >  
> >  #define MWAIT_ECX_INTERRUPT_BREAK	0x1
> > +#define MWAITX_ECX_TIMER_ENABLE		0x2
> 
> 						Use BIT(1) here.

OK.

> 
> > +#define MWAITX_EBX_WAIT_TIMEOUT		0xffffffff
> >  
> >  static inline void __monitor(const void *eax, unsigned long ecx,
> >  			     unsigned long edx)
> > diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> > index 23ba676..0f60e94 100644
> > --- a/arch/x86/include/asm/processor.h
> > +++ b/arch/x86/include/asm/processor.h
> > @@ -733,7 +733,7 @@ extern unsigned long		boot_option_idle_override;
> >  extern bool			amd_e400_c1e_detected;
> >  
> >  enum idle_boot_override {IDLE_NO_OVERRIDE=0, IDLE_HALT, IDLE_NOMWAIT,
> > -			 IDLE_POLL};
> > +			 IDLE_POLL, IDLE_MWAITX};
> >  
> >  extern void enable_sep_cpu(void);
> >  extern int sysenter_setup(void);
> > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> > index 6e338e3..9d68193 100644
> > --- a/arch/x86/kernel/process.c
> > +++ b/arch/x86/kernel/process.c
> > @@ -30,6 +30,7 @@
> >  #include <asm/debugreg.h>
> >  #include <asm/nmi.h>
> >  #include <asm/tlbflush.h>
> > +#include <asm/x86_init.h>
> >  
> >  /*
> >   * per-CPU TSS segments. Thre ads are completely 'soft' on Linux,
> > @@ -276,6 +277,7 @@ unsigned long boot_option_idle_override = IDLE_NO_OVERRIDE;
> >  EXPORT_SYMBOL(boot_option_idle_override);
> >  
> >  static void (*x86_idle)(void);
> > +static unsigned long idle_param;
> >  
> >  #ifndef CONFIG_SMP
> >  static inline void play_dead(void)
> > @@ -444,6 +446,17 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
> >  	return 1;
> >  }
> >  
> > +static int not_support_mwaitx(const struct cpuinfo_x86 *c)
> > +{
> > +	if (c->x86_vendor != X86_VENDOR_AMD)
> > +		return 1;
> > +
> > +	if (!cpu_has(c, X86_FEATURE_MWAITT))
> > +		return 1;
> > +
> > +	return 0;
> > +}
> > +
> >  /*
> >   * MONITOR/MWAIT with no hints, used for default default C1 state.
> >   * This invokes MWAIT with interrutps enabled and no flags,
> > @@ -470,12 +483,45 @@ static void mwait_idle(void)
> >  	__current_clr_polling();
> >  }
> >  
> > +/*
> > + * AMD Excavator processors support the new MONITORX/MWAITX instructions.
> 
> No need for that especially when newer than XV processors start
> supporting those too.
> 

OK, got it.

Thanks,
Rui
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ