linux-kernel - Re: [RFC PATCH v1 2/3] Make calls to account_scaled

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080527150244.GF5181@dirshya.in.ibm.com>
Date:	Tue, 27 May 2008 20:32:44 +0530
From:	Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>
To:	Balbir Singh <balbir@...ux.vnet.ibm.com>
Cc:	Linux Kernel <linux-kernel@...r.kernel.org>,
	venkatesh.pallipadi@...el.com, suresh.b.siddha@...el.com,
	Michael Neuling <mikey@...ling.org>,
	"Amit K. Arora" <aarora@...ux.vnet.ibm.com>
Subject: Re: [RFC PATCH v1 2/3] Make calls to account_scaled_stats

* Balbir Singh <balbir@...ux.vnet.ibm.com> [2008-05-26 23:48:04]:

> Vaidyanathan Srinivasan wrote:
> > Hook various accounting functions to call scaled stats
> > 
> > * Hook porcess contect switch: __switch_to()
> > * Hook IRQ handling account_system_vtime() in hardirq.hA
> > * Update __delayacct_add_tsk() to take care of scaling by 1000
> > * Update bacct_add_tsk() to take care of scaling by 1000
> > 
> > Signed-off-by: Amit K. Arora <aarora@...ux.vnet.ibm.com>
> > Signed-off-by: Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>
> > ---
> > 
> >  arch/x86/kernel/process_32.c |    8 ++++++++
> >  include/linux/hardirq.h      |    4 ++++
> >  kernel/delayacct.c           |    7 ++++++-
> >  kernel/timer.c               |    2 --
> >  kernel/tsacct.c              |   10 ++++++++--
> >  5 files changed, 26 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
> > index f8476df..c81a783 100644
> > --- a/arch/x86/kernel/process_32.c
> > +++ b/arch/x86/kernel/process_32.c
> > @@ -56,6 +56,9 @@
> >  #include <asm/cpu.h>
> >  #include <asm/kdebug.h>
> > 
> > +extern void account_scaled_stats(struct task_struct *tsk);
> > +extern void reset_for_scaled_stats(struct task_struct *tsk);
> > +
> >  asmlinkage void ret_from_fork(void) __asm__("ret_from_fork");
> > 
> >  static int hlt_counter;
> > @@ -660,6 +663,11 @@ struct task_struct * __switch_to(struct task_struct *prev_p, struct task_struct
> >  		loadsegment(gs, next->gs);
> > 
> >  	x86_write_percpu(current_task, next_p);
> > +	/* Account scaled statistics for the task leaving CPU */
> > +	account_scaled_stats(prev_p);
> > +	barrier();
> > +	/* Initialise stats counter for new task */
> > +	reset_for_scaled_stats(next_p);
> > 
> 
> This is a bad place to hook into. Can't we do scaled accounting they way we do
> it for powerpc (SPURR)? I would rather hook into account_*time

Agreed and I have also documented it in the intro.  I will try to
remove them in the next iteration.

> 
> >  	return prev_p;
> >  }
> > diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
> > index 181006c..4458736 100644
> > --- a/include/linux/hardirq.h
> > +++ b/include/linux/hardirq.h
> > @@ -7,6 +7,9 @@
> >  #include <asm/hardirq.h>
> >  #include <asm/system.h>
> > 
> > +/* TBD: Add config option */
> > +extern void account_scaled_stats(struct task_struct *tsk);
> > +
> >  /*
> >   * We put the hardirq and softirq counter into the preemption
> >   * counter. The bitmask has the following meaning:
> > @@ -115,6 +118,7 @@ struct task_struct;
> >  #ifndef CONFIG_VIRT_CPU_ACCOUNTING
> >  static inline void account_system_vtime(struct task_struct *tsk)
> >  {
> > +	account_scaled_stats(tsk);
> >  }
> >  #endif
> > 
> > diff --git a/kernel/delayacct.c b/kernel/delayacct.c
> > index 10e43fd..3e2938f 100644
> > --- a/kernel/delayacct.c
> > +++ b/kernel/delayacct.c
> > @@ -117,7 +117,12 @@ int __delayacct_add_tsk(struct taskstats *d, struct task_struct *tsk)
> > 
> >  	tmp = (s64)d->cpu_scaled_run_real_total;
> >  	cputime_to_timespec(tsk->utimescaled + tsk->stimescaled, &ts);
> > -	tmp += timespec_to_ns(&ts);
> > +	/* HACK: Remember, we multipled the cputime_t by 1000 to include
> > +	 * fraction.  Now it is time to scale it back to correct 'ns' value.
> > +	 * Perhaps, we should use nano second unit (u64 type) for utimescaled 
> > +	 * and stimescaled?
> > +	 */
> > +	tmp += div_s64(timespec_to_ns(&ts),1000);
> 
> 1000 is a bit magical, please use a meaningful #define


Sure I can use a #define.... anyway this needs to be solved.
This is documented in the intro and explained in other email.
 
> >  	d->cpu_scaled_run_real_total =
> >  		(tmp < (s64)d->cpu_scaled_run_real_total) ? 0 : tmp;
> > 
> > diff --git a/kernel/timer.c b/kernel/timer.c
> > index ceacc66..de8a615 100644
> > --- a/kernel/timer.c
> > +++ b/kernel/timer.c
> > @@ -964,10 +964,8 @@ void account_process_tick(struct task_struct *p, int user_tick)
> > 
> >  	if (user_tick) {
> >  		account_user_time(p, one_jiffy);
> > -		account_user_time_scaled(p, cputime_to_scaled(one_jiffy));
> >  	} else {
> >  		account_system_time(p, HARDIRQ_OFFSET, one_jiffy);
> > -		account_system_time_scaled(p, cputime_to_scaled(one_jiffy));
> >  	}
> 
> Why couldn't we leverage these functions here?

We don't know the scaling ratio at this time.  The ratio is being
calculated at every context switch by sampling APERF/MPERF when the
process occupies the CPU and when it leave the CPU.

We should use these functions once I get centralised scaling ratio
computation done from cpufreq driver.  Out-of-band frequency variation
needs to be solved for the other method.  Current implementation will
do correct accounting even if the freq variation is out-of-band.

--Vaidy

> >  }
> >  #endif
> > diff --git a/kernel/tsacct.c b/kernel/tsacct.c
> > index 4ab1b58..ee0d93b 100644
> > --- a/kernel/tsacct.c
> > +++ b/kernel/tsacct.c
> > @@ -62,10 +62,16 @@ void bacct_add_tsk(struct taskstats *stats, struct task_struct *tsk)
> >  	rcu_read_unlock();
> >  	stats->ac_utime	 = cputime_to_msecs(tsk->utime) * USEC_PER_MSEC;
> >  	stats->ac_stime	 = cputime_to_msecs(tsk->stime) * USEC_PER_MSEC;
> > +	/* HACK: cputime unit is jiffies on x86 and not good for fractional 
> > +	 * additional.  cputime_t type {u,s}timescaled is multiplied by 
> > +	 * 1000 for scaled accounting.  Hence, cputime_to_msecs will actually 
> > +	 * give the required micro second value.  The multiplier 
> > +	 * USEC_PER_MSEC has been dropped.
> > +	 */
> >  	stats->ac_utimescaled =
> > -		cputime_to_msecs(tsk->utimescaled) * USEC_PER_MSEC;
> > +		cputime_to_msecs(tsk->utimescaled);
> >  	stats->ac_stimescaled =
> > -		cputime_to_msecs(tsk->stimescaled) * USEC_PER_MSEC;
> > +		cputime_to_msecs(tsk->stimescaled);
> >  	stats->ac_minflt = tsk->min_flt;
> >  	stats->ac_majflt = tsk->maj_flt;
> > 
> > 
> 
> 
> -- 
> 	Warm Regards,
> 	Balbir Singh
> 	Linux Technology Center
> 	IBM, ISTL
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/