Message-ID: <20161122134550.GA21436@lerouge>
Date:   Tue, 22 Nov 2016 14:45:56 +0100
From:   Frederic Weisbecker <fweisbec@...il.com>
To:     Martin Schwidefsky <schwidefsky@...ibm.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Tony Luck <tony.luck@...el.com>,
        Wanpeng Li <wanpeng.li@...mail.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        Heiko Carstens <heiko.carstens@...ibm.com>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Paul Mackerras <paulus@...ba.org>,
        Ingo Molnar <mingo@...nel.org>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Rik van Riel <riel@...hat.com>,
        Stanislaw Gruszka <sgruszka@...hat.com>
Subject: Re: [PATCH 00/36] cputime: Convert core use of cputime_t to nsecs

On Mon, Nov 21, 2016 at 11:17:28AM +0100, Martin Schwidefsky wrote:
> On Mon, 21 Nov 2016 07:59:56 +0100
> Martin Schwidefsky <schwidefsky@...ibm.com> wrote:
> 
> > On Fri, 18 Nov 2016 15:47:02 +0100
> > Frederic Weisbecker <fweisbec@...il.com> wrote:
> > 
> > > On Fri, Nov 18, 2016 at 01:08:46PM +0100, Martin Schwidefsky wrote:
> > > > On Thu, 17 Nov 2016 19:08:07 +0100
> > > > Frederic Weisbecker <fweisbec@...il.com> wrote:
> > > > 
> > > > Now it has been proposed to implement lazy accounting to accumulate deltas
> > > > and do the expensive conversions only infrequently. This is pretty
> > > > straightforward for account_user_time, but doing this for the
> > > > account_system_time function is more complicated. The function has to
> > > > differentiate between guest/hardirq/softirq and pure system time. We would
> > > > need to keep sums for each bucket and provide a separate function to add to
> > > > each bucket, like account_guest_time(), account_hardirq_time(),
> > > > account_softirq_time() and account_system_time(). Then it is up to the arch
> > > > code to sort out the details
> > > > and call the accounting code once per jiffy for each of the buckets.
> > > 
> > > That wouldn't be too hard really. The s390 code in vtime.c already does that.
> > 
> > Yes, I agree that the accumulation change would not be too hard. Can I make
> > the request that we try to get that done first, before doing the cleanup?
> 
> I played with the idea a bit; here is a prototype patch to do the delayed system
> time accounting for s390. It applies against the latest s390 features tree, which
> you'll find here:
> 
> 	git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features
> 
> The details probably need some more work, but it works.
> 
> --
> From 1b5ef9ddf899da81a48de826f783b15e6fc45d25 Mon Sep 17 00:00:00 2001
> From: Martin Schwidefsky <schwidefsky@...ibm.com>
> Date: Mon, 21 Nov 2016 10:44:10 +0100
> Subject: [PATCH] s390/cputime: delayed accounting of system time
> 
> The account_system_time() function is called with a cputime that
> occurred while running in the kernel. The function detects which
> context the CPU is currently running in and accounts the time to
> the correct bucket. This forces the arch code to account the
> cputime for hardirq and softirq immediately.
> 
> Make account_guest_time non-static and add account_sys_time,
> account_hardirq_time and account_softirq_time. With these functions
> the arch code can delay the accounting for system time. For s390
> the accounting is done once per timer tick and for each task switch.
> 
> Signed-off-by: Martin Schwidefsky <schwidefsky@...ibm.com>

Thanks a lot for taking care of that! I'll try to do the same
on powerpc.

A few comments below:

> ---
>  arch/s390/include/asm/lowcore.h   |  65 ++++++++++++-----------
>  arch/s390/include/asm/processor.h |   3 ++
>  arch/s390/kernel/vtime.c          | 106 ++++++++++++++++++++++----------------
>  include/linux/kernel_stat.h       |  13 +++--
>  kernel/sched/cputime.c            |  22 +++++++-
>  5 files changed, 129 insertions(+), 80 deletions(-)
> 
> diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
> index 62a5cf1..8a5b082 100644
> --- a/arch/s390/include/asm/lowcore.h
> +++ b/arch/s390/include/asm/lowcore.h
[...]
> @@ -110,34 +119,48 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
>  #endif
>  		: "=m" (S390_lowcore.last_update_timer),
>  		  "=m" (S390_lowcore.last_update_clock));
> -	S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
> -	S390_lowcore.steal_timer += S390_lowcore.last_update_clock - clock;
> +	clock = S390_lowcore.last_update_clock - clock;
> +	timer -= S390_lowcore.last_update_timer;
> +
> +	if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0))
> +		S390_lowcore.guest_timer += timer;
> +	else if (hardirq_count() - hardirq_offset)
> +		S390_lowcore.hardirq_timer += timer;
> +	else if (in_serving_softirq())
> +		S390_lowcore.softirq_timer += timer;
> +	else
> +		S390_lowcore.system_timer += timer;

I initially thought that some of this accumulation code could be shared, but now
I'm not sure whether that would be a good idea. One candidate would be the context
checks above, which decide which bucket the accumulated delta is stored to.
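
Something along these lines, for instance (just a sketch, the struct and
function names below are made up):

struct vtime_accumulator {
	u64	user;
	u64	guest;
	u64	system;
	u64	hardirq;
	u64	softirq;
};

/* Accumulate a kernel-side delta into the bucket matching the context */
static void vtime_acc_system(struct vtime_accumulator *acc,
			     struct task_struct *tsk,
			     int hardirq_offset, u64 delta)
{
	if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0))
		acc->guest += delta;
	else if (hardirq_count() - hardirq_offset)
		acc->hardirq += delta;
	else if (in_serving_softirq())
		acc->softirq += delta;
	else
		acc->system += delta;
}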

>  
>  	/* Update MT utilization calculation */
>  	if (smp_cpu_mtid &&
>  	    time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
>  		update_mt_scaling();
>  
> +	/* Calculate cputime delta */
>  	user = S390_lowcore.user_timer - tsk->thread.user_timer;
> -	S390_lowcore.steal_timer -= user;
>  	tsk->thread.user_timer = S390_lowcore.user_timer;
> -
> +	guest = S390_lowcore.guest_timer - tsk->thread.guest_timer;
> +	tsk->thread.guest_timer = S390_lowcore.guest_timer;
>  	system = S390_lowcore.system_timer - tsk->thread.system_timer;
> -	S390_lowcore.steal_timer -= system;
>  	tsk->thread.system_timer = S390_lowcore.system_timer;
> -
> -	user_scaled = user;
> -	system_scaled = system;
> -	/* Do MT utilization scaling */
> -	if (smp_cpu_mtid) {
> -		u64 mult = __this_cpu_read(mt_scaling_mult);
> -		u64 div = __this_cpu_read(mt_scaling_div);
> -
> -		user_scaled = (user_scaled * mult) / div;
> -		system_scaled = (system_scaled * mult) / div;
> -	}
> -	account_user_time(tsk, user, user_scaled);
> -	account_system_time(tsk, hardirq_offset, system, system_scaled);
> +	hardirq = S390_lowcore.hardirq_timer - tsk->thread.hardirq_timer;
> +	tsk->thread.hardirq_timer = S390_lowcore.hardirq_timer;
> +	softirq = S390_lowcore.softirq_timer - tsk->thread.softirq_timer;
> +	tsk->thread.softirq_timer = S390_lowcore.softirq_timer;
> +	S390_lowcore.steal_timer +=
> +		clock - user - guest - system - hardirq - softirq;
> +
> +	/* Push account value */
> +	if (user)
> +		account_user_time(tsk, user, scale_vtime(user));
> +	if (guest)
> +		account_guest_time(tsk, guest, scale_vtime(guest));
> +	if (system)
> +		account_sys_time(tsk, system, scale_vtime(system));
> +	if (hardirq)
> +		account_hardirq_time(tsk, hardirq, scale_vtime(hardirq));
> +	if (softirq)
> +		account_softirq_time(tsk, softirq, scale_vtime(softirq));

And this push of the accumulated values to the accounting functions would be
another part of that shared code.
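
That is, a single flush helper along these lines (again only a sketch, reusing
the made-up struct above; the caller would compute the deltas against the
per-task mirror as do_account_vtime() does, and scale_vtime() is s390's scaling,
so a shared version would need the arch to provide it):

/* Push the accumulated deltas to the core accounting functions */
static void vtime_acc_flush(struct task_struct *tsk,
			    struct vtime_accumulator *delta)
{
	if (delta->user)
		account_user_time(tsk, delta->user, scale_vtime(delta->user));
	if (delta->guest)
		account_guest_time(tsk, delta->guest, scale_vtime(delta->guest));
	if (delta->system)
		account_sys_time(tsk, delta->system, scale_vtime(delta->system));
	if (delta->hardirq)
		account_hardirq_time(tsk, delta->hardirq, scale_vtime(delta->hardirq));
	if (delta->softirq)
		account_softirq_time(tsk, delta->softirq, scale_vtime(delta->softirq));
}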

>  
>  	steal = S390_lowcore.steal_timer;
>  	if ((s64) steal > 0) {
> @@ -145,16 +168,22 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
>  		account_steal_time(steal);
>  	}
>  
> -	return virt_timer_forward(user + system);
> +	return virt_timer_forward(user + guest + system + hardirq + softirq);
>  }
>  
>  void vtime_task_switch(struct task_struct *prev)
>  {
>  	do_account_vtime(prev, 0);
>  	prev->thread.user_timer = S390_lowcore.user_timer;
> +	prev->thread.guest_timer = S390_lowcore.guest_timer;
>  	prev->thread.system_timer = S390_lowcore.system_timer;
> +	prev->thread.hardirq_timer = S390_lowcore.hardirq_timer;
> +	prev->thread.softirq_timer = S390_lowcore.softirq_timer;
>  	S390_lowcore.user_timer = current->thread.user_timer;
> +	S390_lowcore.guest_timer = current->thread.guest_timer;
>  	S390_lowcore.system_timer = current->thread.system_timer;
> +	S390_lowcore.hardirq_timer = current->thread.hardirq_timer;
> +	S390_lowcore.softirq_timer = current->thread.softirq_timer;
>  }

Ditto.

>  
>  /*
> @@ -174,31 +203,22 @@ void vtime_account_user(struct task_struct *tsk)
>   */
>  void vtime_account_irq_enter(struct task_struct *tsk)
>  {
> -	u64 timer, system, system_scaled;
> +	u64 timer;
>  
>  	timer = S390_lowcore.last_update_timer;
>  	S390_lowcore.last_update_timer = get_vtimer();
> -	S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
> -
> -	/* Update MT utilization calculation */
> -	if (smp_cpu_mtid &&
> -	    time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
> -		update_mt_scaling();
> -
> -	system = S390_lowcore.system_timer - tsk->thread.system_timer;
> -	S390_lowcore.steal_timer -= system;
> -	tsk->thread.system_timer = S390_lowcore.system_timer;
> -	system_scaled = system;
> -	/* Do MT utilization scaling */
> -	if (smp_cpu_mtid) {
> -		u64 mult = __this_cpu_read(mt_scaling_mult);
> -		u64 div = __this_cpu_read(mt_scaling_div);
> -
> -		system_scaled = (system_scaled * mult) / div;
> -	}
> -	account_system_time(tsk, 0, system, system_scaled);
> -
> -	virt_timer_forward(system);
> +	timer -= S390_lowcore.last_update_timer;
> +
> +	if ((tsk->flags & PF_VCPU) && (irq_count() == 0))
> +		S390_lowcore.guest_timer += timer;
> +	else if (hardirq_count())
> +		S390_lowcore.hardirq_timer += timer;
> +	else if (in_serving_softirq())
> +		S390_lowcore.softirq_timer += timer;
> +	else
> +		S390_lowcore.system_timer += timer;

And ditto.

We could gather the accumulation fields in a common struct in s390_lowcore, with
its mirror in the thread struct, and then have helpers take care of the contexts.
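
Purely as an illustration (same made-up struct and member name as above), the
task switch part would then mostly boil down to a struct copy:

/* one instance in struct lowcore, a mirror in struct thread_struct */
struct vtime_accumulator vtime_acc;

void vtime_task_switch(struct task_struct *prev)
{
	do_account_vtime(prev, 0);
	/* snapshot the running totals for prev, reload those of current */
	prev->thread.vtime_acc = S390_lowcore.vtime_acc;
	S390_lowcore.vtime_acc = current->thread.vtime_acc;
}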

How does that sound to you? Would it help or hurt?

Thanks.
