lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20161121111728.13a0a3db@mschwide>
Date:   Mon, 21 Nov 2016 11:17:28 +0100
From:   Martin Schwidefsky <schwidefsky@...ibm.com>
To:     Martin Schwidefsky <schwidefsky@...ibm.com>
Cc:     Frederic Weisbecker <fweisbec@...il.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Tony Luck <tony.luck@...el.com>,
        Wanpeng Li <wanpeng.li@...mail.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        Heiko Carstens <heiko.carstens@...ibm.com>,
        Benjamin Herrenschmidt <benh@...nel.crashing.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Paul Mackerras <paulus@...ba.org>,
        Ingo Molnar <mingo@...nel.org>,
        Fenghua Yu <fenghua.yu@...el.com>,
        Rik van Riel <riel@...hat.com>,
        Stanislaw Gruszka <sgruszka@...hat.com>
Subject: Re: [PATCH 00/36] cputime: Convert core use of cputime_t to nsecs

On Mon, 21 Nov 2016 07:59:56 +0100
Martin Schwidefsky <schwidefsky@...ibm.com> wrote:

> On Fri, 18 Nov 2016 15:47:02 +0100
> Frederic Weisbecker <fweisbec@...il.com> wrote:
> 
> > On Fri, Nov 18, 2016 at 01:08:46PM +0100, Martin Schwidefsky wrote:
> > > On Thu, 17 Nov 2016 19:08:07 +0100
> > > Frederic Weisbecker <fweisbec@...il.com> wrote:
> > > 
> > > Now it has been proposed to implement lazy accounting to accumulate deltas
> > > and do the expensive conversions only infrequently. This is pretty straight-
> > > forward for account_user_time but to do this for the account_system_time
> > > function is more complicated. The function has to differentiate between
> > > guest/hardirq/softirq and pure system time. We would need to keep sums for
> > > each bucket and provide a separate function to add to each bucket. Like
> > > account_guest_time(), account_hardirq_time(), account_softirq_time() and
> > > account_system_time(). Then it is up to the arch code to sort out the details
> > > and call the accounting code once per jiffy for each of the buckets.
> > 
> > That wouldn't be too hard really. The s390 code in vtime.c already does that.
> 
> Yes, I agree that the accumulating change would not be too hard. Can I make the
> request that we try to get that done first before doing the cleanup ?

Played with the idea a bit, here is a prototype patch to do the delay system time
accounting for s390. It applies against the latest s390 features tree which you'll
find here

	git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux.git features

The details probably needs some more work but it works.

--
>From 1b5ef9ddf899da81a48de826f783b15e6fc45d25 Mon Sep 17 00:00:00 2001
From: Martin Schwidefsky <schwidefsky@...ibm.com>
Date: Mon, 21 Nov 2016 10:44:10 +0100
Subject: [PATCH] s390/cputime: delayed accounting of system time

The account_system_time() function is called with a cputime that
occurred while running in the kernel. The function detects which
context the CPU is currently running in and accounts the time to
the correct bucket. This forces the arch code to account the
cputime for hardirq and softirq immediately.

Make account_guest_time non-static and add account_sys_time,
account_hardirq_time and account_softirq_time. With these functions
the arch code can delay the accounting for system time. For s390
the accounting is done once per timer tick and for each task switch.

Signed-off-by: Martin Schwidefsky <schwidefsky@...ibm.com>
---
 arch/s390/include/asm/lowcore.h   |  65 ++++++++++++-----------
 arch/s390/include/asm/processor.h |   3 ++
 arch/s390/kernel/vtime.c          | 106 ++++++++++++++++++++++----------------
 include/linux/kernel_stat.h       |  13 +++--
 kernel/sched/cputime.c            |  22 +++++++-
 5 files changed, 129 insertions(+), 80 deletions(-)

diff --git a/arch/s390/include/asm/lowcore.h b/arch/s390/include/asm/lowcore.h
index 62a5cf1..8a5b082 100644
--- a/arch/s390/include/asm/lowcore.h
+++ b/arch/s390/include/asm/lowcore.h
@@ -85,53 +85,56 @@ struct lowcore {
 	__u64	mcck_enter_timer;		/* 0x02c0 */
 	__u64	exit_timer;			/* 0x02c8 */
 	__u64	user_timer;			/* 0x02d0 */
-	__u64	system_timer;			/* 0x02d8 */
-	__u64	steal_timer;			/* 0x02e0 */
-	__u64	last_update_timer;		/* 0x02e8 */
-	__u64	last_update_clock;		/* 0x02f0 */
-	__u64	int_clock;			/* 0x02f8 */
-	__u64	mcck_clock;			/* 0x0300 */
-	__u64	clock_comparator;		/* 0x0308 */
+	__u64	guest_timer;			/* 0x02d8 */
+	__u64	system_timer;			/* 0x02e0 */
+	__u64	hardirq_timer;			/* 0x02e8 */
+	__u64	softirq_timer;			/* 0x02f0 */
+	__u64	steal_timer;			/* 0x02f8 */
+	__u64	last_update_timer;		/* 0x0300 */
+	__u64	last_update_clock;		/* 0x0308 */
+	__u64	int_clock;			/* 0x0310 */
+	__u64	mcck_clock;			/* 0x0318 */
+	__u64	clock_comparator;		/* 0x0320 */
 
 	/* Current process. */
-	__u64	current_task;			/* 0x0310 */
-	__u8	pad_0x318[0x320-0x318];		/* 0x0318 */
-	__u64	kernel_stack;			/* 0x0320 */
+	__u64	current_task;			/* 0x0328 */
+	__u8	pad_0x318[0x320-0x318];		/* 0x0330 */
+	__u64	kernel_stack;			/* 0x0338 */
 
 	/* Interrupt, panic and restart stack. */
-	__u64	async_stack;			/* 0x0328 */
-	__u64	panic_stack;			/* 0x0330 */
-	__u64	restart_stack;			/* 0x0338 */
+	__u64	async_stack;			/* 0x0340 */
+	__u64	panic_stack;			/* 0x0348 */
+	__u64	restart_stack;			/* 0x0350 */
 
 	/* Restart function and parameter. */
-	__u64	restart_fn;			/* 0x0340 */
-	__u64	restart_data;			/* 0x0348 */
-	__u64	restart_source;			/* 0x0350 */
+	__u64	restart_fn;			/* 0x0358 */
+	__u64	restart_data;			/* 0x0360 */
+	__u64	restart_source;			/* 0x0368 */
 
 	/* Address space pointer. */
-	__u64	kernel_asce;			/* 0x0358 */
-	__u64	user_asce;			/* 0x0360 */
+	__u64	kernel_asce;			/* 0x0370 */
+	__u64	user_asce;			/* 0x0378 */
 
 	/*
 	 * The lpp and current_pid fields form a
 	 * 64-bit value that is set as program
 	 * parameter with the LPP instruction.
 	 */
-	__u32	lpp;				/* 0x0368 */
-	__u32	current_pid;			/* 0x036c */
+	__u32	lpp;				/* 0x0380 */
+	__u32	current_pid;			/* 0x0384 */
 
 	/* SMP info area */
-	__u32	cpu_nr;				/* 0x0370 */
-	__u32	softirq_pending;		/* 0x0374 */
-	__u64	percpu_offset;			/* 0x0378 */
-	__u64	vdso_per_cpu_data;		/* 0x0380 */
-	__u64	machine_flags;			/* 0x0388 */
-	__u32	preempt_count;			/* 0x0390 */
-	__u8	pad_0x0394[0x0398-0x0394];	/* 0x0394 */
-	__u64	gmap;				/* 0x0398 */
-	__u32	spinlock_lockval;		/* 0x03a0 */
-	__u32	fpu_flags;			/* 0x03a4 */
-	__u8	pad_0x03a8[0x0400-0x03a8];	/* 0x03a8 */
+	__u32	cpu_nr;				/* 0x0388 */
+	__u32	softirq_pending;		/* 0x038c */
+	__u64	percpu_offset;			/* 0x0390 */
+	__u64	vdso_per_cpu_data;		/* 0x0398 */
+	__u64	machine_flags;			/* 0x03a0 */
+	__u32	preempt_count;			/* 0x03a8 */
+	__u8	pad_0x03ac[0x03b0-0x03ac];	/* 0x03ac */
+	__u64	gmap;				/* 0x03b0 */
+	__u32	spinlock_lockval;		/* 0x03b8 */
+	__u32	fpu_flags;			/* 0x03bc */
+	__u8	pad_0x03c0[0x0400-0x03c0];	/* 0x03c0 */
 
 	/* Per cpu primary space access list */
 	__u32	paste[16];			/* 0x0400 */
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index bf8b2e2..0234eea 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -111,7 +111,10 @@ struct thread_struct {
 	unsigned int  acrs[NUM_ACRS];
         unsigned long ksp;              /* kernel stack pointer             */
 	unsigned long user_timer;	/* task cputime in user space */
+	unsigned long guest_timer;	/* task cputime in kvm guest */
 	unsigned long system_timer;	/* task cputime in kernel space */
+	unsigned long hardirq_timer;	/* task cputime in hardirq context */
+	unsigned long softirq_timer;	/* task cputime in softirq context */
 	unsigned long sys_call_table;	/* system call table address */
 	mm_segment_t mm_segment;
 	unsigned long gmap_addr;	/* address of last gmap fault. */
diff --git a/arch/s390/kernel/vtime.c b/arch/s390/kernel/vtime.c
index 9a6c957..fa262c2 100644
--- a/arch/s390/kernel/vtime.c
+++ b/arch/s390/kernel/vtime.c
@@ -90,14 +90,23 @@ static void update_mt_scaling(void)
 	__this_cpu_write(mt_scaling_jiffies, jiffies_64);
 }
 
+static inline u64 scale_vtime(u64 vtime)
+{
+	u64 mult = __this_cpu_read(mt_scaling_mult);
+	u64 div = __this_cpu_read(mt_scaling_div);
+
+	if (smp_cpu_mtid)
+		return vtime * mult / div;
+	return vtime;
+}
+
 /*
  * Update process times based on virtual cpu times stored by entry.S
  * to the lowcore fields user_timer, system_timer & steal_clock.
  */
 static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
 {
-	u64 timer, clock, user, system, steal;
-	u64 user_scaled, system_scaled;
+	u64 timer, clock, user, guest, system, hardirq, softirq, steal;
 
 	timer = S390_lowcore.last_update_timer;
 	clock = S390_lowcore.last_update_clock;
@@ -110,34 +119,48 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
 #endif
 		: "=m" (S390_lowcore.last_update_timer),
 		  "=m" (S390_lowcore.last_update_clock));
-	S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
-	S390_lowcore.steal_timer += S390_lowcore.last_update_clock - clock;
+	clock = S390_lowcore.last_update_clock - clock;
+	timer -= S390_lowcore.last_update_timer;
+
+	if ((tsk->flags & PF_VCPU) && (irq_count() - hardirq_offset == 0))
+		S390_lowcore.guest_timer += timer;
+	else if (hardirq_count() - hardirq_offset)
+		S390_lowcore.hardirq_timer += timer;
+	else if (in_serving_softirq())
+		S390_lowcore.softirq_timer += timer;
+	else
+		S390_lowcore.system_timer += timer;
 
 	/* Update MT utilization calculation */
 	if (smp_cpu_mtid &&
 	    time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
 		update_mt_scaling();
 
+	/* Calculate cputime delta */
 	user = S390_lowcore.user_timer - tsk->thread.user_timer;
-	S390_lowcore.steal_timer -= user;
 	tsk->thread.user_timer = S390_lowcore.user_timer;
-
+	guest = S390_lowcore.guest_timer - tsk->thread.guest_timer;
+	tsk->thread.guest_timer = S390_lowcore.guest_timer;
 	system = S390_lowcore.system_timer - tsk->thread.system_timer;
-	S390_lowcore.steal_timer -= system;
 	tsk->thread.system_timer = S390_lowcore.system_timer;
-
-	user_scaled = user;
-	system_scaled = system;
-	/* Do MT utilization scaling */
-	if (smp_cpu_mtid) {
-		u64 mult = __this_cpu_read(mt_scaling_mult);
-		u64 div = __this_cpu_read(mt_scaling_div);
-
-		user_scaled = (user_scaled * mult) / div;
-		system_scaled = (system_scaled * mult) / div;
-	}
-	account_user_time(tsk, user, user_scaled);
-	account_system_time(tsk, hardirq_offset, system, system_scaled);
+	hardirq = S390_lowcore.hardirq_timer - tsk->thread.hardirq_timer;
+	tsk->thread.hardirq_timer = S390_lowcore.hardirq_timer;
+	softirq = S390_lowcore.softirq_timer - tsk->thread.softirq_timer;
+	tsk->thread.softirq_timer = S390_lowcore.softirq_timer;
+	S390_lowcore.steal_timer +=
+		clock - user - guest - system - hardirq - softirq;
+
+	/* Push account value */
+	if (user)
+		account_user_time(tsk, user, scale_vtime(user));
+	if (guest)
+		account_guest_time(tsk, guest, scale_vtime(guest));
+	if (system)
+		account_sys_time(tsk, system, scale_vtime(system));
+	if (hardirq)
+		account_hardirq_time(tsk, hardirq, scale_vtime(hardirq));
+	if (softirq)
+		account_softirq_time(tsk, softirq, scale_vtime(softirq));
 
 	steal = S390_lowcore.steal_timer;
 	if ((s64) steal > 0) {
@@ -145,16 +168,22 @@ static int do_account_vtime(struct task_struct *tsk, int hardirq_offset)
 		account_steal_time(steal);
 	}
 
-	return virt_timer_forward(user + system);
+	return virt_timer_forward(user + guest + system + hardirq + softirq);
 }
 
 void vtime_task_switch(struct task_struct *prev)
 {
 	do_account_vtime(prev, 0);
 	prev->thread.user_timer = S390_lowcore.user_timer;
+	prev->thread.guest_timer = S390_lowcore.guest_timer;
 	prev->thread.system_timer = S390_lowcore.system_timer;
+	prev->thread.hardirq_timer = S390_lowcore.hardirq_timer;
+	prev->thread.softirq_timer = S390_lowcore.softirq_timer;
 	S390_lowcore.user_timer = current->thread.user_timer;
+	S390_lowcore.guest_timer = current->thread.guest_timer;
 	S390_lowcore.system_timer = current->thread.system_timer;
+	S390_lowcore.hardirq_timer = current->thread.hardirq_timer;
+	S390_lowcore.softirq_timer = current->thread.softirq_timer;
 }
 
 /*
@@ -174,31 +203,22 @@ void vtime_account_user(struct task_struct *tsk)
  */
 void vtime_account_irq_enter(struct task_struct *tsk)
 {
-	u64 timer, system, system_scaled;
+	u64 timer;
 
 	timer = S390_lowcore.last_update_timer;
 	S390_lowcore.last_update_timer = get_vtimer();
-	S390_lowcore.system_timer += timer - S390_lowcore.last_update_timer;
-
-	/* Update MT utilization calculation */
-	if (smp_cpu_mtid &&
-	    time_after64(jiffies_64, this_cpu_read(mt_scaling_jiffies)))
-		update_mt_scaling();
-
-	system = S390_lowcore.system_timer - tsk->thread.system_timer;
-	S390_lowcore.steal_timer -= system;
-	tsk->thread.system_timer = S390_lowcore.system_timer;
-	system_scaled = system;
-	/* Do MT utilization scaling */
-	if (smp_cpu_mtid) {
-		u64 mult = __this_cpu_read(mt_scaling_mult);
-		u64 div = __this_cpu_read(mt_scaling_div);
-
-		system_scaled = (system_scaled * mult) / div;
-	}
-	account_system_time(tsk, 0, system, system_scaled);
-
-	virt_timer_forward(system);
+	timer -= S390_lowcore.last_update_timer;
+
+	if ((tsk->flags & PF_VCPU) && (irq_count() == 0))
+		S390_lowcore.guest_timer += timer;
+	else if (hardirq_count())
+		S390_lowcore.hardirq_timer += timer;
+	else if (in_serving_softirq())
+		S390_lowcore.softirq_timer += timer;
+	else
+		S390_lowcore.system_timer += timer;
+
+	virt_timer_forward(timer);
 }
 EXPORT_SYMBOL_GPL(vtime_account_irq_enter);
 
diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h
index 44fda64..ec3e900 100644
--- a/include/linux/kernel_stat.h
+++ b/include/linux/kernel_stat.h
@@ -78,10 +78,15 @@ static inline unsigned int kstat_cpu_irqs_sum(unsigned int cpu)
 	return kstat_cpu(cpu).irqs_sum;
 }
 
-extern void account_user_time(struct task_struct *, cputime_t, cputime_t);
-extern void account_system_time(struct task_struct *, int, cputime_t, cputime_t);
-extern void account_steal_time(cputime_t);
-extern void account_idle_time(cputime_t);
+void account_user_time(struct task_struct *, cputime_t, cputime_t);
+void account_guest_time(struct task_struct *, cputime_t, cputime_t);
+void account_sys_time(struct task_struct *, cputime_t, cputime_t);
+void account_hardirq_time(struct task_struct *, cputime_t, cputime_t);
+void account_softirq_time(struct task_struct *, cputime_t, cputime_t);
+void account_steal_time(cputime_t);
+void account_idle_time(cputime_t);
+
+void account_system_time(struct task_struct *, int, cputime_t, cputime_t);
 
 #ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 static inline void account_process_tick(struct task_struct *tsk, int user)
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 5ebee31..042f1a3 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -155,8 +155,8 @@ void account_user_time(struct task_struct *p, cputime_t cputime,
  * @cputime: the cpu time spent in virtual machine since the last update
  * @cputime_scaled: cputime scaled by cpu frequency
  */
-static void account_guest_time(struct task_struct *p, cputime_t cputime,
-			       cputime_t cputime_scaled)
+void account_guest_time(struct task_struct *p, cputime_t cputime,
+			cputime_t cputime_scaled)
 {
 	u64 *cpustat = kcpustat_this_cpu->cpustat;
 
@@ -226,6 +226,24 @@ void account_system_time(struct task_struct *p, int hardirq_offset,
 	__account_system_time(p, cputime, cputime_scaled, index);
 }
 
+void account_sys_time(struct task_struct *p, cputime_t cputime,
+		      cputime_t cputime_scaled)
+{
+	__account_system_time(p, cputime, cputime_scaled, CPUTIME_SYSTEM);
+}
+
+void account_hardirq_time(struct task_struct *p, cputime_t cputime,
+			  cputime_t cputime_scaled)
+{
+	__account_system_time(p, cputime, cputime_scaled, CPUTIME_IRQ);
+}
+
+void account_softirq_time(struct task_struct *p, cputime_t cputime,
+			  cputime_t cputime_scaled)
+{
+	__account_system_time(p, cputime, cputime_scaled, CPUTIME_SOFTIRQ);
+}
+
 /*
  * Account for involuntary wait time.
  * @cputime: the cpu time spent in involuntary wait
-- 
2.8.4


-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ