Message-ID: <5302E6FA.508@linux.vnet.ibm.com>
Date:	Tue, 18 Feb 2014 12:52:10 +0800
From:	Michael wang <wangyun@...ux.vnet.ibm.com>
To:	Alex Shi <alex.shi@...aro.org>, mingo@...hat.com,
	peterz@...radead.org, morten.rasmussen@....com
CC:	vincent.guittot@...aro.org, daniel.lezcano@...aro.org,
	fweisbec@...il.com, linux@....linux.org.uk, tony.luck@...el.com,
	fenghua.yu@...el.com, james.hogan@...tec.com, jason.low2@...com,
	viresh.kumar@...aro.org, hanjun.guo@...aro.org,
	linux-kernel@...r.kernel.org, tglx@...utronix.de,
	akpm@...ux-foundation.org, arjan@...ux.intel.com, pjt@...gle.com,
	fengguang.wu@...el.com, linaro-kernel@...ts.linaro.org
Subject: Re: [PATCH v2 0/11] remove cpu_load in rq

On 02/17/2014 09:55 AM, Alex Shi wrote:
> The cpu_load decays over time according to the rq's past CPU load, and
> sched_avg also decays task load over time. So we now have two kinds of
> decay for cpu_load; that is redundant, and the decay calculation adds
> system load. This patch set tries to remove the cpu_load decay.
> 
> There are 5 load_idx fields used for cpu_load in sched_domain. busy_idx
> and idle_idx are usually non-zero, but newidle_idx, wake_idx and
> forkexec_idx are all zero on every arch. The first patch takes a shortcut
> to remove the cpu_load decay for those cases; it is just a one-line change.
> 
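(For other readers following along: here is a minimal stand-alone sketch
of the decay being discussed, based on my reading of __update_cpu_load()
and source_load() in kernel/sched/ of the 3.x kernels. It is a simplified
userspace model for illustration, not the kernel code itself; the helper
names mirror the kernel ones but everything below compiles on its own.)

#include <stdio.h>

#define CPU_LOAD_IDX_MAX 5

static unsigned long cpu_load[CPU_LOAD_IDX_MAX];

static void update_cpu_load(unsigned long this_load)
{
	int i;

	/* cpu_load[0] is the instantaneous load, no decay. */
	cpu_load[0] = this_load;

	/* Higher indexes decay more slowly:
	 * load[i] = (load[i] * (2^i - 1) + this_load) / 2^i */
	for (i = 1; i < CPU_LOAD_IDX_MAX; i++) {
		unsigned long scale = 1UL << i;
		cpu_load[i] = (cpu_load[i] * (scale - 1) + this_load) >> i;
	}
}

/* The shortcut in patch 1: with load_idx == 0 the biased source load
 * degenerates to the raw load, so for newidle/wake/forkexec balancing
 * (idx always 0 on every arch) no decayed history is consulted. */
static unsigned long source_load(unsigned long raw_load, int idx)
{
	if (idx == 0)
		return raw_load;
	/* min() bias against the decayed history, as in the kernel. */
	return raw_load < cpu_load[idx - 1] ? raw_load : cpu_load[idx - 1];
}

int main(void)
{
	int tick, i;

	/* Feed a constant load; higher indexes converge more slowly. */
	for (tick = 0; tick < 8; tick++)
		update_cpu_load(1024);

	for (i = 0; i < CPU_LOAD_IDX_MAX; i++)
		printf("cpu_load[%d] = %lu\n", i, cpu_load[i]);

	printf("source_load(idx=0) = %lu\n", source_load(1024, 0));
	return 0;
}
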
> V2:
> 1. This version tunes the load bias of the target load to best match
>    the current code logic.
> 2. It goes further and removes cpu_load from the rq.
> 3. It reverts the patch 'Limit sd->*_idx range on sysctl', since that
>    is no longer needed.
> 
> Any testing/comments are appreciated.

Tested on a 12-cpu x86 box with tip/master; ebizzy and hackbench
work fine and show small improvements in every run.

ebizzy default:

BASE					PATCHED

32506 records/s                         |32785 records/s                        
real 10.00 s                            |real 10.00 s                           
user 50.32 s                            |user 49.66 s                           
sys  69.46 s                            |sys  70.19 s                           
32552 records/s                         |32946 records/s                        
real 10.00 s                            |real 10.00 s                           
user 50.11 s                            |user 50.70 s                           
sys  69.68 s                            |sys  69.15 s                           
32265 records/s                         |32824 records/s                        
real 10.00 s                            |real 10.00 s                           
user 49.46 s                            |user 50.46 s                           
sys  70.28 s                            |sys  69.34 s                           
32489 records/s                         |32735 records/s                        
real 10.00 s                            |real 10.00 s                           
user 49.67 s                            |user 50.21 s                           
sys  70.12 s                            |sys  69.54 s                           
32490 records/s                         |32662 records/s                        
real 10.00 s                            |real 10.00 s                           
user 50.01 s                            |user 50.07 s                           
sys  69.79 s                            |sys  69.68 s                           
32471 records/s                         |32784 records/s                        
real 10.00 s                            |real 10.00 s                           
user 49.73 s                            |user 49.88 s                           
sys  70.07 s                            |sys  69.87 s                           
32596 records/s                         |32783 records/s                        
real 10.00 s                            |real 10.00 s                           
user 49.81 s                            |user 49.42 s                           
sys  70.00 s                            |sys  70.30 s

hackbench 10000 loops:

BASE					PATCHED

Running with 48*40 (== 1920) tasks.     |Running with 48*40 (== 1920) tasks.    
Time: 30.934                            |Time: 29.965                           
Running with 48*40 (== 1920) tasks.     |Running with 48*40 (== 1920) tasks.    
Time: 31.603                            |Time: 30.410                           
Running with 48*40 (== 1920) tasks.     |Running with 48*40 (== 1920) tasks.    
Time: 31.724                            |Time: 30.627                           
Running with 48*40 (== 1920) tasks.     |Running with 48*40 (== 1920) tasks.    
Time: 31.648                            |Time: 30.596                           
Running with 48*40 (== 1920) tasks.     |Running with 48*40 (== 1920) tasks.    
Time: 31.799                            |Time: 30.763                           
Running with 48*40 (== 1920) tasks.     |Running with 48*40 (== 1920) tasks.    
Time: 31.847                            |Time: 30.532                           
Running with 48*40 (== 1920) tasks.     |Running with 48*40 (== 1920) tasks.    
Time: 31.828                            |Time: 30.871                           
Running with 24*40 (== 960) tasks.      |Running with 24*40 (== 960) tasks.     
Time: 15.768                            |Time: 15.284                           
Running with 24*40 (== 960) tasks.      |Running with 24*40 (== 960) tasks.     
Time: 15.720                            |Time: 15.228                           
Running with 24*40 (== 960) tasks.      |Running with 24*40 (== 960) tasks.     
Time: 15.819                            |Time: 15.373                           
Running with 24*40 (== 960) tasks.      |Running with 24*40 (== 960) tasks.     
Time: 15.888                            |Time: 15.184
Running with 24*40 (== 960) tasks.      |Running with 24*40 (== 960) tasks.     
Time: 15.660                            |Time: 15.525                           
Running with 24*40 (== 960) tasks.      |Running with 24*40 (== 960) tasks.     
Time: 15.934                            |Time: 15.337                           
Running with 24*40 (== 960) tasks.      |Running with 24*40 (== 960) tasks.     
Time: 15.669                            |Time: 15.357                           
Running with 12*40 (== 480) tasks.      |Running with 12*40 (== 480) tasks.     
Time: 7.699                             |Time: 7.458                            
Running with 12*40 (== 480) tasks.      |Running with 12*40 (== 480) tasks.     
Time: 7.693                             |Time: 7.498                            
Running with 12*40 (== 480) tasks.      |Running with 12*40 (== 480) tasks.     
Time: 7.705                             |Time: 7.439                            
Running with 12*40 (== 480) tasks.      |Running with 12*40 (== 480) tasks.     
Time: 7.664                             |Time: 7.553                            
Running with 12*40 (== 480) tasks.      |Running with 12*40 (== 480) tasks.     
Time: 7.603                             |Time: 7.470                            
Running with 12*40 (== 480) tasks.      |Running with 12*40 (== 480) tasks.     
Time: 7.651                             |Time: 7.491                            
Running with 12*40 (== 480) tasks.      |Running with 12*40 (== 480) tasks.     
Time: 7.647                             |Time: 7.535                        
Running with 6*40 (== 240) tasks.       |Running with 6*40 (== 240) tasks.      
Time: 6.054                             |Time: 5.293                            
Running with 6*40 (== 240) tasks.       |Running with 6*40 (== 240) tasks.      
Time: 5.417                             |Time: 5.701                            
Running with 6*40 (== 240) tasks.       |Running with 6*40 (== 240) tasks.      
Time: 5.287                             |Time: 5.240                            
Running with 6*40 (== 240) tasks.       |Running with 6*40 (== 240) tasks.      
Time: 5.594                             |Time: 5.571                            
Running with 6*40 (== 240) tasks.       |Running with 6*40 (== 240) tasks.      
Time: 5.347                             |Time: 6.136                            
Running with 6*40 (== 240) tasks.       |Running with 6*40 (== 240) tasks.      
Time: 5.430                             |Time: 5.323                            
Running with 6*40 (== 240) tasks.       |Running with 6*40 (== 240) tasks.      
Time: 5.691                             |Time: 5.481                            
Running with 1*40 (== 40) tasks.        |Running with 1*40 (== 40) tasks.       
Time: 1.192                             |Time: 1.140                            
Running with 1*40 (== 40) tasks.        |Running with 1*40 (== 40) tasks.       
Time: 1.190                             |Time: 1.125                            
Running with 1*40 (== 40) tasks.        |Running with 1*40 (== 40) tasks.       
Time: 1.189                             |Time: 1.013                       
Running with 1*40 (== 40) tasks.        |Running with 1*40 (== 40) tasks.       
Time: 1.163                             |Time: 1.060                            
Running with 1*40 (== 40) tasks.        |Running with 1*40 (== 40) tasks.       
Time: 1.186                             |Time: 1.131                            
Running with 1*40 (== 40) tasks.        |Running with 1*40 (== 40) tasks.       
Time: 1.175                             |Time: 1.125                            
Running with 1*40 (== 40) tasks.        |Running with 1*40 (== 40) tasks.       
Time: 1.157                             |Time: 0.998 


BTW, I got a panic while rebooting, but it should not be caused by
this patch set; I will recheck and post the report later.

Regards,
Michael Wang



INFO: rcu_sched detected stalls on CPUs/tasks: { 7} (detected by 1, t=21002 jiffies, g=6707, c=6706, q=227)
Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 7
CPU: 7 PID: 1040 Comm: bioset Not tainted 3.14.0-rc2-test+ #402
Hardware name: IBM System x3650 M3 -[794582A]-/94Y7614, BIOS -[D6E154AUS-1.13]- 09/23/2011
 0000000000000000 ffff88097f2e7bd8 ffffffff8156b38a 0000000000004f27
 ffffffff817ecb90 ffff88097f2e7c58 ffffffff81561d8d ffff88097f2e7c08
 ffffffff00000010 ffff88097f2e7c68 ffff88097f2e7c08 ffff88097f2e7c78
Call Trace:
 <NMI>  [<ffffffff8156b38a>] dump_stack+0x46/0x58
 [<ffffffff81561d8d>] panic+0xbe/0x1ce
 [<ffffffff810e6b03>] watchdog_overflow_callback+0xb3/0xc0
 [<ffffffff8111e928>] __perf_event_overflow+0x98/0x220
 [<ffffffff8111f224>] perf_event_overflow+0x14/0x20
 [<ffffffff8101eef2>] intel_pmu_handle_irq+0x1c2/0x2c0
 [<ffffffff81089af9>] ? load_balance+0xf9/0x590
 [<ffffffff81089b0d>] ? load_balance+0x10d/0x590
 [<ffffffff81562ac2>] ? printk+0x4d/0x4f
 [<ffffffff815763b4>] perf_event_nmi_handler+0x34/0x60
 [<ffffffff81575b6e>] nmi_handle+0x7e/0x140
 [<ffffffff81575d1a>] default_do_nmi+0x5a/0x250
 [<ffffffff81575fa0>] do_nmi+0x90/0xd0
 [<ffffffff815751e7>] end_repeat_nmi+0x1e/0x2e
 [<ffffffff81089340>] ? find_busiest_group+0x120/0x7e0
 [<ffffffff81089340>] ? find_busiest_group+0x120/0x7e0
 [<ffffffff81089340>] ? find_busiest_group+0x120/0x7e0
 <<EOE>>  [<ffffffff81089b7c>] load_balance+0x17c/0x590
 [<ffffffff8108a49f>] idle_balance+0x10f/0x1c0
 [<ffffffff8108a66e>] pick_next_task_fair+0x11e/0x2a0
 [<ffffffff8107ba53>] ? dequeue_task+0x73/0x90
 [<ffffffff815712b7>] __schedule+0x127/0x670
 [<ffffffff815718d9>] schedule+0x29/0x70
 [<ffffffff8104e3b5>] do_exit+0x2a5/0x470
 [<ffffffff81066c90>] ? process_scheduled_works+0x40/0x40
 [<ffffffff8106e78a>] kthread+0xba/0xe0
 [<ffffffff8106e6d0>] ? flush_kthread_worker+0xb0/0xb0
 [<ffffffff8157d0ec>] ret_from_fork+0x7c/0xb0
 [<ffffffff8106e6d0>] ? flush_kthread_worker+0xb0/0xb0
Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)


> 
> This patch set is rebased on the latest tip/master.
> The git tree for this patch set is at:
>  git@...hub.com:alexshi/power-scheduling.git noload
> 
> Thanks
> Alex
> 

