lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20080805150434.BF32.E1E9C6FF@jp.fujitsu.com>
Date:	Tue, 05 Aug 2008 15:39:58 +0900
From:	Yasunori Goto <y-goto@...fujitsu.com>
To:	Christoph Lameter <cl@...ux-foundation.org>
Cc:	Badari Pulavarty <pbadari@...ibm.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Mel Gorman <mel@....ul.ie>, linux-mm <linux-mm@...ck.org>,
	Linux Kernel ML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC:Patch: 000/008](memory hotplug) rough idea of pgdat removing


> >> Duh. Then the use of RCU would also mean that all of reclaim must
> >>  be in a rcu period. So  reclaim cannot sleep anymore.
> > 
> > I use srcu_read_lock() (sleepable rcu lock) if kernel must be sleep for
> > page reclaim. So, my patch basic idea is followings.
> 
> But that introduces more overhead in __alloc_pages.

Hmmm. I think SRCU should be used when kernel has to sleep, and sleep time
will be bigger than SRCU's overhead.....

The followings are results of unixbench and lmbench.
I suppose my patch impacts lantency rather than throghput.
In these results, 100fd select and page fault latencies of lmbench became worse.
So I can't say there is no problem in my patches.

Anyway, I'll retry to find other less impact way if there is,
and compare benchmark results with this way.

Bye.

------------

Unixbench
-----

Normal 2.6.27-rc1-mm1


  BYTE UNIX Benchmarks (Version 4.1.0)
  System -- Linux localhost.localdomain 2.6.27-rc1-mm1 #1 SMP Mon Aug 4 16:08:48 JST 2008 ia64 ia64 ia64 GNU/Linux
  Start Benchmark Run: 2008年  8月  5日 火曜日 10:24:35 JST
   1 interactive users.
   10:24:35 up 9 min,  1 user,  load average: 0.16, 0.08, 0.03
  lrwxrwxrwx 1 root root 4 2008-02-25 15:48 /bin/sh -> bash
  /bin/sh: symbolic link to `bash'
  /dev/sda5             33792348  18360424  13687672  58% /home
Execl Throughput                           2954.0 lps   (29.8 secs, 3 samples)
File Read 1024 bufsize 2000 maxblocks    1211570.0 KBps  (30.0 secs, 3 samples)
File Write 1024 bufsize 2000 maxblocks   281599.0 KBps  (30.0 secs, 3 samples)
File Copy 1024 bufsize 2000 maxblocks    218859.0 KBps  (30.0 secs, 3 samples)
File Read 256 bufsize 500 maxblocks      328725.0 KBps  (30.0 secs, 3 samples)
File Write 256 bufsize 500 maxblocks      72850.0 KBps  (30.0 secs, 3 samples)
File Copy 256 bufsize 500 maxblocks       57095.0 KBps  (30.0 secs, 3 samples)
File Read 4096 bufsize 8000 maxblocks    3883690.0 KBps  (30.0 secs, 3 samples)
File Write 4096 bufsize 8000 maxblocks   1050752.0 KBps  (30.0 secs, 3 samples)
File Copy 4096 bufsize 8000 maxblocks    564703.0 KBps  (30.0 secs, 3 samples)
Pipe Throughput                          462027.5 lps   (10.0 secs, 10 samples)
Pipe-based Context Switching             105824.3 lps   (10.0 secs, 10 samples)
Process Creation                           2242.9 lps   (30.0 secs, 3 samples)
System Call Overhead                     1320907.8 lps   (10.0 secs, 10 samples)
Shell Scripts (1 concurrent)               4442.1 lpm   (60.0 secs, 3 samples)
Shell Scripts (8 concurrent)               1810.0 lpm   (60.0 secs, 3 samples)
Shell Scripts (16 concurrent)              1042.7 lpm   (60.0 secs, 3 samples)


                     INDEX VALUES            
TEST                                        BASELINE     RESULT      INDEX

Execl Throughput                                43.0     2954.0      687.0
File Copy 1024 bufsize 2000 maxblocks         3960.0   218859.0      552.7
File Copy 256 bufsize 500 maxblocks           1655.0    57095.0      345.0
File Copy 4096 bufsize 8000 maxblocks         5800.0   564703.0      973.6
Pipe Throughput                              12440.0   462027.5      371.4
Pipe-based Context Switching                  4000.0   105824.3      264.6
Process Creation                               126.0     2242.9      178.0
Shell Scripts (8 concurrent)                     6.0     1810.0     3016.7
System Call Overhead                         15000.0  1320907.8      880.6
                                                                 =========
     FINAL SCORE                                                     565.6



2.6.27-rc1-mm1 with my patch 


  BYTE UNIX Benchmarks (Version 4.1.0)
  System -- Linux localhost.localdomain 2.6.27-rc1-mm1-goto-test #2 SMP Mon Aug 4 18:50:56 JST 2008 ia64 ia64 ia64 GNU/Linux
  Start Benchmark Run: 2008年  8月  4日 月曜日 20:35:11 JST
   1 interactive users.
   20:35:11 up  1:37,  1 user,  load average: 0.00, 0.29, 0.71
  lrwxrwxrwx 1 root root 4 2008-02-25 15:48 /bin/sh -> bash
  /bin/sh: symbolic link to `bash'
  /dev/sda5             33792348  18360420  13687676  58% /home
Execl Throughput                           2949.0 lps   (29.7 secs, 3 samples)
File Read 1024 bufsize 2000 maxblocks    1317211.0 KBps  (30.0 secs, 3 samples)
File Write 1024 bufsize 2000 maxblocks   282643.0 KBps  (30.0 secs, 3 samples)
File Copy 1024 bufsize 2000 maxblocks    220360.0 KBps  (30.0 secs, 3 samples)
File Read 256 bufsize 500 maxblocks      361448.0 KBps  (30.0 secs, 3 samples)
File Write 256 bufsize 500 maxblocks      73172.0 KBps  (30.0 secs, 3 samples)
File Copy 256 bufsize 500 maxblocks       57489.0 KBps  (30.0 secs, 3 samples)
File Read 4096 bufsize 8000 maxblocks    3819448.0 KBps  (30.0 secs, 3 samples)
File Write 4096 bufsize 8000 maxblocks   1026563.0 KBps  (30.0 secs, 3 samples)
File Copy 4096 bufsize 8000 maxblocks    585218.0 KBps  (30.0 secs, 3 samples)
Pipe Throughput                          482681.7 lps   (10.0 secs, 10 samples)
Pipe-based Context Switching             101437.7 lps   (10.0 secs, 10 samples)
Process Creation                           2237.5 lps   (30.0 secs, 3 samples)
System Call Overhead                     1282198.4 lps   (10.0 secs, 10 samples)
Shell Scripts (1 concurrent)               4447.7 lpm   (60.0 secs, 3 samples)
Shell Scripts (8 concurrent)               1812.7 lpm   (60.0 secs, 3 samples)
Shell Scripts (16 concurrent)              1041.7 lpm   (60.0 secs, 3 samples)


                     INDEX VALUES            
TEST                                        BASELINE     RESULT      INDEX

Execl Throughput                                43.0     2949.0      685.8
File Copy 1024 bufsize 2000 maxblocks         3960.0   220360.0      556.5
File Copy 256 bufsize 500 maxblocks           1655.0    57489.0      347.4
File Copy 4096 bufsize 8000 maxblocks         5800.0   585218.0     1009.0
Pipe Throughput                              12440.0   482681.7      388.0
Pipe-based Context Switching                  4000.0   101437.7      253.6
Process Creation                               126.0     2237.5      177.6
Shell Scripts (8 concurrent)                     6.0     1812.7     3021.2
System Call Overhead                         15000.0  1282198.4      854.8
                                                                 =========
     FINAL SCORE                                                     566.8





LMBENCH

The first lines are results of normal 2.6.27-rc1-mm1.
The second lines are results with my patch.



                 L M B E N C H  3 . 0   S U M M A R Y
                 ------------------------------------
                 (Alpha software, do not distribute)

Basic system parameters
------------------------------------------------------------------------------
Host                 OS Description              Mhz  tlb  cache  mem   scal
                                                     pages line   par   load
                                                           bytes  
--------- ------------- ----------------------- ---- ----- ----- ------ ----
localhost Linux 2.6.27-          ia64-linux-gnu 1600         128           1
localhost Linux 2.6.27-          ia64-linux-gnu 1600         128           1

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh  
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
localhost Linux 2.6.27- 1600 0.03 0.23 3.12 4.45 6.73 0.27 1.75 227. 463. 2219
localhost Linux 2.6.27- 1600 0.03 0.23 3.13 4.44 6.74 0.27 1.73 207. 448. 2230

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
localhost Linux 2.6.27-   11.3   11.4   11.5   11.5   12.7    11.8    14.6
localhost Linux 2.6.27-   11.5   11.4   11.5   11.6   12.8    11.9    14.7

*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
localhost Linux 2.6.27-  11.3 8.464 28.3  13.4        28.7        46.
localhost Linux 2.6.27-  11.5 8.470 28.3  13.4        32.2        46.

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS   0K File      10K File     Mmap    Prot   Page   100fd
                        Create Delete Create Delete Latency Fault  Fault  selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
localhost Linux 2.6.27-   15.1   13.4   45.6   25.4   24.0K 0.384 0.23850 2.804
localhost Linux 2.6.27-   15.8   13.3   43.0   26.0   24.1K 0.401 0.25150 2.835

*Local* Communication bandwidths in MB/s - bigger is better
------------------------------------------------------------------------------
Host                 OS Description              Mhz  tlb  cache  mem   scal
                                                     pages line   par   load
                                                           bytes  
--------- ------------- ----------------------- ---- ----- ----- ------ ----
localhost Linux 2.6.27-          ia64-linux-gnu 1600         128           1
localhost Linux 2.6.27-          ia64-linux-gnu 1600         128           1

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh  
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
localhost Linux 2.6.27- 1600 0.03 0.23 3.12 4.45 6.73 0.27 1.75 227. 463. 2219
localhost Linux 2.6.27- 1600 0.03 0.23 3.13 4.44 6.74 0.27 1.73 207. 448. 2230

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
localhost Linux 2.6.27-   11.3   11.4   11.5   11.5   12.7    11.8    14.6
localhost Linux 2.6.27-   11.5   11.4   11.5   11.6   12.8    11.9    14.7

*Local* Communication latencies in microseconds - smaller is better
---------------------------------------------------------------------
Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                        ctxsw       UNIX         UDP         TCP conn
--------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
localhost Linux 2.6.27-  11.3 8.464 28.3  13.4        28.7        46.
localhost Linux 2.6.27-  11.5 8.470 28.3  13.4        32.2        46.

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS   0K File      10K File     Mmap    Prot   Page   100fd
                        Create Delete Create Delete Latency Fault  Fault  selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
localhost Linux 2.6.27-   15.1   13.4   45.6   25.4   24.0K 0.384 0.23850 2.804  <---!!!
localhost Linux 2.6.27-   15.8   13.3   43.0   26.0   24.1K 0.401 0.25150 2.835  <----!!!

*Local* Communication bandwidths in MB/s - bigger is better
-----------------------------------------------------------------------------
Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                             UNIX      reread reread (libc) (hand) read write
--------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
localhost Linux 2.6.27- 4814 4100 1188 2087.4  523.2  549.6  274.9 458. 523.5
localhost Linux 2.6.27- 4811 4111 1219 2090.8  523.1  549.4  276.1 458. 523.5
(END) 



-- 
Yasunori Goto 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ