Message-ID: <20160929031831.GA1175@swordfish>
Date:   Thu, 29 Sep 2016 12:18:31 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:     Minchan Kim <minchan@...nel.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] zram: support page-based parallel write

Hello Minchan,

On (09/22/16 15:42), Minchan Kim wrote:
> zram supports stream-based parallel compression. IOW, it can support
> parallel compression on SMP system only if each cpus has streams.
> For example, assuming 4 CPU system, there are 4 sources for compressing
> in system and each source must be located in each CPUs for full
> parallel compression.
> 
> So, if there is *one* stream in the system, it cannot be compressed
> in parallel although the system supports multiple CPUs. This patch
> aims to overcome such weakness.
> 
> The idea is to use multiple background threads to compress pages
> in idle CPU and foreground just queues BIOs without interrupting
> while other CPUs consumes pages in BIO to compress.
> It means zram begins to support asynchronous writeback to increase
> write bandwidth.
> 
> 1) test cp A to B as an example of single stream compression and
> enhanced 36%.
> 
> x86_64, 4 CPU
> Copy kernel source to zram
> old: 3.4s, new: 2.2s
> 
> 2) test per-process reclaim to swap: 524M
> x86_64, 4 CPU:
> old: 1.2s new: 0.3s
> 
> 3) FIO benchmark
> random read was worse so it supports only write at the moment.
> Later, We might revisit asynchronous read.


sorry for the long reply.

frankly speaking, sorry, I'm very skeptical about the patch set.

from your tests it seems that only a tiny corner case can gain some
extra performance: an SMP system with multiple CPUs, but with
*guaranteed* only one process doing *only* one type of request.
as soon as this process starts to do things simultaneously (like
mixed READ-WRITE) _or_ there are several processes, we are done. and
for that tiny corner case we are about to add complex logic and a
big pile of code. I'm quite sure I'll never enable CONFIG_ZRAM_ASYNC_IO.
why would you enable it? I mean, what setups are you looking at that
will benefit? hosting a CVS repository? :) just kidding.

are there any block devices that are specifically optimized for the
"one process doing one OP" case?

my tests show a dramatic performance drop with NEW zram.
even the "one process" case (one fio job) is almost x3 slower.
sometimes the WRITE test case even goes from MB/s to KB/s

	WRITE:          3181.4MB/s→      948111KB/s
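
since fio switches between MB/s and KB/s in its summary lines, the two
columns are easier to compare after converting to one unit. a minimal
sketch, not part of the test scripts; the helper names are hypothetical
and it assumes fio's 1 MB/s equals 1024 KB/s (binary units):

```python
import re

# Assumption: "MB/s" here means 1024 "KB/s" (binary units).
_UNIT_TO_KB = {"KB/s": 1.0, "MB/s": 1024.0, "GB/s": 1024.0 * 1024.0}


def to_kb_per_s(value: str) -> float:
    """Convert a fio bandwidth string such as '3181.4MB/s' to KB/s."""
    m = re.fullmatch(r"\s*([0-9.]+)\s*([KMG]B/s)\s*", value)
    if m is None:
        raise ValueError(f"unrecognized bandwidth: {value!r}")
    return float(m.group(1)) * _UNIT_TO_KB[m.group(2)]


def slowdown(old: str, new: str) -> float:
    """Return how many times slower NEW is compared to OLD."""
    return to_kb_per_s(old) / to_kb_per_s(new)


if __name__ == "__main__":
    # The WRITE line quoted above.
    print(f"{slowdown('3181.4MB/s', '948111KB/s'):.2f}x slower")
```

with decimal units (1 MB/s = 1000 KB/s) the factor changes by a few
percent only, so the conclusion does not depend on that assumption.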


I've attached the .config


ENV
===

  x86_64 SMP (4 CPUs), "bare zram" 2g, lzo, static compression buffer.


TEST COMMAND
============

  ZRAM_SIZE=2G ZRAM_COMP_ALG=lzo LOG_SUFFIX={NEW, OLD} FIO_LOOPS=2 ./zram-fio-test.sh


EXECUTED TESTS
==============

 - [seq-read]
 - [rand-read]
 - [seq-write]
 - [rand-write]
 - [mixed-seq]
 - [mixed-rand]


RESULTS
=======

# ./fio-perf-o-meter.sh test-fio-zram-OLD test-fio-zram-NEW
Processing test-fio-zram-OLD
Processing test-fio-zram-NEW


                OLD              NEW

#jobs1                         	                
READ:           2345.1MB/s	 2373.2MB/s
READ:           1948.2MB/s	 1987.7MB/s
WRITE:          1292.7MB/s	 275277KB/s
WRITE:          1047.5MB/s	 257140KB/s
READ:           429530KB/s	 175450KB/s
WRITE:          429840KB/s	 175576KB/s
READ:           414074KB/s	 164091KB/s
WRITE:          414402KB/s	 164221KB/s
#jobs2                         	                
READ:           4484.7MB/s	 4532.7MB/s
READ:           3705.7MB/s	 3744.6MB/s
WRITE:          2170.7MB/s	 492404KB/s
WRITE:          1864.4MB/s	 470723KB/s
READ:           829949KB/s	 340146KB/s
WRITE:          830065KB/s	 340194KB/s
READ:           805639KB/s	 336380KB/s
WRITE:          807140KB/s	 337006KB/s
#jobs3                         	                
READ:           5920.1MB/s	 6025.6MB/s
READ:           4845.5MB/s	 5037.5MB/s
WRITE:          2956.3MB/s	 777683KB/s
WRITE:          2525.7MB/s	 727868KB/s
READ:           1083.6MB/s	 507481KB/s
WRITE:          1085.1MB/s	 508634KB/s
READ:           1114.2MB/s	 493014KB/s
WRITE:          1114.7MB/s	 492849KB/s
#jobs4                         	                
READ:           7819.3MB/s	 7897.2MB/s
READ:           6445.4MB/s	 6604.7MB/s
WRITE:          3737.3MB/s	 1002.6MB/s
WRITE:          3232.5MB/s	 974777KB/s
READ:           1447.1MB/s	 592012KB/s
WRITE:          1448.5MB/s	 592205KB/s
READ:           1427.5MB/s	 569307KB/s
WRITE:          1428.1MB/s	 569881KB/s
#jobs5                         	                
READ:           7201.2MB/s	 7560.1MB/s
READ:           5710.4MB/s	 6078.4MB/s
WRITE:          3635.1MB/s	 989502KB/s
WRITE:          3131.6MB/s	 949969KB/s
READ:           1428.4MB/s	 650856KB/s
WRITE:          1429.7MB/s	 651182KB/s
READ:           1413.9MB/s	 644587KB/s
WRITE:          1412.6MB/s	 644328KB/s
#jobs6                         	                
READ:           7252.5MB/s	 7248.2MB/s
READ:           6150.2MB/s	 6396.7MB/s
WRITE:          3583.3MB/s	 954890KB/s
WRITE:          2994.2MB/s	 921172KB/s
READ:           1444.6MB/s	 768636KB/s
WRITE:          1445.7MB/s	 769178KB/s
READ:           1350.3MB/s	 652676KB/s
WRITE:          1349.3MB/s	 652063KB/s
#jobs7                         	                
READ:           7681.4MB/s	 7579.9MB/s
READ:           6018.5MB/s	 6247.6MB/s
WRITE:          3819.3MB/s	 978.54MB/s
WRITE:          3143.7MB/s	 962585KB/s
READ:           1473.1MB/s	 815388KB/s
WRITE:          1473.9MB/s	 814944KB/s
READ:           1389.6MB/s	 610843KB/s
WRITE:          1388.9MB/s	 610764KB/s
#jobs8                         	                
READ:           7658.5MB/s	 7818.4MB/s
READ:           6047.2MB/s	 6021.4MB/s
WRITE:          3690.7MB/s	 1059.6MB/s
WRITE:          3092.7MB/s	 1024.6MB/s
READ:           1435.4MB/s	 826314KB/s
WRITE:          1435.2MB/s	 826125KB/s
READ:           1426.7MB/s	 569216KB/s
WRITE:          1428.1MB/s	 569817KB/s
#jobs9                         	                
READ:           7642.9MB/s	 7982.7MB/s
READ:           5941.4MB/s	 6293.7MB/s
WRITE:          3790.6MB/s	 1050.2MB/s
WRITE:          3181.4MB/s	 948111KB/s
READ:           1430.8MB/s	 758947KB/s
WRITE:          1431.4MB/s	 759260KB/s
READ:           1420.8MB/s	 449894KB/s
WRITE:          1420.2MB/s	 449912KB/s
#jobs10                        	                
READ:           7552.6MB/s	 7853.8MB/s
READ:           5979.6MB/s	 6049.3MB/s
WRITE:          3690.8MB/s	 985210KB/s
WRITE:          3047.2MB/s	 971323KB/s
READ:           1466.7MB/s	 750863KB/s
WRITE:          1467.6MB/s	 751322KB/s
READ:           1390.7MB/s	 431071KB/s
WRITE:          1391.4MB/s	 431267KB/s


                                                   OLD                           NEW


jobs1                              perfstat         	                          
stalled-cycles-frontend      42,179,294,111 (  42.27%)	   69,980,596,543 (  54.76%)
stalled-cycles-backend       20,291,324,679 (  20.33%)	   42,209,369,439 (  33.03%)
instructions                115,949,023,077 (    1.16)	  108,226,927,382 (    0.85)
branches                     22,915,506,669 ( 726.915)	   20,930,779,988 ( 455.148)
branch-misses                   157,490,582 (   0.69%)	      393,100,266 (   1.88%)
jobs2                              perfstat         	                          
stalled-cycles-frontend      99,808,718,071 (  47.20%)	  138,353,381,157 (  54.46%)
stalled-cycles-backend       50,740,071,798 (  23.99%)	   86,000,378,224 (  33.85%)
instructions                231,953,824,813 (    1.10)	  215,166,725,962 (    0.85)
branches                     45,819,311,222 ( 683.280)	   41,765,459,724 ( 417.576)
branch-misses                   367,871,064 (   0.80%)	      793,989,808 (   1.90%)
jobs3                              perfstat         	                          
stalled-cycles-frontend     143,472,445,917 (  46.50%)	  207,584,197,915 (  54.61%)
stalled-cycles-backend       70,928,315,293 (  22.99%)	  126,440,378,366 (  33.26%)
instructions                348,003,016,792 (    1.13)	  320,968,072,847 (    0.84)
branches                     68,787,283,790 ( 619.178)	   62,580,295,200 ( 411.530)
branch-misses                   449,811,959 (   0.65%)	    1,113,447,333 (   1.78%)
jobs4                              perfstat         	                          
stalled-cycles-frontend     201,950,202,659 (  47.96%)	  278,741,702,134 (  55.39%)
stalled-cycles-backend      101,955,523,018 (  24.21%)	  171,537,536,649 (  34.08%)
instructions                463,875,933,843 (    1.10)	  418,163,782,630 (    0.83)
branches                     91,720,839,796 ( 604.464)	   81,267,313,414 ( 416.350)
branch-misses                   701,009,770 (   0.76%)	    1,328,101,057 (   1.63%)
jobs5                              perfstat         	                          
stalled-cycles-frontend     244,426,118,305 (  47.17%)	  338,770,490,424 (  55.06%)
stalled-cycles-backend      121,688,433,877 (  23.48%)	  206,537,478,646 (  33.57%)
instructions                580,617,471,008 (    1.12)	  518,727,560,729 (    0.84)
branches                    114,998,494,737 ( 619.217)	  100,587,852,486 ( 424.034)
branch-misses                   755,197,302 (   0.66%)	    1,486,131,250 (   1.48%)
jobs6                              perfstat         	                          
stalled-cycles-frontend     306,426,786,501 (  48.24%)	  418,675,686,722 (  55.28%)
stalled-cycles-backend      155,564,868,859 (  24.49%)	  261,774,760,749 (  34.57%)
instructions                698,910,704,460 (    1.10)	  640,996,755,296 (    0.85)
branches                    138,734,721,168 ( 607.838)	  126,542,894,264 ( 437.418)
branch-misses                 1,027,094,720 (   0.74%)	    1,687,254,447 (   1.33%)
jobs7                              perfstat         	                          
stalled-cycles-frontend     344,634,632,539 (  47.55%)	  524,620,607,090 (  55.98%)
stalled-cycles-backend      171,605,233,567 (  23.68%)	  326,480,253,386 (  34.84%)
instructions                817,561,790,625 (    1.13)	  789,953,093,271 (    0.84)
branches                    162,523,822,416 ( 623.700)	  160,686,217,996 ( 452.977)
branch-misses                 1,001,767,491 (   0.62%)	    1,896,930,415 (   1.18%)
jobs8                              perfstat         	                          
stalled-cycles-frontend     415,018,039,937 (  48.55%)	  640,148,604,571 (  56.58%)
stalled-cycles-backend      210,534,663,913 (  24.63%)	  405,646,634,128 (  35.85%)
instructions                938,099,496,074 (    1.10)	  944,604,889,411 (    0.83)
branches                    186,977,649,076 ( 607.420)	  198,183,413,437 ( 466.645)
branch-misses                 1,309,555,010 (   0.70%)	    2,119,164,279 (   1.07%)
jobs9                              perfstat         	                          
stalled-cycles-frontend     449,612,872,179 (  47.61%)	  844,651,795,120 (  57.11%)
stalled-cycles-backend      225,730,057,301 (  23.90%)	  541,881,702,721 (  36.64%)
instructions              1,056,378,279,974 (    1.12)	1,235,055,226,542 (    0.84)
branches                    210,682,445,933 ( 620.027)	  278,801,098,961 ( 503.482)
branch-misses                 1,284,610,267 (   0.61%)	    2,386,321,491 (   0.86%)
jobs10                             perfstat         	                          
stalled-cycles-frontend     523,925,463,468 (  48.64%)	  972,381,342,660 (  58.01%)
stalled-cycles-backend      269,122,542,565 (  24.99%)	  633,864,439,755 (  37.81%)
instructions              1,178,756,566,770 (    1.09)	1,370,037,474,489 (    0.82)
branches                    235,713,310,396 ( 607.558)	  309,456,445,052 ( 493.085)
branch-misses                 1,657,381,559 (   0.70%)	    2,732,046,780 (   0.88%)
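
the cycle count is not listed above, but it can be estimated from the
instructions row. a rough sketch, assuming the value in parentheses on
that row is perf stat's instructions-per-cycle figure (so cycles is
approximately instructions / IPC); the function name is hypothetical
and the IPC values are rounded to two digits, so the result is only an
estimate:

```python
def estimated_cycles(instructions: int, ipc: float) -> float:
    """Estimate the cycle count from an instruction count and a rounded IPC."""
    if ipc <= 0:
        raise ValueError("IPC must be positive")
    return instructions / ipc


if __name__ == "__main__":
    # jobs1 row from the table above: (instructions, IPC).
    old = estimated_cycles(115_949_023_077, 1.16)
    new = estimated_cycles(108_226_927_382, 0.85)
    print(f"OLD ~{old:.3e} cycles, NEW ~{new:.3e} cycles")
    print(f"NEW/OLD cycle ratio ~{new / old:.2f}")
```

for jobs1 this suggests NEW retires fewer instructions but spends more
cycles doing so, which matches the higher stalled-cycles percentages.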



                       OLD              NEW

seconds elapsed        33.875828126	92.385042584
seconds elapsed        35.248880307	97.051962536
seconds elapsed        38.719622551	96.216104080
seconds elapsed        39.759294197	102.058599765
seconds elapsed        51.040574490	124.109075314
seconds elapsed        61.531148007	146.364090962
seconds elapsed        69.264584324	166.700161114
seconds elapsed        79.817910029	185.053367327
seconds elapsed        88.781317384	229.905476947
seconds elapsed        99.912528127	262.960880001
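
the wall-clock numbers can be turned into a per-row slowdown factor. a
small sketch, assuming the ten rows above correspond to jobs1..jobs10
in order (the table does not label them); the names are hypothetical:

```python
# "seconds elapsed" values copied from the table above.
OLD = [33.875828126, 35.248880307, 38.719622551, 39.759294197, 51.040574490,
       61.531148007, 69.264584324, 79.817910029, 88.781317384, 99.912528127]
NEW = [92.385042584, 97.051962536, 96.216104080, 102.058599765, 124.109075314,
       146.364090962, 166.700161114, 185.053367327, 229.905476947, 262.960880001]


def elapsed_ratios(old, new):
    """Return NEW/OLD elapsed-time ratios, one per row."""
    if len(old) != len(new):
        raise ValueError("OLD and NEW must have the same number of rows")
    return [n / o for o, n in zip(old, new)]


if __name__ == "__main__":
    # Assumption: row i is the run with i fio jobs.
    for jobs, ratio in enumerate(elapsed_ratios(OLD, NEW), start=1):
        print(f"jobs{jobs}: NEW takes {ratio:.2f}x as long as OLD")
```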


	-ss

View attachment ".config" of type "text/plain" (92568 bytes)
