Message-ID: <20160929031831.GA1175@swordfish>
Date: Thu, 29 Sep 2016 12:18:31 +0900
From: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To: Minchan Kim <minchan@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] zram: support page-based parallel write
Hello Minchan,
On (09/22/16 15:42), Minchan Kim wrote:
> zram supports stream-based parallel compression: on an SMP system it
> can compress in parallel only if each CPU has its own stream. For
> example, on a 4-CPU system there are 4 compression sources, and each
> source must run on a different CPU for fully parallel compression.
>
> So, if there is *one* stream in the system, it cannot be compressed
> in parallel even though the system has multiple CPUs. This patch
> aims to overcome that weakness.
>
> The idea is to use multiple background threads to compress pages on
> idle CPUs: the foreground just queues BIOs without blocking, while
> the other CPUs consume the pages in those BIOs and compress them.
> In other words, zram begins to support asynchronous writeback to
> increase write bandwidth.
>
> 1) Test: cp A to B as an example of single-stream compression;
>    improved by 36%.
>
> x86_64, 4 CPU
> Copy kernel source to zram
> old: 3.4s, new: 2.2s
>
> 2) test per-process reclaim to swap: 524M
> x86_64, 4 CPU:
> old: 1.2s new: 0.3s
>
> 3) FIO benchmark
> Random read was worse, so only write is supported at the moment.
> We might revisit asynchronous read later.
Sorry for the late reply.
Frankly speaking, sorry, I'm very skeptical about this patch set.
From your tests it seems that only a tiny corner case gains any extra
performance: an SMP system with multiple CPUs, but with *guaranteed*
only one process doing *only* one type of request. As soon as that
process starts doing things simultaneously (like mixed READ-WRITE),
_or_ there are several processes, we are done. And for that tiny
corner case we are about to add complex logic and a big pile of code.
I'm quite sure I'll never enable CONFIG_ZRAM_ASYNC_IO. Why would you
enable it? I mean, what setups are you looking at that would benefit?
Hosting a CVS repository? :) Just kidding.
Are there any block devices specifically optimized for the "one
process doing one OP" case?
My tests show a dramatic performance drop with the NEW zram. Even the
single-process case (one fio job) is almost 3x slower; sometimes a
WRITE test even goes from MB/s to KB/s:
    WRITE: 3181.4MB/s → 948111KB/s
I've attached the .config.
ENV
===
x86_64 SMP (4 CPUs), "bare zram" 2g, lzo, static compression buffer.
TEST COMMAND
============
ZRAM_SIZE=2G ZRAM_COMP_ALG=lzo LOG_SUFFIX={NEW, OLD} FIO_LOOPS=2 ./zram-fio-test.sh
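(For reference, before running fio the script presumably does the usual zram
sysfs setup; roughly the standard sequence below -- an assumption on my side,
the script's exact steps may differ. Needs root; shown here as a config
fragment, not meant to be run as-is:)

```shell
# Standard zram sysfs setup for a 2g lzo device.
modprobe zram num_devices=1
echo lzo > /sys/block/zram0/comp_algorithm
echo 2G > /sys/block/zram0/disksize
# ... run fio against /dev/zram0 ...
echo 1 > /sys/block/zram0/reset    # tear down between runs
```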
EXECUTED TESTS
==============
- [seq-read]
- [rand-read]
- [seq-write]
- [rand-write]
- [mixed-seq]
- [mixed-rand]
RESULTS
=======
# ./fio-perf-o-meter.sh test-fio-zram-OLD test-fio-zram-NEW
Processing test-fio-zram-OLD
Processing test-fio-zram-NEW
OLD NEW
(rows per job, in order: seq-read, rand-read, seq-write, rand-write,
 mixed-seq READ/WRITE, mixed-rand READ/WRITE)
#jobs1
READ: 2345.1MB/s 2373.2MB/s
READ: 1948.2MB/s 1987.7MB/s
WRITE: 1292.7MB/s 275277KB/s
WRITE: 1047.5MB/s 257140KB/s
READ: 429530KB/s 175450KB/s
WRITE: 429840KB/s 175576KB/s
READ: 414074KB/s 164091KB/s
WRITE: 414402KB/s 164221KB/s
#jobs2
READ: 4484.7MB/s 4532.7MB/s
READ: 3705.7MB/s 3744.6MB/s
WRITE: 2170.7MB/s 492404KB/s
WRITE: 1864.4MB/s 470723KB/s
READ: 829949KB/s 340146KB/s
WRITE: 830065KB/s 340194KB/s
READ: 805639KB/s 336380KB/s
WRITE: 807140KB/s 337006KB/s
#jobs3
READ: 5920.1MB/s 6025.6MB/s
READ: 4845.5MB/s 5037.5MB/s
WRITE: 2956.3MB/s 777683KB/s
WRITE: 2525.7MB/s 727868KB/s
READ: 1083.6MB/s 507481KB/s
WRITE: 1085.1MB/s 508634KB/s
READ: 1114.2MB/s 493014KB/s
WRITE: 1114.7MB/s 492849KB/s
#jobs4
READ: 7819.3MB/s 7897.2MB/s
READ: 6445.4MB/s 6604.7MB/s
WRITE: 3737.3MB/s 1002.6MB/s
WRITE: 3232.5MB/s 974777KB/s
READ: 1447.1MB/s 592012KB/s
WRITE: 1448.5MB/s 592205KB/s
READ: 1427.5MB/s 569307KB/s
WRITE: 1428.1MB/s 569881KB/s
#jobs5
READ: 7201.2MB/s 7560.1MB/s
READ: 5710.4MB/s 6078.4MB/s
WRITE: 3635.1MB/s 989502KB/s
WRITE: 3131.6MB/s 949969KB/s
READ: 1428.4MB/s 650856KB/s
WRITE: 1429.7MB/s 651182KB/s
READ: 1413.9MB/s 644587KB/s
WRITE: 1412.6MB/s 644328KB/s
#jobs6
READ: 7252.5MB/s 7248.2MB/s
READ: 6150.2MB/s 6396.7MB/s
WRITE: 3583.3MB/s 954890KB/s
WRITE: 2994.2MB/s 921172KB/s
READ: 1444.6MB/s 768636KB/s
WRITE: 1445.7MB/s 769178KB/s
READ: 1350.3MB/s 652676KB/s
WRITE: 1349.3MB/s 652063KB/s
#jobs7
READ: 7681.4MB/s 7579.9MB/s
READ: 6018.5MB/s 6247.6MB/s
WRITE: 3819.3MB/s 978.54MB/s
WRITE: 3143.7MB/s 962585KB/s
READ: 1473.1MB/s 815388KB/s
WRITE: 1473.9MB/s 814944KB/s
READ: 1389.6MB/s 610843KB/s
WRITE: 1388.9MB/s 610764KB/s
#jobs8
READ: 7658.5MB/s 7818.4MB/s
READ: 6047.2MB/s 6021.4MB/s
WRITE: 3690.7MB/s 1059.6MB/s
WRITE: 3092.7MB/s 1024.6MB/s
READ: 1435.4MB/s 826314KB/s
WRITE: 1435.2MB/s 826125KB/s
READ: 1426.7MB/s 569216KB/s
WRITE: 1428.1MB/s 569817KB/s
#jobs9
READ: 7642.9MB/s 7982.7MB/s
READ: 5941.4MB/s 6293.7MB/s
WRITE: 3790.6MB/s 1050.2MB/s
WRITE: 3181.4MB/s 948111KB/s
READ: 1430.8MB/s 758947KB/s
WRITE: 1431.4MB/s 759260KB/s
READ: 1420.8MB/s 449894KB/s
WRITE: 1420.2MB/s 449912KB/s
#jobs10
READ: 7552.6MB/s 7853.8MB/s
READ: 5979.6MB/s 6049.3MB/s
WRITE: 3690.8MB/s 985210KB/s
WRITE: 3047.2MB/s 971323KB/s
READ: 1466.7MB/s 750863KB/s
WRITE: 1467.6MB/s 751322KB/s
READ: 1390.7MB/s 431071KB/s
WRITE: 1391.4MB/s 431267KB/s
OLD NEW
jobs1 perfstat
stalled-cycles-frontend 42,179,294,111 ( 42.27%) 69,980,596,543 ( 54.76%)
stalled-cycles-backend 20,291,324,679 ( 20.33%) 42,209,369,439 ( 33.03%)
instructions 115,949,023,077 ( 1.16) 108,226,927,382 ( 0.85)
branches 22,915,506,669 ( 726.915) 20,930,779,988 ( 455.148)
branch-misses 157,490,582 ( 0.69%) 393,100,266 ( 1.88%)
jobs2 perfstat
stalled-cycles-frontend 99,808,718,071 ( 47.20%) 138,353,381,157 ( 54.46%)
stalled-cycles-backend 50,740,071,798 ( 23.99%) 86,000,378,224 ( 33.85%)
instructions 231,953,824,813 ( 1.10) 215,166,725,962 ( 0.85)
branches 45,819,311,222 ( 683.280) 41,765,459,724 ( 417.576)
branch-misses 367,871,064 ( 0.80%) 793,989,808 ( 1.90%)
jobs3 perfstat
stalled-cycles-frontend 143,472,445,917 ( 46.50%) 207,584,197,915 ( 54.61%)
stalled-cycles-backend 70,928,315,293 ( 22.99%) 126,440,378,366 ( 33.26%)
instructions 348,003,016,792 ( 1.13) 320,968,072,847 ( 0.84)
branches 68,787,283,790 ( 619.178) 62,580,295,200 ( 411.530)
branch-misses 449,811,959 ( 0.65%) 1,113,447,333 ( 1.78%)
jobs4 perfstat
stalled-cycles-frontend 201,950,202,659 ( 47.96%) 278,741,702,134 ( 55.39%)
stalled-cycles-backend 101,955,523,018 ( 24.21%) 171,537,536,649 ( 34.08%)
instructions 463,875,933,843 ( 1.10) 418,163,782,630 ( 0.83)
branches 91,720,839,796 ( 604.464) 81,267,313,414 ( 416.350)
branch-misses 701,009,770 ( 0.76%) 1,328,101,057 ( 1.63%)
jobs5 perfstat
stalled-cycles-frontend 244,426,118,305 ( 47.17%) 338,770,490,424 ( 55.06%)
stalled-cycles-backend 121,688,433,877 ( 23.48%) 206,537,478,646 ( 33.57%)
instructions 580,617,471,008 ( 1.12) 518,727,560,729 ( 0.84)
branches 114,998,494,737 ( 619.217) 100,587,852,486 ( 424.034)
branch-misses 755,197,302 ( 0.66%) 1,486,131,250 ( 1.48%)
jobs6 perfstat
stalled-cycles-frontend 306,426,786,501 ( 48.24%) 418,675,686,722 ( 55.28%)
stalled-cycles-backend 155,564,868,859 ( 24.49%) 261,774,760,749 ( 34.57%)
instructions 698,910,704,460 ( 1.10) 640,996,755,296 ( 0.85)
branches 138,734,721,168 ( 607.838) 126,542,894,264 ( 437.418)
branch-misses 1,027,094,720 ( 0.74%) 1,687,254,447 ( 1.33%)
jobs7 perfstat
stalled-cycles-frontend 344,634,632,539 ( 47.55%) 524,620,607,090 ( 55.98%)
stalled-cycles-backend 171,605,233,567 ( 23.68%) 326,480,253,386 ( 34.84%)
instructions 817,561,790,625 ( 1.13) 789,953,093,271 ( 0.84)
branches 162,523,822,416 ( 623.700) 160,686,217,996 ( 452.977)
branch-misses 1,001,767,491 ( 0.62%) 1,896,930,415 ( 1.18%)
jobs8 perfstat
stalled-cycles-frontend 415,018,039,937 ( 48.55%) 640,148,604,571 ( 56.58%)
stalled-cycles-backend 210,534,663,913 ( 24.63%) 405,646,634,128 ( 35.85%)
instructions 938,099,496,074 ( 1.10) 944,604,889,411 ( 0.83)
branches 186,977,649,076 ( 607.420) 198,183,413,437 ( 466.645)
branch-misses 1,309,555,010 ( 0.70%) 2,119,164,279 ( 1.07%)
jobs9 perfstat
stalled-cycles-frontend 449,612,872,179 ( 47.61%) 844,651,795,120 ( 57.11%)
stalled-cycles-backend 225,730,057,301 ( 23.90%) 541,881,702,721 ( 36.64%)
instructions 1,056,378,279,974 ( 1.12) 1,235,055,226,542 ( 0.84)
branches 210,682,445,933 ( 620.027) 278,801,098,961 ( 503.482)
branch-misses 1,284,610,267 ( 0.61%) 2,386,321,491 ( 0.86%)
jobs10 perfstat
stalled-cycles-frontend 523,925,463,468 ( 48.64%) 972,381,342,660 ( 58.01%)
stalled-cycles-backend 269,122,542,565 ( 24.99%) 633,864,439,755 ( 37.81%)
instructions 1,178,756,566,770 ( 1.09) 1,370,037,474,489 ( 0.82)
branches 235,713,310,396 ( 607.558) 309,456,445,052 ( 493.085)
branch-misses 1,657,381,559 ( 0.70%) 2,732,046,780 ( 0.88%)
OLD NEW
(one row per fio job count, jobs1 .. jobs10)
seconds elapsed 33.875828126 92.385042584
seconds elapsed 35.248880307 97.051962536
seconds elapsed 38.719622551 96.216104080
seconds elapsed 39.759294197 102.058599765
seconds elapsed 51.040574490 124.109075314
seconds elapsed 61.531148007 146.364090962
seconds elapsed 69.264584324 166.700161114
seconds elapsed 79.817910029 185.053367327
seconds elapsed 88.781317384 229.905476947
seconds elapsed 99.912528127 262.960880001
-ss