Message-ID: <20160324234149.GB4046@bbox>
Date:	Fri, 25 Mar 2016 08:41:49 +0900
From:	Minchan Kim <minchan@...nel.org>
To:	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
CC:	Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: zram: per-cpu compression streams

Hi Sergey,

On Wed, Mar 23, 2016 at 05:18:27PM +0900, Sergey Senozhatsky wrote:
>  ( was "[PATCH] zram: export the number of available comp streams"
>    forked from http://marc.info/?l=linux-kernel&m=145860707516861 )
> 
> d'oh.... sorry, now actually forked.
> 
> 
>  Hello Minchan,
> 
>  forked into a separate thread.
> 
> > On (03/22/16 09:39), Minchan Kim wrote:
> > >   zram_bvec_write()
> > >   {
> > >   	*get_cpu_ptr(comp->stream);
> > >   	 zcomp_compress();
> > >   	 zs_malloc()
> > >   	put_cpu_ptr(comp->stream);
> > >   }
> > >   
> > >   this, however, makes zsmalloc unhappy. pool has GFP_NOIO | __GFP_HIGHMEM
> > >   gfp, and GFP_NOIO is ___GFP_DIRECT_RECLAIM|___GFP_KSWAPD_RECLAIM. this
> > >   __GFP_DIRECT_RECLAIM is in the conflict with per-cpu streams, because
> > >   per-cpu streams require disabled preemption (up until we copy stream
> > >   buffer to zspage). so what options do we have here... from the top of
> > >   my head (w/o a lot of thinking)...
> >  
> >  Indeed.
> ...
> >  How about this?
> >  
> >  zram_bvec_write()
> >  {
> >  retry:
> >          *get_cpu_ptr(comp->stream);
> >          zcomp_compress();
> >          handle = zs_malloc((gfp & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN);
> >          if (!handle) {
> >                  put_cpu_ptr(comp->stream);
> >                  handle = zs_malloc(gfp);
> >                  goto retry;
> >          }
> >          put_cpu_ptr(comp->stream);
> >  }
> 
>  interesting. the retry jump should go higher, we have "user_mem = kmap_atomic(page)"
>  which we unmap right after compression, because a) we don't need
>  uncompressed memory anymore b) zs_malloc() can sleep and we can't have atomic
>  mapping around. the nasty thing here is is_partial_io(). we need to re-do
>  
>  	if (is_partial_io(bvec))
>  		memcpy(uncmem + offset, user_mem + bvec->bv_offset,
>  			bvec->bv_len);
>  
>  once again in the worst case.
>  
>  so zs_malloc((gfp & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN) so far can cause
>  double memcpy() and double compression. just to outline this.
>  
>  
>  the test.
>  
>  I executed a number of iozone tests, on each iteration re-creating zram
>  device (3GB, LZO, EXT4. the box has 4 x86_64 CPUs).
>  
>  $DEVICE_SZ=3G
>  $FREE_SPACE is 10% of $DEVICE_SZ
>  time ./iozone -t $i -R -r $((8*$i))K -s $((($DEVICE_SZ/$i - $FREE_SPACE)/(1024*1024)))M -I +Z
>  
>  
>  columns:
>  
>         TEST           MAX_STREAMS 4   MAX_STREAMS 8  PER_CPU STREAMS
>  ====================================================================
>  
>  Test #1 iozone -t 1 -R -r 8K -s 2764M -I +Z
>    Initial write         853492.31*      835868.50       839789.56
>          Rewrite        1642073.88      1657255.75      1693011.50*
>             Read        3384044.00*     3218727.25      3269109.50
>          Re-read        3389794.50*     3243187.00      3267422.25
>     Reverse Read        3209805.75*     3082040.00      3107957.25
>      Stride read        3100144.50*     2972280.25      2923155.25
>      Random read        2992249.75*     2874605.00      2854824.25
>   Mixed workload        2992274.75*     2878212.25      2883840.00
>     Random write        1471800.00      1452346.50      1515678.75*
>           Pwrite         802083.00       801627.31       820251.69*
>            Pread        3443495.00*     3308659.25      3302089.00
>           Fwrite        1880446.88      1838607.50      1909490.00*
>            Fread        3479614.75      3091634.75      6442964.50*
>  =          real          1m4.170s        1m4.513s        1m4.123s
>  =          user          0m0.559s        0m0.518s        0m0.511s
>  =           sys         0m18.766s       0m19.264s       0m18.641s
>  
>  
>  Test #2 iozone -t 2 -R -r 16K -s 1228M -I +Z
>    Initial write        2102532.12      2051809.19      2419072.50*
>          Rewrite        2217024.25      2250930.00      3681559.00*
>             Read        7716933.25      7898759.00      8345507.75*
>          Re-read        7748487.75      7765282.25      8342367.50*
>     Reverse Read        7415254.25      7552637.25      7822691.75*
>      Stride read        7041909.50      7091049.25      7401273.00*
>      Random read        6205044.25      6738888.50      7232104.25*
>   Mixed workload        4582990.00      5271651.50      5361002.88*
>     Random write        2591893.62      2513729.88      3660774.38*
>           Pwrite        1873876.75      1909758.69      2087238.81*
>            Pread        4669850.00      4651121.56      4919588.44*
>           Fwrite        1937947.25      1940628.06      2034251.25*
>            Fread        9930319.00      9970078.00*     9831422.50
>  =          real         0m53.844s       0m53.607s       0m52.528s
>  =          user          0m0.273s        0m0.289s        0m0.280s
>  =           sys         0m16.595s       0m16.478s       0m14.072s
>  
>  
>  Test #3 iozone -t 3 -R -r 24K -s 716M -I +Z
>    Initial write        3036567.50      2998918.25      3683853.00*
>          Rewrite        3402447.88      3415685.88      5054705.38*
>             Read       11767413.00*    11133789.50     11246497.25
>          Re-read       11797680.50*    11092592.00     11277382.00
>     Reverse Read       10828320.00*    10157665.50     10749055.00
>      Stride read       10532039.50*     9943521.75     10464700.25
>      Random read       10380365.75*     9807859.25     10234127.00
>   Mixed workload        8772132.50*     8415083.50      8457108.50
>     Random write        3364875.00      3310042.00      5059136.38*
>           Pwrite        2677290.25      2651309.50      3198166.25*
>            Pread        5221799.56*     4963050.69      4987293.78
>           Fwrite        2026887.56      2047679.00      2124199.62*
>            Fread       11310381.25     11413531.50     11444208.75*
>  =          real         0m50.209s       0m50.782s       0m49.750s
>  =          user          0m0.195s        0m0.205s        0m0.215s
>  =           sys         0m14.873s       0m15.159s       0m12.911s
>  
>  
>  Test #4 iozone -t 4 -R -r 32K -s 460M -I +Z
>    Initial write        3841474.94      3859279.81      5309988.88*
>          Rewrite        3905526.25      3917309.62      6814800.62*
>             Read       16233054.50     14843560.25     16352283.75*
>          Re-read       16335506.50     15529152.25     16352570.00*
>     Reverse Read       15316394.50*    14225482.50     15004897.50
>      Stride read       14799380.25*    14064034.25     14355184.25
>      Random read       14683771.00     14206928.50     14814913.00*
>   Mixed workload        9058851.50      9180650.75     10815917.50*
>     Random write        3990585.94      4004757.00      6722088.50*
>           Pwrite        3318836.12      3468977.69      4244747.69*
>            Pread        5894538.16*     5588046.38      5847345.62
>           Fwrite        2227353.75      2186688.62      2386974.88*
>            Fread       12046094.00     12240004.75*    12073956.75
>  =          real         0m48.561s       0m48.839s       0m48.142s
>  =          user          0m0.155s        0m0.170s        0m0.133s
>  =           sys         0m13.650s       0m13.684s       0m10.790s
>  
>  
>  Test #5 iozone -t 5 -R -r 40K -s 307M -I +Z
>    Initial write        4034878.94      4026610.69      5775746.12*
>          Rewrite        3898600.44      3901114.16      6923764.19*
>             Read       14947360.88     16698824.25*    10155333.62
>          Re-read       15844580.75*    15344057.00      9869874.38
>     Reverse Read        7459156.95      9023317.86*     7648295.03
>      Stride read       10823891.81      9615553.81     11231183.72*
>      Random read       10391702.56*     9740935.75     10048038.28
>   Mixed workload        8261830.94     10175925.00*     7535763.75
>     Random write        3951423.31      3960984.62      6671441.38*
>           Pwrite        4119023.12      4097204.56      5975659.12*
>            Pread        6072076.73*     4338668.50      6020808.34
>           Fwrite        2417235.47      2337875.88      2665450.62*
>            Fread       13393630.25     13648332.00*    13395391.00
>  =          real         0m47.756s       0m47.939s       0m47.483s
>  =          user          0m0.128s        0m0.128s        0m0.119s
>  =           sys         0m10.361s       0m10.392s        0m8.717s
>  
>  
>  Test #6 iozone -t 6 -R -r 48K -s 204M -I +Z
>    Initial write        4134932.97      4137171.88      5983193.31*
>          Rewrite        3928131.31      3950764.00      7124248.00*
>             Read       10965005.75*    10152236.50      9856572.88
>          Re-read        9386946.00     10776231.38     14303174.12*
>     Reverse Read        6035244.89      7456152.38*     5999446.38
>      Stride read        8041000.75      7995307.75     10182936.75*
>      Random read        8565099.09     10487707.58*     8694877.25
>   Mixed workload        5301593.06      7332589.09*     6802251.06
>     Random write        4046482.56      3986854.94      6723824.56*
>           Pwrite        4188226.41      4214513.34      6245278.44*
>            Pread        3452596.86      3708694.69*     3486420.41
>           Fwrite        2829500.22      3030742.72      3033792.28*
>            Fread       13331387.75     13490416.50     14940410.25*
>  =          real         0m47.150s       0m47.050s       0m47.044s
>  =          user          0m0.106s        0m0.100s        0m0.094s
>  =           sys          0m9.238s        0m8.804s        0m6.930s
>  
>  
>  Test #7 iozone -t 7 -R -r 56K -s 131M -I +Z
>    Initial write        4169480.84      4116331.03      5946801.38*
>          Rewrite        3993155.97      3986195.00      6928142.44*
>             Read       18901600.25*    10088918.69      6699592.78
>          Re-read        8738544.69     14881309.62*    13960026.06
>     Reverse Read        5008919.08      7923949.95*     5495212.41
>      Stride read        7029436.75      8747574.91*     6477087.25
>      Random read        6994738.56*     5448687.81      6585235.53
>   Mixed workload        5178632.44      5258914.92      5587421.81*
>     Random write        4008977.78      3928116.88      6816453.12*
>           Pwrite        4342852.09      4154319.09      6124520.06*
>            Pread        3880318.99      2978587.56      4493903.14*
>           Fwrite        5557990.03      2923556.59      6126649.94*
>            Fread       14451722.00     15281179.62*    14675436.50
>  =          real         0m46.321s       0m46.458s       0m45.791s
>  =          user          0m0.093s        0m0.089s        0m0.095s
>  =           sys          0m6.961s        0m6.600s        0m5.499s
>  
>  
>  Test #8 iozone -t 8 -R -r 64K -s 76M -I +Z
>    Initial write        4354783.88      4392731.31      6337397.50*
>          Rewrite        4070162.69      3974051.50      7587279.81*
>             Read       10095324.56     17945227.88*     8359665.56
>          Re-read       12316555.88     20468303.75*     7949999.34
>     Reverse Read        4924659.84      8542573.33*     6388858.72
>      Stride read       10895715.69     14828968.38*     6107484.81
>      Random read        6838537.34     14352104.25*     5389174.97
>   Mixed workload        5805646.75      8391745.53*     6052748.25
>     Random write        4148973.38      3890847.38      7247214.19*
>           Pwrite        4309372.41      4423800.34      6863604.69*
>            Pread        4875766.02*     4042375.33      3692948.91
>           Fwrite        6102404.31      6021884.41      6634112.09*
>            Fread       15485971.12*    14900780.62     13981842.50
>  =          real         0m45.618s       0m45.753s       0m45.619s
>  =          user          0m0.071s        0m0.080s        0m0.060s
>  =           sys          0m4.702s        0m4.430s        0m3.555s
>  
>  
>  Test #9 iozone -t 9 -R -r 72K -s 34M -I +Z
>    Initial write        4202354.67      4208936.34      6300798.88*
>          Rewrite        4046855.38      4294137.50      7623323.69*
>             Read       10926571.88     13304801.81*    10895587.19
>          Re-read       17725984.94*     7964431.25     12394078.50
>     Reverse Read        5843121.72      5851846.66*     4075657.20
>      Stride read        9688998.59     10306234.70*     5566376.62
>      Random read        7656689.97      8660602.06*     5437182.36
>   Mixed workload        6229215.62     11205238.73*     5575719.75
>     Random write        4094822.22      4517401.86      6601624.94*
>           Pwrite        4274497.50      4263936.64      6844453.11*
>            Pread        6525075.62*     6043725.62      5745003.28
>           Fwrite        5958798.56      8430354.78*     7636085.00
>            Fread       18636725.12*    17268959.12     16618803.62
>  =          real         0m44.945s       0m44.816s       0m45.194s
>  =          user          0m0.062s        0m0.060s        0m0.060s
>  =           sys          0m2.187s        0m2.223s        0m1.888s
>  
>  
>  Test #10 iozone -t 10 -R -r 80K -s 0M -I +Z
>    Initial write        3213973.56      2731512.62      4416466.25*
>          Rewrite        3066956.44*     2693819.50       332671.94
>             Read        7769523.25*     2681473.75       462840.44
>          Re-read        5244861.75      5473037.00*      382183.03
>     Reverse Read        7479397.25*     4869597.75       374714.06
>      Stride read        5403282.50*     5385083.75       382473.44
>      Random read        5131997.25      5176799.75*      380593.56
>   Mixed workload        3998043.25      4219049.00*     1645850.45
>     Random write        3452832.88      3290861.69      3588531.75*
>           Pwrite        3757435.81      2711756.47      4561807.88*
>            Pread        2743595.25*     2635835.00       412947.98
>           Fwrite       16076549.00     16741977.25*    14797209.38
>            Fread       23581812.62*    21664184.25      5064296.97
>  =          real         0m44.490s       0m44.444s       0m44.609s
>  =          user          0m0.054s        0m0.049s        0m0.055s
>  =           sys          0m0.037s        0m0.046s        0m0.148s
>  
>  
>  so when the number of active tasks becomes larger than the number
>  of online CPUs, iozone reports data that is a bit hard to understand. I
>  can assume that since we now keep preemption disabled longer
>  in the write path, a concurrent operation (READ or WRITE) cannot preempt
>  current anymore... slightly suspicious.
>  
>  the other hard to understand thing is why do READ-only tests have
>  such a huge jitter. READ-only tests don't depend on streams, they
>  don't even use them, we supply compressed data directly to
>  decompression api.
>  
>  maybe better to retire iozone and never use it again.
>  
>  
>  "118 insertions(+), 238 deletions(-)" the patches remove a big
>  pile of code.

First of all, thank you very much for the work!
At a glance, this is a huge win on write workloads, but it is worth
investigating how such fluctuation/regression happens on the read-related
tests (read and mixed workload).

Could you send your patchset? I will test it.
