Message-ID: <20160323081209.GC512@swordfish>
Date:	Wed, 23 Mar 2016 17:18:27 +0900
From:	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:	Minchan Kim <minchan@...nel.org>
Cc:	Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: zram: per-cpu compression streams

 ( was "[PATCH] zram: export the number of available comp streams"
   forked from http://marc.info/?l=linux-kernel&m=145860707516861 )

d'oh.... sorry, now actually forked.


 Hello Minchan,

 forked into a separate thread.

> On (03/22/16 09:39), Minchan Kim wrote:
> >   zram_bvec_write()
> >   {
> >   	*get_cpu_ptr(comp->stream);
> >   	 zcomp_compress();
> >   	 zs_malloc()
> >   	put_cpu_ptr(comp->stream);
> >   }
> >   
> >   this, however, makes zsmalloc unhappy. pool has GFP_NOIO | __GFP_HIGHMEM
> >   gfp, and GFP_NOIO is ___GFP_DIRECT_RECLAIM|___GFP_KSWAPD_RECLAIM. this
> >   __GFP_DIRECT_RECLAIM is in the conflict with per-cpu streams, because
> >   per-cpu streams require disabled preemption (up until we copy stream
> >   buffer to zspage). so what options do we have here... from the top of
> >   my head (w/o a lot of thinking)...
>  
>  Indeed.
...
>  How about this?
>  
>  zram_bvec_write()
>  {
>  retry:
>          *get_cpu_ptr(comp->stream);
>          zcomp_compress();
>          handle = zs_malloc((gfp & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN);
>          if (!handle) {
>                  put_cpu_ptr(comp->stream);
>                  handle  = zs_malloc(gfp);
>                  goto retry;
>          }
>          put_cpu_ptr(comp->stream);
>  }

 interesting. the retry jump should go higher, we have "user_mem = kmap_atomic(page)"
 which we unmap right after compression, because a) we don't need
 uncompressed memory anymore b) zs_malloc() can sleep and we can't have atomic
 mapping around. the nasty thing here is is_partial_io(). we need to re-do
 
 	if (is_partial_io(bvec))
 		memcpy(uncmem + offset, user_mem + bvec->bv_offset,
 			bvec->bv_len);
 
 once again in the worst case.
 
 so zs_malloc((gfp & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN) can, in the worst
 case, cause a double memcpy() and a double compression. just to outline this.
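
 a minimal userspace sketch of the retry flow discussed above, only to make
 the "double compression" cost visible. every name in it (stream_get(),
 alloc_nowait(), alloc_may_sleep(), toy_compress(), write_page()) is a
 hypothetical stand-in, not zram/zsmalloc API; the in_atomic flag merely
 models "preemption disabled", and the assert() in alloc_may_sleep() plays
 the role of a "must not sleep here" check. unlike the quoted pseudocode,
 this sketch assumes the handle obtained in the slow path is kept and reused
 after the jump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* all names below are hypothetical userspace stand-ins, not kernel API */
bool in_atomic;        /* models "preemption disabled" */
int  nowait_failures;  /* how many non-blocking allocations will fail */
int  compress_calls;   /* counts how often the data had to be compressed */

/* rough analogues of get_cpu_ptr() / put_cpu_ptr() on the stream */
void stream_get(void) { assert(!in_atomic); in_atomic = true; }
void stream_put(void) { assert(in_atomic); in_atomic = false; }

/* non-blocking attempt: allowed in the atomic section, but may fail */
void *alloc_nowait(size_t len)
{
	if (nowait_failures > 0) {
		nowait_failures--;
		return NULL;
	}
	return malloc(len);
}

/* blocking attempt: may "sleep", so it must run outside the atomic section */
void *alloc_may_sleep(size_t len)
{
	assert(!in_atomic);
	return malloc(len);
}

/* toy "compressor": a plain copy into the stream buffer */
size_t toy_compress(const char *src, size_t len, char *dst)
{
	compress_calls++;
	memcpy(dst, src, len);
	return len;
}

void *write_page(const char *src, size_t len)
{
	static char stream_buf[4096];
	void *handle = NULL;
	size_t clen;

	if (len == 0 || len > sizeof(stream_buf))
		return NULL;
retry:
	stream_get();
	clen = toy_compress(src, len, stream_buf);
	if (!handle)
		handle = alloc_nowait(clen);
	if (!handle) {
		/* slow path: leave the atomic section before a blocking alloc */
		stream_put();
		handle = alloc_may_sleep(clen);
		if (!handle)
			return NULL;
		/* the stream buffer may have been reused meanwhile: redo the work */
		goto retry;
	}
	memcpy(handle, stream_buf, clen);
	stream_put();
	return handle;
}
```

 with nowait_failures set to 1, a single write_page() call goes through
 toy_compress() twice, which is the worst case outlined above; with it set
 to 0 the fast path compresses once.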
 
 
 the test.
 
 I executed a number of iozone tests, on each iteration re-creating zram
 device (3GB, LZO, EXT4. the box has 4 x86_64 CPUs).
 
 $DEVICE_SZ=3G
 $FREE_SPACE is 10% of $DEVICE_SZ
 time ./iozone -t $i -R -r $((8*$i))K -s $((($DEVICE_SZ/$i - $FREE_SPACE)/(1024*1024)))M -I +Z
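
 the -r and -s arguments of the command line above can be reproduced with a
 few lines of integer arithmetic. this is only a sketch: the function names
 are made up for illustration, and it assumes that "10% of $DEVICE_SZ" is
 computed with integer division and that $i runs from 1 to 10, which matches
 the -s values listed in the per-test headers.

```c
#include <stdio.h>

#define DEVICE_SZ (3ULL << 30)	/* 3G, in bytes */

/* assumption: FREE_SPACE is DEVICE_SZ / 10, rounded down */
unsigned long long free_space(void)
{
	return DEVICE_SZ / 10;
}

/* -r argument in KB: $((8*$i)) */
unsigned long long record_size_kb(unsigned int tasks)
{
	return 8ULL * tasks;
}

/* -s argument in MB: $((($DEVICE_SZ/$i - $FREE_SPACE)/(1024*1024))) */
unsigned long long file_size_mb(unsigned int tasks)
{
	return (DEVICE_SZ / tasks - free_space()) / (1024 * 1024);
}

/* print the iozone command line for every task count used in the test */
void print_iozone_args(void)
{
	for (unsigned int i = 1; i <= 10; i++)
		printf("iozone -t %u -R -r %lluK -s %lluM -I +Z\n",
		       i, record_size_kb(i), file_size_mb(i));
}
```

 note that for 10 tasks DEVICE_SZ/10 equals FREE_SPACE, so the computed file
 size is 0M.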
 
 
 columns:
 
        TEST           MAX_STREAMS 4   MAX_STREAMS 8  PER_CPU STREAMS
 ====================================================================
 
 Test #1 iozone -t 1 -R -r 8K -s 2764M -I +Z
   Initial write         853492.31*      835868.50       839789.56
         Rewrite        1642073.88      1657255.75      1693011.50*
            Read        3384044.00*     3218727.25      3269109.50
         Re-read        3389794.50*     3243187.00      3267422.25
    Reverse Read        3209805.75*     3082040.00      3107957.25
     Stride read        3100144.50*     2972280.25      2923155.25
     Random read        2992249.75*     2874605.00      2854824.25
  Mixed workload        2992274.75*     2878212.25      2883840.00
    Random write        1471800.00      1452346.50      1515678.75*
          Pwrite         802083.00       801627.31       820251.69*
           Pread        3443495.00*     3308659.25      3302089.00
          Fwrite        1880446.88      1838607.50      1909490.00*
           Fread        3479614.75      3091634.75      6442964.50*
 =          real          1m4.170s        1m4.513s        1m4.123s
 =          user          0m0.559s        0m0.518s        0m0.511s
 =           sys         0m18.766s       0m19.264s       0m18.641s
 
 
 Test #2 iozone -t 2 -R -r 16K -s 1228M -I +Z
   Initial write        2102532.12      2051809.19      2419072.50*
         Rewrite        2217024.25      2250930.00      3681559.00*
            Read        7716933.25      7898759.00      8345507.75*
         Re-read        7748487.75      7765282.25      8342367.50*
    Reverse Read        7415254.25      7552637.25      7822691.75*
     Stride read        7041909.50      7091049.25      7401273.00*
     Random read        6205044.25      6738888.50      7232104.25*
  Mixed workload        4582990.00      5271651.50      5361002.88*
    Random write        2591893.62      2513729.88      3660774.38*
          Pwrite        1873876.75      1909758.69      2087238.81*
           Pread        4669850.00      4651121.56      4919588.44*
          Fwrite        1937947.25      1940628.06      2034251.25*
           Fread        9930319.00      9970078.00*     9831422.50
 =          real         0m53.844s       0m53.607s       0m52.528s
 =          user          0m0.273s        0m0.289s        0m0.280s
 =           sys         0m16.595s       0m16.478s       0m14.072s
 
 
 Test #3 iozone -t 3 -R -r 24K -s 716M -I +Z
   Initial write        3036567.50      2998918.25      3683853.00*
         Rewrite        3402447.88      3415685.88      5054705.38*
            Read       11767413.00*    11133789.50     11246497.25
         Re-read       11797680.50*    11092592.00     11277382.00
    Reverse Read       10828320.00*    10157665.50     10749055.00
     Stride read       10532039.50*     9943521.75     10464700.25
     Random read       10380365.75*     9807859.25     10234127.00
  Mixed workload        8772132.50*     8415083.50      8457108.50
    Random write        3364875.00      3310042.00      5059136.38*
          Pwrite        2677290.25      2651309.50      3198166.25*
           Pread        5221799.56*     4963050.69      4987293.78
          Fwrite        2026887.56      2047679.00      2124199.62*
           Fread       11310381.25     11413531.50     11444208.75*
 =          real         0m50.209s       0m50.782s       0m49.750s
 =          user          0m0.195s        0m0.205s        0m0.215s
 =           sys         0m14.873s       0m15.159s       0m12.911s
 
 
 Test #4 iozone -t 4 -R -r 32K -s 460M -I +Z
   Initial write        3841474.94      3859279.81      5309988.88*
         Rewrite        3905526.25      3917309.62      6814800.62*
            Read       16233054.50     14843560.25     16352283.75*
         Re-read       16335506.50     15529152.25     16352570.00*
    Reverse Read       15316394.50*    14225482.50     15004897.50
     Stride read       14799380.25*    14064034.25     14355184.25
     Random read       14683771.00     14206928.50     14814913.00*
  Mixed workload        9058851.50      9180650.75     10815917.50*
    Random write        3990585.94      4004757.00      6722088.50*
          Pwrite        3318836.12      3468977.69      4244747.69*
           Pread        5894538.16*     5588046.38      5847345.62
          Fwrite        2227353.75      2186688.62      2386974.88*
           Fread       12046094.00     12240004.75*    12073956.75
 =          real         0m48.561s       0m48.839s       0m48.142s
 =          user          0m0.155s        0m0.170s        0m0.133s
 =           sys         0m13.650s       0m13.684s       0m10.790s
 
 
 Test #5 iozone -t 5 -R -r 40K -s 307M -I +Z
   Initial write        4034878.94      4026610.69      5775746.12*
         Rewrite        3898600.44      3901114.16      6923764.19*
            Read       14947360.88     16698824.25*    10155333.62
         Re-read       15844580.75*    15344057.00      9869874.38
    Reverse Read        7459156.95      9023317.86*     7648295.03
     Stride read       10823891.81      9615553.81     11231183.72*
     Random read       10391702.56*     9740935.75     10048038.28
  Mixed workload        8261830.94     10175925.00*     7535763.75
    Random write        3951423.31      3960984.62      6671441.38*
          Pwrite        4119023.12      4097204.56      5975659.12*
           Pread        6072076.73*     4338668.50      6020808.34
          Fwrite        2417235.47      2337875.88      2665450.62*
           Fread       13393630.25     13648332.00*    13395391.00
 =          real         0m47.756s       0m47.939s       0m47.483s
 =          user          0m0.128s        0m0.128s        0m0.119s
 =           sys         0m10.361s       0m10.392s        0m8.717s
 
 
 Test #6 iozone -t 6 -R -r 48K -s 204M -I +Z
   Initial write        4134932.97      4137171.88      5983193.31*
         Rewrite        3928131.31      3950764.00      7124248.00*
            Read       10965005.75*    10152236.50      9856572.88
         Re-read        9386946.00     10776231.38     14303174.12*
    Reverse Read        6035244.89      7456152.38*     5999446.38
     Stride read        8041000.75      7995307.75     10182936.75*
     Random read        8565099.09     10487707.58*     8694877.25
  Mixed workload        5301593.06      7332589.09*     6802251.06
    Random write        4046482.56      3986854.94      6723824.56*
          Pwrite        4188226.41      4214513.34      6245278.44*
           Pread        3452596.86      3708694.69*     3486420.41
          Fwrite        2829500.22      3030742.72      3033792.28*
           Fread       13331387.75     13490416.50     14940410.25*
 =          real         0m47.150s       0m47.050s       0m47.044s
 =          user          0m0.106s        0m0.100s        0m0.094s
 =           sys          0m9.238s        0m8.804s        0m6.930s
 
 
 Test #7 iozone -t 7 -R -r 56K -s 131M -I +Z
   Initial write        4169480.84      4116331.03      5946801.38*
         Rewrite        3993155.97      3986195.00      6928142.44*
            Read       18901600.25*    10088918.69      6699592.78
         Re-read        8738544.69     14881309.62*    13960026.06
    Reverse Read        5008919.08      7923949.95*     5495212.41
     Stride read        7029436.75      8747574.91*     6477087.25
     Random read        6994738.56*     5448687.81      6585235.53
  Mixed workload        5178632.44      5258914.92      5587421.81*
    Random write        4008977.78      3928116.88      6816453.12*
          Pwrite        4342852.09      4154319.09      6124520.06*
           Pread        3880318.99      2978587.56      4493903.14*
          Fwrite        5557990.03      2923556.59      6126649.94*
           Fread       14451722.00     15281179.62*    14675436.50
 =          real         0m46.321s       0m46.458s       0m45.791s
 =          user          0m0.093s        0m0.089s        0m0.095s
 =           sys          0m6.961s        0m6.600s        0m5.499s
 
 
 Test #8 iozone -t 8 -R -r 64K -s 76M -I +Z
   Initial write        4354783.88      4392731.31      6337397.50*
         Rewrite        4070162.69      3974051.50      7587279.81*
            Read       10095324.56     17945227.88*     8359665.56
         Re-read       12316555.88     20468303.75*     7949999.34
    Reverse Read        4924659.84      8542573.33*     6388858.72
     Stride read       10895715.69     14828968.38*     6107484.81
     Random read        6838537.34     14352104.25*     5389174.97
  Mixed workload        5805646.75      8391745.53*     6052748.25
    Random write        4148973.38      3890847.38      7247214.19*
          Pwrite        4309372.41      4423800.34      6863604.69*
           Pread        4875766.02*     4042375.33      3692948.91
          Fwrite        6102404.31      6021884.41      6634112.09*
           Fread       15485971.12*    14900780.62     13981842.50
 =          real         0m45.618s       0m45.753s       0m45.619s
 =          user          0m0.071s        0m0.080s        0m0.060s
 =           sys          0m4.702s        0m4.430s        0m3.555s
 
 
 Test #9 iozone -t 9 -R -r 72K -s 34M -I +Z
   Initial write        4202354.67      4208936.34      6300798.88*
         Rewrite        4046855.38      4294137.50      7623323.69*
            Read       10926571.88     13304801.81*    10895587.19
         Re-read       17725984.94*     7964431.25     12394078.50
    Reverse Read        5843121.72      5851846.66*     4075657.20
     Stride read        9688998.59     10306234.70*     5566376.62
     Random read        7656689.97      8660602.06*     5437182.36
  Mixed workload        6229215.62     11205238.73*     5575719.75
    Random write        4094822.22      4517401.86      6601624.94*
          Pwrite        4274497.50      4263936.64      6844453.11*
           Pread        6525075.62*     6043725.62      5745003.28
          Fwrite        5958798.56      8430354.78*     7636085.00
           Fread       18636725.12*    17268959.12     16618803.62
 =          real         0m44.945s       0m44.816s       0m45.194s
 =          user          0m0.062s        0m0.060s        0m0.060s
 =           sys          0m2.187s        0m2.223s        0m1.888s
 
 
 Test #10 iozone -t 10 -R -r 80K -s 0M -I +Z
   Initial write        3213973.56      2731512.62      4416466.25*
         Rewrite        3066956.44*     2693819.50       332671.94
            Read        7769523.25*     2681473.75       462840.44
         Re-read        5244861.75      5473037.00*      382183.03
    Reverse Read        7479397.25*     4869597.75       374714.06
     Stride read        5403282.50*     5385083.75       382473.44
     Random read        5131997.25      5176799.75*      380593.56
  Mixed workload        3998043.25      4219049.00*     1645850.45
    Random write        3452832.88      3290861.69      3588531.75*
          Pwrite        3757435.81      2711756.47      4561807.88*
           Pread        2743595.25*     2635835.00       412947.98
          Fwrite       16076549.00     16741977.25*    14797209.38
           Fread       23581812.62*    21664184.25      5064296.97
 =          real         0m44.490s       0m44.444s       0m44.609s
 =          user          0m0.054s        0m0.049s        0m0.055s
 =           sys          0m0.037s        0m0.046s        0m0.148s
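
 to compare columns in the tables above in relative terms, a small helper
 like the following can be used. it is just arithmetic on the reported
 numbers; the function name is made up for illustration, and the two sample
 rows are copied from Test #4 (Initial write) and Test #10 (Rewrite),
 MAX_STREAMS 4 vs PER_CPU STREAMS.

```c
#include <stdio.h>

/* relative change of 'other' against 'base', in percent */
double rel_change_pct(double base, double other)
{
	return (other - base) / base * 100.0;
}

/* print the change for two sample rows taken from the tables */
void print_samples(void)
{
	printf("Test #4  Initial write: %+.2f%%\n",
	       rel_change_pct(3841474.94, 5309988.88));
	printf("Test #10 Rewrite:       %+.2f%%\n",
	       rel_change_pct(3066956.44, 332671.94));
}
```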
 
 
 so when the number of active tasks becomes larger than the number
 of online CPUs, iozone reports data that is a bit hard to understand.
 I can assume that, since we now keep preemption disabled longer in
 the write path, a concurrent operation (READ or WRITE) cannot preempt
 current anymore... slightly suspicious.
 
 the other hard to understand thing is why READ-only tests have
 such a huge jitter. READ-only tests don't depend on streams, they
 don't even use them; we supply compressed data directly to the
 decompression api.
 
 maybe it's better to retire iozone and never use it again.
 
 
 "118 insertions(+), 238 deletions(-)" the patches remove a big
 pile of code.
 
 	-ss
 
