Message-Id: <201211121616.23616.Martin@lichtvoll.de>
Date: Mon, 12 Nov 2012 16:16:23 +0100
From: Martin Steigerwald <Martin@...htvoll.de>
To: Arnd Bergmann <arnd@...db.de>
Cc: linux-kernel@...r.kernel.org, Kim Jaegeuk <jaegeuk.kim@...il.com>,
Jaegeuk Kim <jaegeuk.kim@...sung.com>,
linux-fsdevel@...r.kernel.org, gregkh@...uxfoundation.org,
viro@...iv.linux.org.uk, tytso@....edu, chur.lee@...sung.com,
cm224.lee@...sung.com, jooyoung.hwang@...sung.com
Subject: Re: [PATCH 00/16 v3] f2fs: introduce flash-friendly file system
On Saturday, 10 November 2012, Arnd Bergmann wrote:
> On Saturday 10 November 2012, Martin Steigerwald wrote:
> > Command (m for help): n
> > Partition type:
> > p primary (0 primary, 0 extended, 4 free)
> > e extended
> > Select (default p): p
> > Partition number (1-4, default 1): 1
> > First sector (2048-4095998, default 2048):
> > Using default value 2048
> > Last sector, +sectors or +size{K,M,G} (2048-4095998, default 4095998):
> > Using default value 4095998
>
> This is almost certainly not the right setting for f2fs, which only works
> at its design point if the segments are aligned to erase blocks. All modern
> flash devices have erase blocks larger than 1 MB, so starting the partition
> at a 1 MB offset will cause it to be misaligned. Also, some USB sticks
> have an area optimized for random writes in the beginning of the drive
> where both FAT32 and f2fs store their metadata. It may be worth testing
> again without a partition table, using just the raw device.
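The misalignment described above can be checked with plain shell arithmetic. This is only a minimal sketch, assuming 512-byte logical sectors and a guessed erase block size; the function name offset_in_erase_block is made up for illustration.

```shell
#!/bin/sh
# Print how far (in bytes) a partition start lies past the previous
# erase block boundary. 0 means the start is aligned.
# $1 = first sector, $2 = erase block size in bytes,
# $3 = sector size in bytes (assumed 512 if omitted)
offset_in_erase_block() {
    start_sector=$1
    erase_bytes=$2
    sector_bytes=${3:-512}
    echo $(( (start_sector * sector_bytes) % erase_bytes ))
}

# fdisk default start of 2048 sectors (1 MiB), guessed 4 MiB erase block:
offset_in_erase_block 2048 $((4 * 1024 * 1024))   # 1048576: off by 1 MiB
# Starting at sector 8192 (4 MiB) instead:
offset_in_erase_block 8192 $((4 * 1024 * 1024))   # 0: aligned
```

Using the raw device without a partition table, as suggested, corresponds to a start sector of 0, which is aligned for any erase block size.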
Thank you for your hints, Arnd, much appreciated.
I already suspected as much after having read some of the fine documents on
the Linaro website.
As I want to write an article to give Linux users some insight into
Linux on "cheap" flash, I am willing to learn more.
> I would also recommend using flashbench to find out the optimum parameters
> for your device. You can download it from
> git://git.linaro.org/people/arnd/flashbench.git
> In the long run, we should automate those tests and make them part of
> mkfs.f2fs, but for now, try to find out the erase block size and the number
> of concurrently used erase blocks on your device using a timing attack
> in flashbench. The README file in there explains how to interpret the
> results from "./flashbench -a /dev/sdb --blocksize=1024" to guess
> the erase block size, although that sometimes doesn't work.
Why should I use a blocksize of 1024 when the kernel reports 512-byte blocks?
[ 3112.144086] scsi9 : usb-storage 1-1.1:1.0
[ 3113.145968] scsi 9:0:0:0: Direct-Access TinyDisk 2007-05-12 0.00 PQ: 0 ANSI: 2
[ 3113.146476] sd 9:0:0:0: Attached scsi generic sg2 type 0
[ 3113.147935] sd 9:0:0:0: [sdb] 4095999 512-byte logical blocks: (2.09 GB/1.95 GiB)
[ 3113.148935] sd 9:0:0:0: [sdb] Write Protect is off
And how do reads give information about the erase block size? Wouldn't writes be
more conclusive for that? (Having to erase one versus two erase blocks?)
Hmmm, I get widely varying results here with said USB stick:
merkaba:~> /tmp/flashbench -a /dev/sdb
align 536870912 pre 1.1ms on 1.1ms post 1.08ms diff 13µs
align 268435456 pre 1.2ms on 1.19ms post 1.16ms diff 11.6µs
align 134217728 pre 1.12ms on 1.14ms post 1.15ms diff 9.51µs
align 67108864 pre 1.12ms on 1.15ms post 1.12ms diff 29.9µs
align 33554432 pre 1.11ms on 1.17ms post 1.13ms diff 49µs
align 16777216 pre 1.14ms on 1.16ms post 1.15ms diff 22.4µs
align 8388608 pre 1.12ms on 1.09ms post 1.06ms diff -2053ns
align 4194304 pre 1.13ms on 1.16ms post 1.14ms diff 21.7µs
align 2097152 pre 1.11ms on 1.08ms post 1.1ms diff -18488n
align 1048576 pre 1.11ms on 1.11ms post 1.11ms diff -2461ns
align 524288 pre 1.15ms on 1.17ms post 1.1ms diff 45.4µs
align 262144 pre 1.11ms on 1.13ms post 1.13ms diff 12µs
align 131072 pre 1.1ms on 1.09ms post 1.16ms diff -38025n
align 65536 pre 1.09ms on 1.08ms post 1.11ms diff -21353n
align 32768 pre 1.1ms on 1.08ms post 1.11ms diff -23854n
merkaba:~> /tmp/flashbench -a /dev/sdb
align 536870912 pre 1.11ms on 1.13ms post 1.13ms diff 10.6µs
align 268435456 pre 1.12ms on 1.2ms post 1.17ms diff 61.4µs
align 134217728 pre 1.14ms on 1.19ms post 1.15ms diff 46.8µs
align 67108864 pre 1.08ms on 1.15ms post 1.08ms diff 63.8µs
align 33554432 pre 1.09ms on 1.08ms post 1.09ms diff -4761ns
align 16777216 pre 1.12ms on 1.14ms post 1.07ms diff 41.4µs
align 8388608 pre 1.1ms on 1.1ms post 1.09ms diff 7.48µs
align 4194304 pre 1.08ms on 1.1ms post 1.1ms diff 10.1µs
align 2097152 pre 1.1ms on 1.11ms post 1.1ms diff 16µs
align 1048576 pre 1.09ms on 1.1ms post 1.07ms diff 15.5µs
align 524288 pre 1.12ms on 1.12ms post 1.1ms diff 11µs
align 262144 pre 1.13ms on 1.13ms post 1.1ms diff 21.6µs
align 131072 pre 1.11ms on 1.13ms post 1.12ms diff 17.9µs
align 65536 pre 1.07ms on 1.1ms post 1.1ms diff 11.6µs
align 32768 pre 1.09ms on 1.11ms post 1.13ms diff -5131ns
merkaba:~> /tmp/flashbench -a /dev/sdb
align 536870912 pre 1.2ms on 1.18ms post 1.21ms diff -27496n
align 268435456 pre 1.22ms on 1.21ms post 1.24ms diff -18972n
align 134217728 pre 1.15ms on 1.19ms post 1.14ms diff 42.5µs
align 67108864 pre 1.08ms on 1.09ms post 1.08ms diff 5.29µs
align 33554432 pre 1.18ms on 1.19ms post 1.18ms diff 9.25µs
align 16777216 pre 1.18ms on 1.22ms post 1.17ms diff 48.6µs
align 8388608 pre 1.14ms on 1.17ms post 1.19ms diff 4.36µs
align 4194304 pre 1.16ms on 1.2ms post 1.11ms diff 65.8µs
align 2097152 pre 1.13ms on 1.09ms post 1.12ms diff -37718n
align 1048576 pre 1.15ms on 1.2ms post 1.18ms diff 34.9µs
align 524288 pre 1.14ms on 1.19ms post 1.16ms diff 41.5µs
align 262144 pre 1.19ms on 1.12ms post 1.15ms diff -52725n
align 131072 pre 1.21ms on 1.11ms post 1.14ms diff -68522n
align 65536 pre 1.21ms on 1.13ms post 1.18ms diff -64248n
align 32768 pre 1.14ms on 1.25ms post 1.12ms diff 116µs
Even when I apply the explanation from the README, I do not seem to get a
clear picture of the stick's erase block size.
The values above seem to tell me that the stick does not care about alignment
at all.
With another, likely slower flash device, an Intenso 4 GB stick, I get:
[ 3672.512143] scsi 10:0:0:0: Direct-Access Ut165 USB2FlashStorage 0.00 PQ: 0 ANSI: 2
[ 3672.514469] sd 10:0:0:0: Attached scsi generic sg2 type 0
[ 3672.514991] sd 10:0:0:0: [sdb] 7897088 512-byte logical blocks: (4.04 GB/3.76 GiB)
[…]
merkaba:~> /tmp/flashbench -a /dev/sdb
align 1073741824 pre 1.06ms on 1.03ms post 951µs diff 26.1µs
align 536870912 pre 1.06ms on 1ms post 941µs diff 1.17µs
align 268435456 pre 995µs on 957µs post 887µs diff 15.7µs
align 134217728 pre 994µs on 951µs post 883µs diff 12.4µs
align 67108864 pre 994µs on 989µs post 1.02ms diff -15104n
align 33554432 pre 934µs on 974µs post 1ms diff 4.16µs
align 16777216 pre 946µs on 916µs post 900µs diff -6588ns
align 8388608 pre 883µs on 881µs post 880µs diff -1176ns
align 4194304 pre 884µs on 884µs post 885µs diff -159ns
here?
align 2097152 pre 880µs on 879µs post 783µs diff 47.6µs
align 1048576 pre 877µs on 881µs post 878µs diff 3.92µs
align 524288 pre 869µs on 870µs post 875µs diff -2101ns
align 262144 pre 871µs on 875µs post 885µs diff -2539ns
align 131072 pre 878µs on 893µs post 900µs diff 3.6µs
align 65536 pre 851µs on 881µs post 884µs diff 13.7µs
align 32768 pre 836µs on 833µs post 880µs diff -25556n
merkaba:~> /tmp/flashbench -a /dev/sdb
align 1073741824 pre 1.07ms on 1e+03µ post 962µs diff -14615n
align 536870912 pre 1.06ms on 1.01ms post 940µs diff 12.2µs
align 268435456 pre 1ms on 943µs post 885µs diff -1132ns
align 134217728 pre 995µs on 982µs post 909µs diff 30µs
align 67108864 pre 999µs on 995µs post 1.01ms diff -9707ns
align 33554432 pre 960µs on 1.01ms post 1.03ms diff 15.2µs
align 16777216 pre 954µs on 928µs post 878µs diff 12.1µs
align 8388608 pre 872µs on 900µs post 895µs diff 16.5µs
align 4194304 pre 895µs on 862µs post 890µs diff -30439n
align 2097152 pre 889µs on 901µs post 876µs diff 18.7µs
align 1048576 pre 900µs on 898µs post 897µs diff -708ns
here?
align 524288 pre 885µs on 874µs post 881µs diff -8470ns
align 262144 pre 817µs on 873µs post 878µs diff 25.6µs
align 131072 pre 882µs on 854µs post 881µs diff -27423n
align 65536 pre 866µs on 890µs post 885µs diff 14.3µs
align 32768 pre 900µs on 881µs post 893µs diff -15412n
merkaba:~> /tmp/flashbench -a /dev/sdb
align 1073741824 pre 1.12ms on 1.02ms post 949µs diff -12574n
align 536870912 pre 1.07ms on 1.03ms post 948µs diff 16.5µs
align 268435456 pre 1.01ms on 958µs post 883µs diff 12.1µs
align 134217728 pre 994µs on 946µs post 879µs diff 9.2µs
align 67108864 pre 1ms on 1.05ms post 1.03ms diff 37.9µs
align 33554432 pre 942µs on 1.01ms post 1.03ms diff 20.6µs
align 16777216 pre 939µs on 903µs post 880µs diff -5972ns
align 8388608 pre 900µs on 914µs post 923µs diff 2.42µs
align 4194304 pre 894µs on 886µs post 882µs diff -1563ns
here?
align 2097152 pre 829µs on 890µs post 874µs diff 37.8µs
align 1048576 pre 899µs on 882µs post 843µs diff 11.1µs
align 524288 pre 890µs on 887µs post 902µs diff -9005ns
align 262144 pre 887µs on 887µs post 898µs diff -5474ns
align 131072 pre 928µs on 895µs post 914µs diff -26028n
align 65536 pre 898µs on 898µs post 894µs diff 2.59µs
align 32768 pre 884µs on 891µs post 901µs diff -1284ns
Similar picture. The diffs seem to be mostly quite small, only a few
microseconds. Or am I misreading something?
Then with a quite fast 16 GB Transcend stick:
[ 4055.393399] sd 11:0:0:0: Attached scsi generic sg2 type 0
[ 4055.394729] sd 11:0:0:0: [sdb] 31375360 512-byte logical blocks: (16.0 GB/14.9 GiB)
[ 4055.395262] sd 11:0:0:0: [sdb] Write Protect is off
merkaba:~> /tmp/flashbench -a /dev/sdb
align 4294967296 pre 1.28ms on 1.48ms post 1.33ms diff 179µs
align 2147483648 pre 1.32ms on 1.51ms post 1.33ms diff 181µs
align 1073741824 pre 1.31ms on 1.46ms post 1.35ms diff 132µs
align 536870912 pre 1.27ms on 1.52ms post 1.33ms diff 228µs
align 268435456 pre 1.28ms on 1.46ms post 1.31ms diff 161µs
align 134217728 pre 1.28ms on 1.44ms post 1.37ms diff 120µs
align 67108864 pre 1.27ms on 1.44ms post 1.34ms diff 133µs
align 33554432 pre 1.24ms on 1.42ms post 1.31ms diff 150µs
align 16777216 pre 1.23ms on 1.46ms post 1.26ms diff 218µs
align 8388608 pre 1.31ms on 1.5ms post 1.33ms diff 180µs
align 4194304 pre 1.27ms on 1.45ms post 1.36ms diff 135µs
align 2097152 pre 1.29ms on 1.37ms post 1.39ms diff 33.7µs
here?
align 1048576 pre 1.31ms on 1.44ms post 1.35ms diff 115µs
align 524288 pre 1.33ms on 1.39ms post 1.48ms diff -12297n
align 262144 pre 1.36ms on 1.42ms post 1.4ms diff 45.6µs
align 131072 pre 1.37ms on 1.44ms post 1.4ms diff 57.7µs
align 65536 pre 1.36ms on 1.35ms post 1.33ms diff 4.67µs
align 32768 pre 1.32ms on 1.38ms post 1.34ms diff 44.1µs
merkaba:~> /tmp/flashbench -a /dev/sdb
align 4294967296 pre 1.36ms on 1.49ms post 1.34ms diff 139µs
align 2147483648 pre 1.26ms on 1.48ms post 1.27ms diff 213µs
align 1073741824 pre 1.26ms on 1.45ms post 1.33ms diff 164µs
align 536870912 pre 1.22ms on 1.46ms post 1.35ms diff 173µs
align 268435456 pre 1.34ms on 1.5ms post 1.31ms diff 172µs
align 134217728 pre 1.34ms on 1.48ms post 1.31ms diff 157µs
align 67108864 pre 1.29ms on 1.46ms post 1.34ms diff 142µs
align 33554432 pre 1.28ms on 1.47ms post 1.31ms diff 173µs
align 16777216 pre 1.26ms on 1.48ms post 1.37ms diff 168µs
align 8388608 pre 1.31ms on 1.47ms post 1.36ms diff 139µs
align 4194304 pre 1.26ms on 1.53ms post 1.33ms diff 237µs
align 2097152 pre 1.34ms on 1.4ms post 1.36ms diff 56.4µs
align 1048576 pre 1.32ms on 1.35ms post 1.37ms diff 638ns
here?
align 524288 pre 1.29ms on 1.47ms post 1.45ms diff 98.1µs
align 262144 pre 1.35ms on 1.38ms post 1.42ms diff -11916n
align 131072 pre 1.32ms on 1.46ms post 1.4ms diff 100µs
align 65536 pre 1.35ms on 1.42ms post 1.43ms diff 30.8µs
align 32768 pre 1.31ms on 1.37ms post 1.33ms diff 51µs
merkaba:~> /tmp/flashbench -a /dev/sdb
align 4294967296 pre 1.26ms on 1.49ms post 1.27ms diff 222µs
align 2147483648 pre 1.25ms on 1.41ms post 1.37ms diff 97.3µs
align 1073741824 pre 1.26ms on 1.47ms post 1.31ms diff 186µs
align 536870912 pre 1.25ms on 1.42ms post 1.32ms diff 132µs
align 268435456 pre 1.2ms on 1.44ms post 1.29ms diff 195µs
align 134217728 pre 1.27ms on 1.43ms post 1.34ms diff 118µs
align 67108864 pre 1.25ms on 1.45ms post 1.31ms diff 165µs
align 33554432 pre 1.22ms on 1.36ms post 1.25ms diff 124µs
align 16777216 pre 1.24ms on 1.44ms post 1.26ms diff 191µs
align 8388608 pre 1.22ms on 1.39ms post 1.23ms diff 164µs
align 4194304 pre 1.23ms on 1.43ms post 1.3ms diff 171µs
align 2097152 pre 1.26ms on 1.3ms post 1.32ms diff 16.7µs
align 1048576 pre 1.26ms on 1.27ms post 1.26ms diff 7.91µs
here?
align 524288 pre 1.24ms on 1.3ms post 1.3ms diff 29.2µs
align 262144 pre 1.25ms on 1.3ms post 1.28ms diff 28.2µs
align 131072 pre 1.25ms on 1.29ms post 1.28ms diff 24.8µs
align 65536 pre 1.15ms on 1.24ms post 1.26ms diff 34.5µs
align 32768 pre 1.17ms on 1.3ms post 1.26ms diff 82.6µs
The thing is that my "here?" is not always in the same place :)
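Since the "here?" position moves from run to run, one way to reduce the noise might be to average the diff column per alignment over several runs. The following is a minimal sketch, assuming the output format shown above (fields "align N pre X on Y post Z diff D") and assuming that values ending in a bare "n" are nanoseconds whose unit was cut off by the column width. avg_diff is a hypothetical helper, not part of flashbench.

```shell
#!/bin/sh
# Average the "diff" column of several `flashbench -a` runs per alignment.
# Reads flashbench output on stdin and prints "<align> <mean diff in µs>",
# largest alignment first. Lines that are not "align" rows are ignored.
avg_diff() {
    awk '
    # Convert a time such as "13µs", "1.1ms" or "-2053ns" to microseconds.
    # Adding 0 makes awk use the leading numeric part of the string.
    function to_us(v,    n) {
        n = v + 0
        if (v ~ /ms$/)  return n * 1000
        if (v ~ /ns?$/) return n / 1000   # "ns", or "n" where the unit was cut
        return n                          # anything else is treated as µs
    }
    $1 == "align" && $9 == "diff" {
        sum[$2] += to_us($10)
        cnt[$2]++
    }
    END {
        for (a in sum) printf "%s %.1f\n", a, sum[a] / cnt[a]
    }' | sort -rn
}
```

It could be fed with something like "for i in 1 2 3; do /tmp/flashbench -a /dev/sdb; done | avg_diff". Whether three runs are enough to make a drop visible on these sticks is an open question.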
> With the correct guess, compare the performance you get using
>
> $ ERASESIZE=$[2*1024*1024] # replace with guess from flashbench -a
> $ ./flashbench /dev/sdb --open-au --open-au-nr=1 --blocksize=4096 --erasesize=${ERASESIZE}
> $ ./flashbench /dev/sdb --open-au --open-au-nr=3 --blocksize=4096 --erasesize=${ERASESIZE}
> $ ./flashbench /dev/sdb --open-au --open-au-nr=5 --blocksize=4096 --erasesize=${ERASESIZE}
> $ ./flashbench /dev/sdb --open-au --open-au-nr=7 --blocksize=4096 --erasesize=${ERASESIZE}
> $ ./flashbench /dev/sdb --open-au --open-au-nr=13 --blocksize=4096 --erasesize=${ERASESIZE}
I will omit this for now, because I am not yet sure about the correct guess.
> The first one of those should always be the fastest, hopefully followed by
> some that are equally fast and then some much slower ones (especially for the
> smaller block sizes). The "active_logs=N" mount option should be one less
> than the highest number above that is still "fast", and only "2", "4" and "6"
> are valid at the moment. If you are lucky, your device is still fast with
> "--open-au-nr=7" and slow only for higher numbers, then the default of "6"
> is ok.
>
> If the erase size is larger than 2 MB, then you have to use the "-s" option in
> mkfs.f2fs to configure how many 2 MB segments there are in one erase block.
> For a 2 GB USB stick, I would guess that the erase block size is 1, 2 or
> 4 MB. Newer (larger) sticks will have larger erase blocks that may also
> be a multiple of 3 MB (3, 6, 12, or 24).
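The two rules of thumb above can be written down as a small sketch. It assumes the 2 MB segment size mentioned above, and it assumes that "one less than the highest fast --open-au-nr" gets rounded down to the nearest currently valid value (2, 4 or 6). That rounding and the helper names segs_per_erase_block and active_logs_for are assumptions for illustration, not taken from mkfs.f2fs or the kernel.

```shell
#!/bin/sh
SEGMENT_BYTES=$((2 * 1024 * 1024))   # segment size as stated in the mail

# Number of 2 MB segments in one erase block, i.e. a candidate value for
# the "-s" option of mkfs.f2fs. $1 = erase block size in bytes.
# Sizes that are not a whole number of segments are rejected (a choice
# made for this sketch).
segs_per_erase_block() {
    if [ $(( $1 % SEGMENT_BYTES )) -ne 0 ]; then
        echo "erase size $1 is not a multiple of 2 MB" >&2
        return 1
    fi
    echo $(( $1 / SEGMENT_BYTES ))
}

# Candidate "active_logs=N": one less than the highest --open-au-nr that
# was still fast, rounded down to 6, 4 or 2 (assumption: never below 2).
# $1 = highest --open-au-nr value that was still fast.
active_logs_for() {
    n=$(( $1 - 1 ))
    if   [ "$n" -ge 6 ]; then echo 6
    elif [ "$n" -ge 4 ]; then echo 4
    else echo 2
    fi
}

segs_per_erase_block $((4 * 1024 * 1024))    # 2
segs_per_erase_block $((12 * 1024 * 1024))   # 6
active_logs_for 7                            # 6
active_logs_for 5                            # 4
```

How to handle an erase block of 3 MB, which is not a whole number of 2 MB segments, is not covered by the text above, so the sketch simply reports an error in that case.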
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7