Date: Fri, 18 Dec 2020 10:05:20 +0900
From: Daejun Park <daejun7.park@...sung.com>
To: Greg KH <gregkh@...uxfoundation.org>, Daejun Park <daejun7.park@...sung.com>
CC: "avri.altman@....com" <avri.altman@....com>, "jejb@...ux.ibm.com" <jejb@...ux.ibm.com>,
    "martin.petersen@...cle.com" <martin.petersen@...cle.com>,
    "asutoshd@...eaurora.org" <asutoshd@...eaurora.org>,
    "stanley.chu@...iatek.com" <stanley.chu@...iatek.com>,
    "cang@...eaurora.org" <cang@...eaurora.org>, "bvanassche@....org" <bvanassche@....org>,
    "huobean@...il.com" <huobean@...il.com>, ALIM AKHTAR <alim.akhtar@...sung.com>,
    "linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
    "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
    Sung-Jun Park <sungjun07.park@...sung.com>, yongmyung lee <ymhungry.lee@...sung.com>,
    Jinyoung CHOI <j-young.choi@...sung.com>, Adel Choi <adel.choi@...sung.com>,
    BoRam Shin <boram.shin@...sung.com>, SEUNGUK SHIN <seunguk.shin@...sung.com>
Subject: RE: Re: [PATCH v14 0/3] scsi: ufs: Add Host Performance Booster Support

Hi, Greg

> > NAND flash memory-based storage devices use a Flash Translation Layer
> > (FTL) to translate logical addresses of I/O requests to the
> > corresponding flash memory addresses. Mobile storage devices typically
> > have RAM of constrained size and thus lack the memory to keep the whole
> > mapping table. Mapping tables are therefore partially retrieved from
> > NAND flash on demand, causing random-read performance degradation.
> >
> > To improve random-read performance, JESD220-3 (HPB v1.0) proposes HPB
> > (Host Performance Booster), which uses host system memory as a cache
> > for the FTL mapping table. With HPB, FTL data can be read from host
> > memory faster than from NAND flash memory.
> >
> > The current version only supports DCM (device control mode).
> > This patch consists of 3 parts to support the HPB feature.
> >
> > 1) HPB probe and initialization process
> > 2) READ -> HPB READ using cached map information
> > 3) L2P (logical to physical) map management
> >
> > In the HPB probe and init process, the device information of the UFS is
> > queried. After checking the supported features, the data structures for
> > the HPB are initialized according to the device information.
> >
> > A read I/O in an active sub-region, where the map is cached, is changed
> > to HPB READ by the HPB.
> >
> > The HPB manages the L2P map using information received from the device.
> > For an active sub-region, the HPB caches the map through a ufshpb_map
> > request. For an inactive region, the HPB discards the L2P map. When a
> > write I/O occurs in an active sub-region, the associated dirty bitmap
> > is marked dirty to prevent stale reads.
> >
> > HPB is shown to give a performance improvement of 58 - 67% for a
> > random-read workload. [1]
> >
> > We measured the total start-up time of popular applications and
> > observed the difference made by enabling HPB. The applications are 12
> > game apps and 24 non-game apps, each launched in order. One cycle
> > consists of running all 36 applications in sequence; we repeated the
> > cycle to observe the performance improvement from L2P mapping cache
> > hits in HPB.
> >
> > The experiment environment is:
> > - kernel version: 4.4.0
> > - UFS 2.1 (64GB)
> >
> > Result:
> > +-------+----------+----------+-------+
> > | cycle | baseline | with HPB | diff  |
> > +-------+----------+----------+-------+
> > |     1 |    272.4 |    264.9 |  -7.5 |
> > |     2 |    250.4 |    248.2 |  -2.2 |
> > |     3 |    226.2 |    215.6 | -10.6 |
> > |     4 |    230.6 |    214.8 | -15.8 |
> > |     5 |    232.0 |    218.1 | -13.9 |
> > |     6 |    231.9 |    212.6 | -19.3 |
> > +-------+----------+----------+-------+

> I feel this was buried in the 00 email, shouldn't it go into the 01
> commit changelog so that you can see this?

Sure, I will move this result to the 01 commit log.
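The L2P caching behaviour described in the cover letter (cache the map for
active sub-regions, discard it for inactive ones, dirty a written block so a
later read is not served stale) can be sketched as a toy host-side model.
This is an illustration only; the class and method names below are invented
and are not the ufshpb driver API.

```python
# Toy model of host-side L2P caching as described in the cover letter.
# All names are invented for illustration; this is not the ufshpb driver code.

class SubRegion:
    def __init__(self, num_blocks):
        self.l2p = {}                      # logical block -> physical address
        self.dirty = [False] * num_blocks  # per-block dirty bits

class HpbCache:
    def __init__(self, subregion_blocks=16):
        self.blocks = subregion_blocks
        self.active = {}                   # sub-region id -> SubRegion

    def activate(self, srgn, l2p_map):
        """Cache the L2P map for an active sub-region (map-request analogue)."""
        sr = SubRegion(self.blocks)
        sr.l2p = dict(l2p_map)
        self.active[srgn] = sr

    def deactivate(self, srgn):
        """Discard the cached map when the sub-region becomes inactive."""
        self.active.pop(srgn, None)

    def write(self, srgn, block):
        """A write dirties the cached entry so a later read is not stale."""
        sr = self.active.get(srgn)
        if sr is not None:
            sr.dirty[block] = True

    def read(self, srgn, block):
        """Return ('HPB READ', ppn) when a clean cached entry exists,
        otherwise ('READ', None): a normal read, resolved by the device."""
        sr = self.active.get(srgn)
        if sr and not sr.dirty[block] and block in sr.l2p:
            return ("HPB READ", sr.l2p[block])
        return ("READ", None)
```

For example, after `activate(0, {3: 0x1000})`, a `read(0, 3)` is upgraded to
an HPB READ, while a subsequent `write(0, 3)` forces the next read of that
block back to a normal READ.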
> But why does the "cycle" matter here?

I think iteration minimizes other factors that affect the start-up time of
the applications.

> Can you run a normal benchmark, like fio, on here so we can get some
> numbers we know how to compare to other systems with, and possibly
> reproduce it ourselves?  I'm sure fio will easily show random read
> performance increases, right?

Here is my iozone script:

iozone -r 4k -+n -i2 -ecI -t 16 -l 16 -u 16 -s $IO_RANGE/16 -F mnt/tmp_1 mnt/tmp_2 mnt/tmp_3 mnt/tmp_4 mnt/tmp_5 mnt/tmp_6 mnt/tmp_7 mnt/tmp_8 mnt/tmp_9 mnt/tmp_10 mnt/tmp_11 mnt/tmp_12 mnt/tmp_13 mnt/tmp_14 mnt/tmp_15 mnt/tmp_16

Result:
+----------+--------+---------+
| IO range | HPB on | HPB off |
+----------+--------+---------+
| 1 GB     | 294.8  | 300.87  |
| 4 GB     | 293.51 | 179.35  |
| 8 GB     | 294.85 | 162.52  |
| 16 GB    | 293.45 | 156.26  |
| 32 GB    | 277.4  | 153.25  |
+----------+--------+---------+

Thanks,
Daejun
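The fio run Greg asked for could be approximated with a job file along these
lines. This is only a sketch mirroring the iozone invocation above (4k random
reads, 16 parallel jobs over the test range); every parameter here is an
illustrative choice, not something taken from the thread.

```ini
; Hypothetical fio job approximating the iozone workload above.
; Adjust directory/size to match the intended IO range.
[global]
rw=randread
bs=4k
direct=1
ioengine=libaio
runtime=60
time_based=1
group_reporting

[hpb-randread]
directory=mnt
numjobs=16
size=1g
```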