linux-kernel - RE: [GIT PULL 20/20] lightnvm: pblk: sync RB and RL states during GC

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Tue, 29 May 2018 13:07:05 +0000
From:   "Konopko, Igor J" <igor.j.konopko@...el.com>
To:     Javier Gonzalez <javier@...xlabs.com>,
        Matias Bjorling <mb@...htnvm.io>
CC:     Jens Axboe <axboe@...com>,
        "linux-block@...r.kernel.org" <linux-block@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Dziegielewski, Marcin" <marcin.dziegielewski@...el.com>
Subject: RE: [GIT PULL 20/20] lightnvm: pblk: sync RB and RL states during GC

> From: Javier Gonzalez [mailto:javier@...xlabs.com]
> Do you mean random writes? On fully sequential, a line will either be
> fully written, fully invalidated or on its way to be written. When
> invalidating the line, then the whole line will be invalidated and GC
> will free it without having to move valid data.

I meant sequential writes, since this is the easiest way to reach rl->rb_state = PBLK_RL_LOW. When we updating this values inside __pblk_rl_update_rates(), most of the times rl->rb_user_cnt will became equal to rl->rb_user_max - which means that we cannot handle any more user IOs. In that case pblk_rl_user_may_insert() will return false, no one will call pblk_rb_update_l2p(), rl->rb_user_cnt will not be decreased, so user IOs will stuck for forever.

> I can see why you might have problems with very low OP due to the rate
> limiter, but unfortunately this is not a good way of solving the
> problem. When you do this, you basically make the L2P to point to the
> device instead of pointing to the write cache, which in essence bypasses
> mw_cuints. As a result, if a read comes in to one of the synced entries,
> it will violate the device-host contract and most probably fail (for
> sure fail on > SLC).

What about using on that path some modified version of pblk_rb_sync_l2p() which will synchronize all the RB entries except last mw_cunits number of elements?

Also you wrote about mw_cuints definitely makes sense, but even without my changes I believe that we can lead into such a situation - especially for pblk with small number of LUNs assigned under  write IOs with high sector count. Pblk_rb_update_l2p() does not explicitly takes mw_cunints this value into consideration right now.

> I think that the right way of solving this problem is separating the
> write and GC buffers and then assigning tokens to them. The write thread
> will then consume both buffers based on these tokens. In this case, user
> I/O will have a buffer for itself, which will be guaranteed to advance
> at the rate the rate-limiter is allowing it to. Note that the 2 buffers
> can be a single buffer with a new set of pointers so that the lookup can
> be done with a single bit.
> 
> I have been looking for time to implement this for a while. If you want
> to give it a go, we can talk and I can give you some pointers on
> potential issues I have thought about.

I believe this is interesting option - we can discuss this for one of next releases.