Date:   Sun, 23 Apr 2017 19:59:37 +0200
From:   Matias Bjørling <mb@...htnvm.io>
To:     Javier González <jg@...htnvm.io>
Cc:     linux-block@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 5/5] lightnvm: pblk: fix erase counters on error fail

On 04/22/2017 11:31 AM, Javier González wrote:
>
>> On 22 Apr 2017, at 11.22, Matias Bjørling <mb@...htnvm.io> wrote:
>>
>> On 04/22/2017 01:32 AM, Javier González wrote:
>>> When block erases fail, these blocks are marked bad. The number of valid
>>> blocks in the line was not updated, which could cause an infinite loop
>>> on the erase path.
>>>
>>> Fix this counter and, to avoid taking an irq lock in the interrupt
>>> context, make the erase counters atomic too.
>>
>> I can't see where the counters are used in irq context. Can you
>> point me in the right direction? I'd prefer for these counters to go
>> in under the existing line_lock.
>>
>
> This counter is line->blk_in_line, which is used in pblk_mark_bb(). This
> is triggered on the erase completion path. Note that we do not want to
> wait until the recovery kicks in on the workqueue since the line might
> start getting recycled straight away if we are close to reaching
> capacity. This is indeed the scenario that triggers the race condition.
>
> Making all erase counters atomic allows us to avoid taking the
> line_lock. Note that the counters do not depend on each other.
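>
> Roughly, the idea is something along these lines (just a sketch of the
> approach, not the exact patch; the field and function names follow the
> pblk code, but the details here are illustrative):
>
>   #include <linux/atomic.h>
>   #include <linux/bitops.h>
>   #include <linux/printk.h>
>
>   /* blk_in_line becomes an atomic counter so the erase completion
>    * path can update it without taking line_lock.
>    */
>   struct pblk_line {
>           int id;
>           unsigned long *blk_bitmap;      /* bad block bitmap */
>           atomic_t blk_in_line;           /* was a plain count under line_lock */
>           /* ... rest of the line state ... */
>   };
>
>   static void pblk_mark_bb(struct pblk_line *line, int pos)
>   {
>           /* Mark the block bad and drop it from the line's count.
>            * atomic_dec_return() needs no spin_lock_irqsave(), so this
>            * is safe from the erase completion (interrupt) context.
>            */
>           set_bit(pos, line->blk_bitmap);
>           if (atomic_dec_return(&line->blk_in_line) == 0)
>                   pr_err("pblk: line %d: no good blocks left\n", line->id);
>   }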
>
>>> Also, when a significant number of blocks in a line become bad, the
>>> two shared metadata buffers (emeta) can run out and stop the pipeline
>>> until all metadata is flushed to the media. Increase the number of
>>> metadata lines from 2 to 4 to avoid this case.
>>
>> How does moving to 4 lines solve this case? The way I read it, it
>> only postpones when this occurs.
>>
>
> The chances of having more than 2 lines running out of blocks after
> pre-conditioning are slim. Adding two more lines should be enough.
>
>>>
>>> [...]
>>>
>>> -#define PBLK_DATA_LINES 2
>>> +#define PBLK_DATA_LINES 4
>>
>> Why this change? I'd like to keep new features for 4.13 and only bugfixes for 4.12.
>
> This is the 4 lines referred to above. I see it as a bug fix, since we
> risk stalling the pipeline if a line goes above a certain number of bad
> blocks on initialization, but we can leave it out and fix this when we
> add in-line metadata on the write thread for 4.12.
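>
> As a rough illustration of the stall (the buffer pool below is
> hypothetical, just to show the effect of the constant, not how pblk
> actually hands out emeta): with N pre-allocated metadata buffers,
> opening line N+1 has to wait for an in-flight line to finish flushing,
> which is exactly the stall we want headroom against:
>
>   #include <linux/semaphore.h>
>
>   #define PBLK_DATA_LINES 4               /* was 2 */
>
>   /* Hypothetical pool of shared emeta buffers, one per open line */
>   struct emeta_pool {
>           struct semaphore free;          /* counts free buffers */
>   };
>
>   static void emeta_pool_init(struct emeta_pool *pool)
>   {
>           sema_init(&pool->free, PBLK_DATA_LINES);
>   }
>
>   static void emeta_get(struct emeta_pool *pool)
>   {
>           /* Sleeps (stalls the write pipeline) while every buffer is
>            * still tied to a line whose metadata has not been flushed.
>            */
>           down(&pool->free);
>   }
>
>   static void emeta_put(struct emeta_pool *pool)
>   {
>           up(&pool->free);
>   }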
>
> Thanks,
> Javier
>

Okay. It tickles me a bit. However, to make it pretty, some refactoring 
is needed, which we won't push for this release.

Reviewed-by: Matias Bjørling <matias@...xlabs.com>
