Message-Id: <ACB77414-F14D-4FBE-90D2-555E95B127C3@lightnvm.io>
Date:   Sat, 22 Apr 2017 11:31:17 +0200
From:   Javier González <jg@...htnvm.io>
To:     Matias Bjørling <mb@...htnvm.io>
Cc:     linux-block@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 5/5] lightnvm: pblk: fix erase counters on error fail


> On 22 Apr 2017, at 11.22, Matias Bjørling <mb@...htnvm.io> wrote:
> 
> On 04/22/2017 01:32 AM, Javier González wrote:
>> When block erases fail, these blocks are marked bad. The number of valid
>> blocks in the line was not updated, which could cause an infinite loop
>> on the erase path.
>> 
>> Fix this counter and, in order to avoid taking an irq lock in
>> interrupt context, make the erase counters atomic too.
> 
> I can't find where the counters are used in irq context. Can you
> point me in the right direction? I'd prefer for these counters to go
> under the existing line_lock.
> 

The counter is line->blk_in_line, which is used in pblk_mark_bb(). It
is triggered on the erase completion path. Note that we do not want to
wait until recovery kicks in on the workqueue, since the line might
start getting recycled straight away if we are close to reaching
capacity. This is indeed the scenario that triggers the race condition.

Making all erase counters atomic allows us to avoid taking the
line_lock. Note that the counters do not depend on each other.
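As a rough sketch of the pattern (simplified, with made-up names, not
the actual pblk structures): with a plain integer, the erase completion
path would need spin_lock_irqsave() on line_lock; with an atomic_t it
can decrement directly from interrupt context.

#include <linux/atomic.h>
#include <linux/spinlock.h>

struct line_sketch {
	spinlock_t lock;	/* the existing line_lock */
	atomic_t blk_in_line;	/* good blocks left in the line */
};

/* Erase completion path (may run in interrupt context): an erase
 * failed and the block is marked bad, so the line has one fewer
 * usable block.
 */
static void mark_bb_sketch(struct line_sketch *line)
{
	atomic_dec(&line->blk_in_line);	/* no spin_lock_irqsave() needed */
}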

>> Also, in the case that a significant number of blocks in a line
>> become bad, both shared metadata buffers (emeta) can end up held,
>> stopping the pipeline until all metadata is flushed to the media.
>> Increase the number of metadata lines from 2 to 4 to avoid this case.
> 
> How does moving to 4 lines solve this case? The way I read it, it
> only postpones when this occurs.
> 

The chances of having more than 2 lines running out of good blocks
after pre-conditioning are slim. Adding two more lines should be enough.
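To make the failure mode concrete, a hedged sketch (hypothetical names,
not pblk's actual implementation): with only 2 shared emeta buffers,
two lines that cannot hold their own metadata pin both buffers and the
write pipeline stalls until a flush completes; with 4, exhausting the
pool after pre-conditioning becomes very unlikely.

#include <linux/types.h>

/* Hypothetical model of the shared emeta buffer pool. */
struct emeta_pool_sketch {
	int nr_bufs;	/* PBLK_DATA_LINES buffers in the pool */
	int in_use;	/* buffers pinned by lines awaiting flush */
};

/* Try to take a buffer for a line that cannot store its own emeta. */
static bool emeta_get_sketch(struct emeta_pool_sketch *pool)
{
	if (pool->in_use == pool->nr_bufs)
		return false;	/* pool exhausted: pipeline stalls */
	pool->in_use++;
	return true;
}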

>> 
>> [...]
>> 
>> -#define PBLK_DATA_LINES 2
>> +#define PBLK_DATA_LINES 4
> 
> Why this change? I'd like to keep new features for 4.13 and only bug
> fixes for 4.12.

These are the 4 lines referred to above. I see it as a bug fix, since
we risk stalling the pipeline if a line goes above a certain number of
bad blocks on initialization, but we can leave it out and fix this when
we add in-line metadata on the write thread for 4.12.

Thanks,
Javier
