[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <57ffdc3b-59c7-66d4-8e55-909ad4710f5f@toxicpanda.com>
Date: Wed, 18 Dec 2019 14:54:44 -0500
From: Josef Bacik <josef@...icpanda.com>
To: dsterba@...e.cz, Aditya Pakki <pakki001@....edu>, kjlu@....edu,
Chris Mason <clm@...com>, David Sterba <dsterba@...e.com>,
linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] btrfs: remove BUG_ON used as assertions
On 12/18/19 11:47 AM, David Sterba wrote:
> On Wed, Dec 18, 2019 at 11:38:18AM -0500, Josef Bacik wrote:
>> On 12/15/19 12:12 PM, Aditya Pakki wrote:
>>> alloc_extent_state_atomic() allocates extents via GFP_ATOMIC flag
>>> and cannot fail. There are multiple invocations of BUG_ON on the
>>> return value to check for failure. The patch replaces certain
>>> invocations of BUG_ON by returning the error upstream.
>>>
>>> Signed-off-by: Aditya Pakki <pakki001@....edu>
>>
>> I already tried this a few months ago and gave up. There are a few things if
>> you want to tackle something like this
>>
>> 1) use bpf's error injection thing to make sure you handle every path that can
>> error out. This is the script I wrote to do just that
>>
>> https://github.com/josefbacik/debug-scripts/blob/master/error-injection-stress.py
>>
>> 2) We actually can't fail here. We would need to go back and make _all_ callers
>> of lock_extent_bits() handle the allocation error. This is theoretically
>> possible, but a giant pain in the ass. In general we can make allocations here
>> and we need to be able to make them.
>>
>> 3) We should probably mark this path with __GFP_NOFAIL because again, this is
>> locking and we need locking to succeed.
>
> NOFAIL can introduce loops that could lead to deadlocks, if not used
> carefully. __set_extent_bit is not just locking, so if one thread wants
> to set bits, allocate, wait, allocator goes to write some memory
>
> eg.
>
> set_extent_bit on some range
> alloc state (NOFAIL)
> allocator wants to flush dome dirty data
> ------------------------------>
> set_extent_bit
> alloc state (NOFAIL)
> (wait)
>
Yes obviously I just want it for EXTENT_LOCKED. But we could even just use a
mempool to be really safe, since most places are going to use GFP_KERNEL or
something else related, we only really need the safety in a few critical areas.
Thanks,
Josef
Powered by blists - more mailing lists