Message-ID: <20191218164706.GP3929@twin.jikos.cz>
Date:   Wed, 18 Dec 2019 17:47:06 +0100
From:   David Sterba <dsterba@...e.cz>
To:     Josef Bacik <josef@...icpanda.com>
Cc:     Aditya Pakki <pakki001@....edu>, kjlu@....edu,
        Chris Mason <clm@...com>, David Sterba <dsterba@...e.com>,
        linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] btrfs: remove BUG_ON used as assertions

On Wed, Dec 18, 2019 at 11:38:18AM -0500, Josef Bacik wrote:
> On 12/15/19 12:12 PM, Aditya Pakki wrote:
> > alloc_extent_state_atomic() allocates extents via GFP_ATOMIC flag
> > and cannot fail. There are multiple invocations of BUG_ON on the
> > return value to check for failure. The patch replaces certain
> > invocations of BUG_ON by returning the error upstream.
> > 
> > Signed-off-by: Aditya Pakki <pakki001@....edu>
> 
> I already tried this a few months ago and gave up.  There are a few things if 
> you want to tackle something like this
> 
> 1) use bpf's error injection thing to make sure you handle every path that can 
> error out.  This is the script I wrote to do just that
> 
> https://github.com/josefbacik/debug-scripts/blob/master/error-injection-stress.py
> 
> 2) We actually can't fail here.  We would need to go back and make _all_ callers 
> of lock_extent_bits() handle the allocation error.  This is theoretically 
> possible, but a giant pain in the ass.  In general we can make allocations here 
> and we need to be able to make them.
> 
> 3) We should probably mark this path with __GFP_NOFAIL because again, this is 
> locking and we need locking to succeed.

NOFAIL can introduce loops that could lead to deadlocks if not used
carefully. __set_extent_bit is not just locking: if one thread wants to
set bits, it allocates and waits, and the allocator then goes to write
out some memory, which can end up in set_extent_bit again.

E.g.:

set_extent_bit on some range
  alloc state (NOFAIL)
    allocator wants to flush some dirty data
                   ------------------------------>
                                               set_extent_bit
                                                 alloc state (NOFAIL)
                                                 (wait)
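
The loop above can be sketched as a toy userspace model. This is only an
illustration under simplifying assumptions, not kernel code: the names
Model, WouldDeadlock, alloc_state, set_extent_bit and run are hypothetical
stand-ins, "free_states" pretends to be the only source of memory, and a
wait that could never finish is reported as an exception instead of
actually hanging.

```python
from dataclasses import dataclass, field


class WouldDeadlock(Exception):
    """Raised where a NOFAIL allocation would wait forever in this model."""


@dataclass
class Model:
    free_states: int = 0        # hypothetical pool of objects still available
    in_reclaim: bool = False    # True while the allocator is flushing dirty data
    trace: list = field(default_factory=list)


def alloc_state(m, nofail):
    """Allocate one state object; may recurse into writeback under pressure."""
    m.trace.append("alloc state (%s)" % ("NOFAIL" if nofail else "may fail"))
    if m.free_states > 0:
        m.free_states -= 1
        return True
    if m.in_reclaim:
        # We are already inside the flush that was supposed to free memory.
        if nofail:
            m.trace.append("(wait)")
            raise WouldDeadlock("nested NOFAIL allocation waits on its own reclaim")
        return False            # a failable allocation can back out here
    m.in_reclaim = True
    try:
        m.trace.append("allocator wants to flush some dirty data")
        # Writing back dirty data needs to set extent bits itself.
        return set_extent_bit(m, "range being written back", nofail)
    finally:
        m.in_reclaim = False


def set_extent_bit(m, rng, nofail):
    """Stand-in for the real function: all it does here is allocate."""
    m.trace.append("set_extent_bit on " + rng)
    return alloc_state(m, nofail)


def run(nofail, free_states=0):
    """Run one top-level call and report 'ok', 'ENOMEM' or 'deadlock'."""
    m = Model(free_states=free_states)
    try:
        ok = set_extent_bit(m, "some range", nofail)
    except WouldDeadlock:
        return "deadlock", m.trace
    return ("ok" if ok else "ENOMEM"), m.trace
```

In this model, run(True) ends in the nested wait, while run(False) lets
the error propagate back to the first caller, which is the trade-off
being discussed: either every caller handles the failure, or the
allocation must never be able to re-enter itself.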
