lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1476275641-4697-29-git-send-email-lizf@kernel.org>
Date:   Wed, 12 Oct 2016 20:32:25 +0800
From:   lizf@...nel.org
To:     stable@...r.kernel.org
Cc:     linux-kernel@...r.kernel.org, Filipe Manana <fdmanana@...e.com>,
        Zefan Li <lizefan@...wei.com>
Subject: [PATCH 3.4 029/125] Btrfs: fix race leading to BUG_ON when running delalloc for nodatacow

From: Filipe Manana <fdmanana@...e.com>

3.4.113-rc1 review patch.  If anyone has any objections, please let me know.

------------------


commit 1d512cb77bdbda80f0dd0620a3b260d697fd581d upstream.

If we are using the NO_HOLES feature, we have a tiny time window when
running delalloc for a nodatacow inode where we can race with a concurrent
link or xattr add operation leading to a BUG_ON.

This happens because at run_delalloc_nocow() we end up casting a leaf item
of type BTRFS_INODE_[REF|EXTREF]_KEY or of type BTRFS_XATTR_ITEM_KEY to a
file extent item (struct btrfs_file_extent_item) and then analyse its
extent type field, which won't match any of the expected extent types
(values BTRFS_FILE_EXTENT_[REG|PREALLOC|INLINE]) and therefore trigger an
explicit BUG_ON(1).

The following sequence diagram shows how the race happens when running a
no-cow dellaloc range [4K, 8K[ for inode 257 and we have the following
neighbour leafs:

             Leaf X (has N items)                    Leaf Y

 [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 EXTENT_DATA 8192), ... ]
              slot N - 2         slot N - 1              slot 0

 (Note the implicit hole for inode 257 regarding the [0, 8K[ range)

       CPU 1                                         CPU 2

 run_dealloc_nocow()
   btrfs_lookup_file_extent()
     --> searches for a key with value
         (257 EXTENT_DATA 4096) in the
         fs/subvol tree
     --> returns us a path with
         path->nodes[0] == leaf X and
         path->slots[0] == N

   because path->slots[0] is >=
   btrfs_header_nritems(leaf X), it
   calls btrfs_next_leaf()

   btrfs_next_leaf()
     --> releases the path

                                              hard link added to our inode,
                                              with key (257 INODE_REF 500)
                                              added to the end of leaf X,
                                              so leaf X now has N + 1 keys

     --> searches for the key
         (257 INODE_REF 256), because
         it was the last key in leaf X
         before it released the path,
         with path->keep_locks set to 1

     --> ends up at leaf X again and
         it verifies that the key
         (257 INODE_REF 256) is no longer
         the last key in the leaf, so it
         returns with path->nodes[0] ==
         leaf X and path->slots[0] == N,
         pointing to the new item with
         key (257 INODE_REF 500)

   the loop iteration of run_dealloc_nocow()
   does not break out the loop and continues
   because the key referenced in the path
   at path->nodes[0] and path->slots[0] is
   for inode 257, its type is < BTRFS_EXTENT_DATA_KEY
   and its offset (500) is less then our delalloc
   range's end (8192)

   the item pointed by the path, an inode reference item,
   is (incorrectly) interpreted as a file extent item and
   we get an invalid extent type, leading to the BUG_ON(1):

   if (extent_type == BTRFS_FILE_EXTENT_REG ||
      extent_type == BTRFS_FILE_EXTENT_PREALLOC) {
       (...)
   } else if (extent_type == BTRFS_FILE_EXTENT_INLINE) {
       (...)
   } else {
       BUG_ON(1)
   }

The same can happen if a xattr is added concurrently and ends up having
a key with an offset smaller then the delalloc's range end.

So fix this by skipping keys with a type smaller than
BTRFS_EXTENT_DATA_KEY.

Signed-off-by: Filipe Manana <fdmanana@...e.com>
Signed-off-by: Zefan Li <lizefan@...wei.com>
---
 fs/btrfs/inode.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 575c190..d460390 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1203,8 +1203,14 @@ next_slot:
 		num_bytes = 0;
 		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
 
-		if (found_key.objectid > ino ||
-		    found_key.type > BTRFS_EXTENT_DATA_KEY ||
+		if (found_key.objectid > ino)
+			break;
+		if (WARN_ON_ONCE(found_key.objectid < ino) ||
+		    found_key.type < BTRFS_EXTENT_DATA_KEY) {
+			path->slots[0]++;
+			goto next_slot;
+		}
+		if (found_key.type > BTRFS_EXTENT_DATA_KEY ||
 		    found_key.offset > end)
 			break;
 
-- 
1.9.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ