[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231213155440.GA454379@perftesting>
Date: Wed, 13 Dec 2023 10:54:40 -0500
From: Josef Bacik <josef@...icpanda.com>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: David Sterba <dsterba@...e.cz>, kernel test robot <lkp@...el.com>,
llvm@...ts.linux.dev, oe-kbuild-all@...ts.linux.dev,
linux-kernel@...r.kernel.org, julia.lawall@...ia.fr, clm@...com,
dsterba@...e.com, baptiste.lepers@...il.com
Subject: Re: [paulmck-rcu:frederic.2023.12.08a 29/37]
fs/btrfs/transaction.c:496:6: error: call to '__compiletime_assert_329'
declared with 'error' attribute: Need native word sized stores/loads for
atomicity.
On Wed, Dec 13, 2023 at 06:56:37AM -0800, Paul E. McKenney wrote:
> On Wed, Dec 13, 2023 at 01:53:58PM +0100, David Sterba wrote:
> > On Sat, Dec 09, 2023 at 07:51:30AM -0800, Paul E. McKenney wrote:
> > > On Sat, Dec 09, 2023 at 06:20:37PM +0800, kernel test robot wrote:
> > > > tree: https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git frederic.2023.12.08a
> > > > head: 37843b5f561a08ae899fb791eeeb5abd992eabe2
> > > > commit: 7dd87072d40809e26503f04b79d63290288dbbac [29/37] btrfs: Adjust ->last_trans ordering in btrfs_record_root_in_trans()
> > > > config: riscv-rv32_defconfig (https://download.01.org/0day-ci/archive/20231209/202312091837.cKaPw0Tf-lkp@intel.com/config)
> > > > compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
> > > > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231209/202312091837.cKaPw0Tf-lkp@intel.com/reproduce)
> > > >
> > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > the same patch/commit), kindly add following tags
> > > > | Reported-by: kernel test robot <lkp@...el.com>
> > > > | Closes: https://lore.kernel.org/oe-kbuild-all/202312091837.cKaPw0Tf-lkp@intel.com/
> > > >
> > > > All errors (new ones prefixed by >>):
> > > >
> > > > warning: unknown warning option '-Wpacked-not-aligned'; did you mean '-Wpacked-non-pod'? [-Wunknown-warning-option]
> > > > warning: unknown warning option '-Wstringop-truncation'; did you mean '-Wstring-concatenation'? [-Wunknown-warning-option]
> > > > warning: unknown warning option '-Wmaybe-uninitialized'; did you mean '-Wuninitialized'? [-Wunknown-warning-option]
> > > > >> fs/btrfs/transaction.c:496:6: error: call to '__compiletime_assert_329' declared with 'error' attribute: Need native word sized stores/loads for atomicity.
> > > > 496 | if (smp_load_acquire(&root->last_trans) == trans->transid && /* ^^^ */
> > > > | ^
> > >
> > > Ooooh!!! :-/
> > >
> > > From what I can see, the current code can tear this load on 32-bit
> > > systems, which can result in bad comparisons and then in failure to wait
> > > for a partially complete transaction.
> > >
> > > So is btrfs actually supported on 32-bit systems? If not, would the
> > > following patch be appropriate?
> >
> > There are limitations on 32bit systems, eg. due to shorter inode numbers
> > (ino_t is unsigned long) and that radix-tree/xarray does support only
> > unsigned long keys, while we have 64bit identifiers for inodes or tree
> > roots.
> >
> > So far we support that and dropping it completely is I think a big deal,
> > like with any possibly used feature. What I've seen there are NAS boxes
> > with low power ARM that are 32bit.
> >
> > > If btrfs is to be supported on 32-bit systems, from what I can see some
> > > major surgery is required, even if a 32-bit counter is wrap-safe for
> > > this particular type of transaction. (But SSDs? In-memory btrfs
> > > filesystems?)
> >
> > We won't probably do any major surgery to support 32bit systems.
>
> Got it, and thank you for the background! My takeaway is that 32-bit
> BTRFS must work in the common case, but might have issues on some
> workloads, for example, running out of inode numbers or load tearing.
>
> > > ------------------------------------------------------------------------
> > >
> > > diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
> > > index 4fb925e8c981..4d56158c34f9 100644
> > > --- a/fs/btrfs/Kconfig
> > > +++ b/fs/btrfs/Kconfig
> > > @@ -19,6 +19,7 @@ config BTRFS_FS
> > > select RAID6_PQ
> > > select XOR_BLOCKS
> > > depends on PAGE_SIZE_LESS_THAN_256KB
> > > + depends on 64BIT
> >
> > Can we keep the current inefficient smp_* barriers instead of dropping
> > the whole 32bit support as an alternative. If the smp_load_acquire are
> > better but not strictly necessary for the correctness (from the barriers
> > POV) I'd suggest to leave it as-is. We can put comments in case somebody
> > wants to optimize it in the future again.
>
> We still have the barrier placement issue, given that smp_rmb() enforces
> only the ordering of earlier and later loads, correct? Or am I missing
> some other ordering constraint that makes all that work?
>
> But I can make each of the current patch's smp_load_acquire() call instead
> be be a READ_ONCE() followed by an smp_rmb(), the test_bit_acquire()
> call be test_bit() followed by smp_rmb(), and the smp_store_release()
> call be an smp_wmb() followed by a WRITE_ONCE(). This provides the needed
> ordering, bullet-proofs the 64-bit code against compilers, but works on
> 32-bit systems. For example, on a 32-bit system the 64-bit READ_ONCE()
> and WRITE_ONCE() might still be compiled as a pair of 32-bit memory
> accesses, but they will both be guaranteed to be single memory accesses
> on 64-bit systems.
>
> Would that work for you guys?
Actually I think we just need to re-work all of this to make it less silly.
Does this look reasonable to you Paul? I still have to test it, but I think it
addresses your concerns and lets us keep 32bit support ;). Thanks,
Josef
From: Josef Bacik <josef@...icpanda.com>
Date: Wed, 13 Dec 2023 10:46:57 -0500
Subject: [PATCH] btrfs: cleanup how we record roots in the transaction
Previously we didn't have a bit field for tracking if a root was
recorded in the transaction, it was simply an integer on the root.
Additionally we have the ->last_trans which we record the last
transaction this root was modified in. We had various memory barriers
to attempt to synchronize these two updates so we could avoid taking the
->reloc_mutex all of the time for transaction starts.
However this code has changed a lot, but the dubious memory ordering has
remained. Fix up this code to make it more clear and get rid of the
memory barriers. All we're trying to avoid here is constantly taking
the ->reloc_mutex, so add a new bit to mark that the root has been
recorded. If this bit is set then we can skip the expensive step,
otherwise we have to take the lock and wait.
Add a big comment indicating why we're doing the ordering and locking
the way we are, and use ->fs_roots_radix_lock as the arbiter of this new
bit.
This makes the code simpler, removes the memory barriers, and documents
clearly what we're doing so it's less confusing for future readers.
Signed-off-by: Josef Bacik <josef@...icpanda.com>
---
fs/btrfs/ctree.h | 11 ++++---
fs/btrfs/disk-io.c | 1 +
fs/btrfs/transaction.c | 69 +++++++++++++++++-------------------------
3 files changed, 34 insertions(+), 47 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 70e828d33177..eb44cf191863 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -90,12 +90,13 @@ struct btrfs_path {
*/
enum {
/*
- * btrfs_record_root_in_trans is a multi-step process, and it can race
- * with the balancing code. But the race is very small, and only the
- * first time the root is added to each transaction. So IN_TRANS_SETUP
- * is used to tell us when more checks are required
+ * We set this after we've recorded the root in the transaction. This
+ * is used to avoid taking the reloc_mutex every time we call
+ * btrfs_start_transaction(root). Once the root has been recorded we
+ * don't have to do the setup work again, and this is cleared on
+ * transaction commit once we're done with the root.
*/
- BTRFS_ROOT_IN_TRANS_SETUP,
+ BTRFS_ROOT_RECORDED,
/*
* Set if tree blocks of this root can be shared by other roots.
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index c6907d533fe8..0410fac5f78e 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4805,6 +4805,7 @@ static void btrfs_free_all_qgroup_pertrans(struct btrfs_fs_info *fs_info)
radix_tree_tag_clear(&fs_info->fs_roots_radix,
(unsigned long)root->root_key.objectid,
BTRFS_ROOT_TRANS_TAG);
+ clear_bit(BTRFS_ROOT_RECORDED, &root->state);
}
}
spin_unlock(&fs_info->fs_roots_radix_lock);
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 5b3333ceef04..186089ddac27 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -407,54 +407,43 @@ static int record_root_in_trans(struct btrfs_trans_handle *trans,
int ret = 0;
if ((test_bit(BTRFS_ROOT_SHAREABLE, &root->state) &&
- root->last_trans < trans->transid) || force) {
+ !test_bit(BTRFS_ROOT_RECORDED, &root->state)) || force) {
WARN_ON(!force && root->commit_root != root->node);
/*
- * see below for IN_TRANS_SETUP usage rules
- * we have the reloc mutex held now, so there
- * is only one writer in this function
+ * We are using ->fs_roots_radix_lock as the ultimate protector
+ * of BTRFS_ROOT_RECORDED, which means we want to set it once
+ * we're completely done, so we call btrfs_init_reloc_root()
+ * first.
+ *
+ * This is ok because we are protected here by ->reloc_mutex, so
+ * in reality we won't ever race and call
+ * btrfs_init_reloc_root() twice on the same root back to back.
+ * However if we do it's ok, because again we're protected by
+ * ->reloc_mutex and the second call will simply update the
+ * ->last_trans on the reloc root to the current transid, making
+ * it essentially a no-op.
+ *
+ * We're using ->fs_roots_radix_lock here because
+ * btrfs_drop_snapshot will clear the tag so we don't update the
+ * root, and we will clear the RECORDED flag there without the
+ * ->reloc_mutex lock held. Again this is fine because we won't
+ * be messing with a dropped root anymore, but for concurrency
+ * it's best to keep the rules nice and simple.
*/
- set_bit(BTRFS_ROOT_IN_TRANS_SETUP, &root->state);
-
- /* make sure readers find IN_TRANS_SETUP before
- * they find our root->last_trans update
- */
- smp_wmb();
+ ret = btrfs_init_reloc_root(trans, root);
spin_lock(&fs_info->fs_roots_radix_lock);
- if (root->last_trans == trans->transid && !force) {
+ if (test_bit(BTRFS_ROOT_RECORDED, &root->state) && !force) {
spin_unlock(&fs_info->fs_roots_radix_lock);
return 0;
}
radix_tree_tag_set(&fs_info->fs_roots_radix,
(unsigned long)root->root_key.objectid,
BTRFS_ROOT_TRANS_TAG);
- spin_unlock(&fs_info->fs_roots_radix_lock);
+ set_bit(BTRFS_ROOT_RECORDED, &root->state);
root->last_trans = trans->transid;
-
- /* this is pretty tricky. We don't want to
- * take the relocation lock in btrfs_record_root_in_trans
- * unless we're really doing the first setup for this root in
- * this transaction.
- *
- * Normally we'd use root->last_trans as a flag to decide
- * if we want to take the expensive mutex.
- *
- * But, we have to set root->last_trans before we
- * init the relocation root, otherwise, we trip over warnings
- * in ctree.c. The solution used here is to flag ourselves
- * with root IN_TRANS_SETUP. When this is 1, we're still
- * fixing up the reloc trees and everyone must wait.
- *
- * When this is zero, they can trust root->last_trans and fly
- * through btrfs_record_root_in_trans without having to take the
- * lock. smp_wmb() makes sure that all the writes above are
- * done before we pop in the zero below
- */
- ret = btrfs_init_reloc_root(trans, root);
- smp_mb__before_atomic();
- clear_bit(BTRFS_ROOT_IN_TRANS_SETUP, &root->state);
+ spin_unlock(&fs_info->fs_roots_radix_lock);
}
return ret;
}
@@ -476,6 +465,7 @@ void btrfs_add_dropped_root(struct btrfs_trans_handle *trans,
radix_tree_tag_clear(&fs_info->fs_roots_radix,
(unsigned long)root->root_key.objectid,
BTRFS_ROOT_TRANS_TAG);
+ clear_bit(BTRFS_ROOT_RECORDED, &root->state);
spin_unlock(&fs_info->fs_roots_radix_lock);
}
@@ -488,13 +478,7 @@ int btrfs_record_root_in_trans(struct btrfs_trans_handle *trans,
if (!test_bit(BTRFS_ROOT_SHAREABLE, &root->state))
return 0;
- /*
- * see record_root_in_trans for comments about IN_TRANS_SETUP usage
- * and barriers
- */
- smp_rmb();
- if (root->last_trans == trans->transid &&
- !test_bit(BTRFS_ROOT_IN_TRANS_SETUP, &root->state))
+ if (test_bit(BTRFS_ROOT_RECORDED, &root->state))
return 0;
mutex_lock(&fs_info->reloc_mutex);
@@ -1531,6 +1515,7 @@ static noinline int commit_fs_roots(struct btrfs_trans_handle *trans)
radix_tree_tag_clear(&fs_info->fs_roots_radix,
(unsigned long)root->root_key.objectid,
BTRFS_ROOT_TRANS_TAG);
+ clear_bit(BTRFS_ROOT_RECORDED, &root->state);
spin_unlock(&fs_info->fs_roots_radix_lock);
btrfs_free_log(trans, root);
--
2.43.0
Powered by blists - more mailing lists