Message-Id: <1457146299-1601-3-git-send-email-Waiman.Long@hpe.com>
Date: Fri, 4 Mar 2016 21:51:39 -0500
From: Waiman Long <Waiman.Long@....com>
To: Tejun Heo <tj@...nel.org>,
Christoph Lameter <cl@...ux-foundation.org>,
Dave Chinner <dchinner@...hat.com>
Cc: xfs@....sgi.com, linux-kernel@...r.kernel.org,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Scott J Norton <scott.norton@...com>,
Douglas Hatch <doug.hatch@...com>,
Waiman Long <Waiman.Long@....com>
Subject: [RFC PATCH 2/2] xfs: Allow degeneration of m_fdblocks/m_ifree to global counters

Small XFS filesystems on systems with a large number of CPUs can incur
significant overhead due to excessive calls to the percpu_counter_sum()
function, which needs to walk through a large number of different
cachelines.
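
To make the overhead concrete, here is a simplified, single-threaded
userspace model of the fast/slow path in __percpu_counter_compare().
It is only a sketch: struct model_counter and the model_* helpers are
hypothetical names, there is no locking, and the slow_sums field exists
only to count how often the full per-CPU walk runs. The point is that
the cheap path is taken only when the approximate count is more than
batch * ncpus away from the value being compared against.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_CPUS 256

/* Hypothetical userspace stand-in for struct percpu_counter. */
struct model_counter {
	int64_t count;                /* shared, approximate value */
	int32_t pcpu[MODEL_MAX_CPUS]; /* per-CPU deltas not yet folded in */
	unsigned int ncpus;
	unsigned long slow_sums;      /* number of full per-CPU walks */
};

static void model_init(struct model_counter *c, unsigned int ncpus,
		       int64_t value)
{
	assert(ncpus > 0 && ncpus <= MODEL_MAX_CPUS);
	c->count = value;
	c->ncpus = ncpus;
	c->slow_sums = 0;
	for (unsigned int i = 0; i < MODEL_MAX_CPUS; i++)
		c->pcpu[i] = 0;
}

/* Accumulate locally; fold into the shared count once batch is reached. */
static void model_add(struct model_counter *c, unsigned int cpu,
		      int64_t delta, int32_t batch)
{
	int64_t v = (int64_t)c->pcpu[cpu] + delta;

	assert(cpu < c->ncpus);
	if (v >= batch || v <= -batch) {
		c->count += v;
		c->pcpu[cpu] = 0;
	} else {
		c->pcpu[cpu] = (int32_t)v;
	}
}

/* Precise value: touches one slot (one cacheline in the kernel) per CPU. */
static int64_t model_sum(struct model_counter *c)
{
	int64_t sum = c->count;

	c->slow_sums++;
	for (unsigned int i = 0; i < c->ncpus; i++)
		sum += c->pcpu[i];
	return sum;
}

/* Returns 1, 0 or -1 for counter >, == or < rhs. */
static int model_compare(struct model_counter *c, int64_t rhs, int32_t batch)
{
	int64_t count = c->count;

	/* Rough count is good enough when it is far away from rhs. */
	if (llabs((long long)(count - rhs)) > (int64_t)batch * (int64_t)c->ncpus)
		return count > rhs ? 1 : -1;

	/* Otherwise the precise, expensive sum is needed. */
	count = model_sum(c);
	if (count > rhs)
		return 1;
	return count < rhs ? -1 : 0;
}
```

With 80 CPUs and a batch of 1024, the window is 81920 blocks on either
side of the compared value, so in this model a filesystem whose free
block count stays inside that window takes the slow path on every
compare.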

This patch uses the newly added percpu_counter_set_limit() API to
potentially switch the m_fdblocks and m_ifree per-cpu counters to
global counters with locks at filesystem mount time if the filesystem
is small relative to the number of CPUs available.
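
As a rough illustration of the mount-time decision, the sketch below
models one plausible rule. It is an assumption, not the rule from this
series: the actual condition lives in percpu_counter_set_limit() in
patch 1/2, which is not shown in this message. The sketch assumes the
counter degenerates when its value is below limit * ncpus, and that a
limit of 0 selects a default batch size. model_should_degenerate() and
its default_batch parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION: degenerate to a global counter when the current value is
 * below limit * ncpus; a limit of 0 means "use the default batch".
 * The real rule is defined by patch 1/2 and may differ.
 */
static bool model_should_degenerate(int64_t value, uint32_t limit,
				    unsigned int ncpus,
				    uint32_t default_batch)
{
	assert(ncpus > 0);
	if (limit == 0)
		limit = default_batch;
	return value < (int64_t)limit * (int64_t)ncpus;
}
```

Under that assumption, a limit of 4 * 1024 on an 80-CPU system puts the
threshold at 327680 free blocks.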

A possible use case is using NVDIMMs as application scratch storage
for log files and other small files. Current battery-backed NVDIMMs
are pretty small, e.g. 8G per DIMM, so we cannot create large
filesystems on top of them.

On a 4-socket 80-thread system running a 4.5-rc6 kernel, this patch can
improve the throughput of the AIM7 XFS disk workload by 25%. Before
the patch, the perf profile was:
18.68% 0.08% reaim [k] __percpu_counter_compare
18.05% 9.11% reaim [k] __percpu_counter_sum
0.37% 0.36% reaim [k] __percpu_counter_add
After the patch, the perf profile was:
0.73% 0.36% reaim [k] __percpu_counter_add
0.27% 0.27% reaim [k] __percpu_counter_compare
Signed-off-by: Waiman Long <Waiman.Long@....com>
---
fs/xfs/xfs_mount.c | 1 -
fs/xfs/xfs_mount.h | 5 +++++
fs/xfs/xfs_super.c | 6 ++++++
3 files changed, 11 insertions(+), 1 deletions(-)
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index bb753b3..fe74b91 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1163,7 +1163,6 @@ xfs_mod_ifree(
* a large batch count (1024) to minimise global counter updates except when
* we get near to ENOSPC and we have to be very accurate with our updates.
*/
-#define XFS_FDBLOCKS_BATCH 1024
int
xfs_mod_fdblocks(
struct xfs_mount *mp,
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index b570984..d9520f4 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -206,6 +206,11 @@ typedef struct xfs_mount {
#define XFS_WSYNC_WRITEIO_LOG 14 /* 16k */
/*
+ * FD blocks batch size for per-cpu compare
+ */
+#define XFS_FDBLOCKS_BATCH 1024
+
+/*
* Allow large block sizes to be reported to userspace programs if the
* "largeio" mount option is used.
*
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 59c9b7b..c0b4f79 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1412,6 +1412,12 @@ xfs_reinit_percpu_counters(
percpu_counter_set(&mp->m_icount, mp->m_sb.sb_icount);
percpu_counter_set(&mp->m_ifree, mp->m_sb.sb_ifree);
percpu_counter_set(&mp->m_fdblocks, mp->m_sb.sb_fdblocks);
+
+ /*
+ * Use default batch size for m_ifree
+ */
+ percpu_counter_set_limit(&mp->m_ifree, 0);
+ percpu_counter_set_limit(&mp->m_fdblocks, 4 * XFS_FDBLOCKS_BATCH);
}
static void
--
1.7.1