Date:   Tue, 31 May 2022 14:18:21 -0400
From:   Waiman Long <longman@...hat.com>
To:     Tejun Heo <tj@...nel.org>, Jens Axboe <axboe@...nel.dk>
Cc:     cgroups@...r.kernel.org, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org, Ming Lei <ming.lei@...hat.com>,
        Waiman Long <longman@...hat.com>
Subject: [PATCH] blk-cgroup: Optimize blkcg_rstat_flush()

For a system with many CPUs and block devices, the time to do
blkcg_rstat_flush() from cgroup_rstat_flush() can be rather long. This
is especially problematic because interrupts are disabled during the
flush. It was reported that the flush might take seconds in some extreme
cases, leading to hard lockup messages.

It is likely that not all of the percpu blkg_iostat_set's have been
updated since the last flush, and those unchanged blkg_iostat_set's
don't need to be flushed. This patch optimizes blkcg_rstat_flush()
by checking the current sequence number against the one recorded at
the last flush and skipping the blkg_iostat_set if the sequence number
hasn't changed. There is a slight chance that it may miss an update
that is being done in parallel. Such an update will just have to wait
until the next flush.

Signed-off-by: Waiman Long <longman@...hat.com>
---
 block/blk-cgroup.c | 18 +++++++++++++++---
 block/blk-cgroup.h |  1 +
 2 files changed, 16 insertions(+), 3 deletions(-)
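
Not part of the commit: the skip logic described in the changelog can be
illustrated with a small userspace sketch. It is only an approximation
under stated assumptions. struct pcpu_stat, stat_update(), stat_flush()
and flush_all() are hypothetical names, a plain counter stands in for the
sequence number, and neither u64_stats_sync nor concurrent writers are
modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one per-CPU blkg_iostat_set. */
struct pcpu_stat {
	unsigned int seq;	/* bumped by every update (assumed counter) */
	unsigned int last_seq;	/* sequence number seen by the previous flush */
	uint64_t cur;		/* running per-CPU counter */
	uint64_t last;		/* value already propagated to the total */
};

/* Writer side: account some bytes and advance the sequence number. */
static void stat_update(struct pcpu_stat *s, uint64_t bytes)
{
	s->seq++;
	s->cur += bytes;
}

/*
 * Flush one entry into *total. Returns false without touching the
 * counters when the sequence number has not moved since the last flush.
 */
static bool stat_flush(struct pcpu_stat *s, uint64_t *total)
{
	if (s->seq == s->last_seq)
		return false;

	s->last_seq = s->seq;
	*total += s->cur - s->last;	/* propagate only the delta */
	s->last = s->cur;
	return true;
}

/* Flush every entry and report how many actually needed work. */
static unsigned int flush_all(struct pcpu_stat *stats, int n, uint64_t *total)
{
	unsigned int flushed = 0;

	for (int i = 0; i < n; i++)
		if (stat_flush(&stats[i], total))
			flushed++;
	return flushed;
}
```

In this sketch, a second flush_all() call with no intervening
stat_update() visits every entry but does no propagation work, which is
the saving the patch is after.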

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 40161a3f68d0..79b89af61ef2 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -864,11 +864,23 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
 		unsigned long flags;
 		unsigned int seq;
 
+		seq = u64_stats_fetch_begin(&bisc->sync);
+		/*
+		 * If the sequence number hasn't been updated since the last
+		 * flush, we can skip this blkg_iostat_set though we may miss
+		 * an update that is happening in parallel.
+		 */
+		if (seq == bisc->last_seq)
+			continue;
+
 		/* fetch the current per-cpu values */
-		do {
-			seq = u64_stats_fetch_begin(&bisc->sync);
+		while (true) {
+			bisc->last_seq = seq;
 			blkg_iostat_set(&cur, &bisc->cur);
-		} while (u64_stats_fetch_retry(&bisc->sync, seq));
+			if (!u64_stats_fetch_retry(&bisc->sync, seq))
+				break;
+			seq = u64_stats_fetch_begin(&bisc->sync);
+		}
 
 		/* propagate percpu delta to global */
 		flags = u64_stats_update_begin_irqsave(&blkg->iostat.sync);
diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h
index d4de0a35e066..22b4ea139b93 100644
--- a/block/blk-cgroup.h
+++ b/block/blk-cgroup.h
@@ -45,6 +45,7 @@ struct blkg_iostat_set {
 	struct u64_stats_sync		sync;
 	struct blkg_iostat		cur;
 	struct blkg_iostat		last;
+	unsigned int			last_seq;
 };
 
 /* association between a blk cgroup and a request queue */
-- 
2.31.1
