linux-kernel - [PATCH 4/5] mm/swapfile: refcount block and queue before using blkcg_schedule

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20200414041902.16769-5-mcgrof@kernel.org>
Date:   Tue, 14 Apr 2020 04:19:01 +0000
From:   Luis Chamberlain <mcgrof@...nel.org>
To:     axboe@...nel.dk, viro@...iv.linux.org.uk, bvanassche@....org,
        gregkh@...uxfoundation.org, rostedt@...dmis.org, mingo@...hat.com,
        jack@...e.cz, ming.lei@...hat.com, nstange@...e.de,
        akpm@...ux-foundation.org
Cc:     mhocko@...e.com, yukuai3@...wei.com, linux-block@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Luis Chamberlain <mcgrof@...nel.org>,
        Omar Sandoval <osandov@...com>,
        Hannes Reinecke <hare@...e.com>,
        Michal Hocko <mhocko@...nel.org>
Subject: [PATCH 4/5] mm/swapfile: refcount block and queue before using blkcg_schedule_throttle()

block devices are refcounted so to ensure once its final user goes away it
can be cleaned up by the lower layers properly. The block device's
request_queue structure is also refcounted, however, if the last
blk_put_queue() is called under atomic context the block layer has
to defer removal.

By refcounting the block device during the use of blkcg_schedule_throttle(),
we ensure ensure two things:

1) the block device remains available during the call
2) we ensure avoid having to deal with the fact we're using the
   request_queue structure in atomic context, since the last
   blk_put_queue() will be called upon disk_release(), *after*
   our own bdput().

This means this code path is *not* going to remove the request_queue
structure, as we are ensuring some later upper layer disk_release()
will be the one to release the request_queue structure for us.

Cc: Bart Van Assche <bvanassche@....org>
Cc: Omar Sandoval <osandov@...com>
Cc: Hannes Reinecke <hare@...e.com>
Cc: Nicolai Stange <nstange@...e.de>
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: Michal Hocko <mhocko@...nel.org>
Cc: yu kuai <yukuai3@...wei.com>
Signed-off-by: Luis Chamberlain <mcgrof@...nel.org>
---
 mm/swapfile.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 6659ab563448..9285ff6030ca 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3753,6 +3753,7 @@ static void free_swap_count_continuations(struct swap_info_struct *si)
 void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node,
 				  gfp_t gfp_mask)
 {
+	struct block_device *bdev;
 	struct swap_info_struct *si, *next;
 	if (!(gfp_mask & __GFP_IO) || !memcg)
 		return;
@@ -3771,8 +3772,17 @@ void mem_cgroup_throttle_swaprate(struct mem_cgroup *memcg, int node,
 	plist_for_each_entry_safe(si, next, &swap_avail_heads[node],
 				  avail_lists[node]) {
 		if (si->bdev) {
-			blkcg_schedule_throttle(bdev_get_queue(si->bdev),
-						true);
+			bdev = bdgrab(si->bdev);
+			if (!bdev)
+				continue;
+			/*
+			 * By adding our own bdgrab() we ensure the queue
+			 * sticks around until disk_release(), and so we ensure
+			 * our release of the request_queue does not happen in
+			 * atomic context.
+			 */
+			blkcg_schedule_throttle(bdev_get_queue(bdev), true);
+			bdput(bdev);
 			break;
 		}
 	}
-- 
2.25.1