[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20171003114243.827256796@linuxfoundation.org>
Date: Tue, 3 Oct 2017 14:29:24 +0200
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: linux-kernel@...r.kernel.org
Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
stable@...r.kernel.org, Ilya Dryomov <idryomov@...il.com>,
Sage Weil <sage@...hat.com>
Subject: [PATCH 4.13 062/110] libceph: dont allow bidirectional swap of pg-upmap-items
4.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilya Dryomov <idryomov@...il.com>
commit 29a0cfbf91ba997591535a4f7246835ce8328141 upstream.
This reverts most of commit f53b7665c8ce ("libceph: upmap semantic
changes").
We need to prevent duplicates in the final result. For example, we
can currently take
[1,2,3] and apply [(1,2)] and get [2,2,3]
or
[1,2,3] and apply [(3,2)] and get [1,2,2]
The rest of the system is not prepared to handle duplicates in the
result set like this.
The reverted piece was intended to allow
[1,2,3] and [(1,2),(2,1)] to get [2,1,3]
to reorder primaries. First, this bidirectional swap is hard to
implement in a way that also prevents dups. For example, [1,2,3] and
[(1,4),(2,3),(3,4)] would give [4,3,4] but would we just drop the last
step we'd have [4,3,3] which is also invalid, etc. Simpler to just not
handle bidirectional swaps. In practice, they are not needed: if you
just want to choose a different primary then use primary_affinity, or
pg_upmap (not pg_upmap_items).
Link: http://tracker.ceph.com/issues/21410
Signed-off-by: Ilya Dryomov <idryomov@...il.com>
Reviewed-by: Sage Weil <sage@...hat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
---
net/ceph/osdmap.c | 33 ++++++++++++++++++++++++---------
1 file changed, 24 insertions(+), 9 deletions(-)
--- a/net/ceph/osdmap.c
+++ b/net/ceph/osdmap.c
@@ -2445,19 +2445,34 @@ static void apply_upmap(struct ceph_osdm
pg = lookup_pg_mapping(&osdmap->pg_upmap_items, pgid);
if (pg) {
- for (i = 0; i < raw->size; i++) {
- for (j = 0; j < pg->pg_upmap_items.len; j++) {
- int from = pg->pg_upmap_items.from_to[j][0];
- int to = pg->pg_upmap_items.from_to[j][1];
+ /*
+ * Note: this approach does not allow a bidirectional swap,
+ * e.g., [[1,2],[2,1]] applied to [0,1,2] -> [0,2,1].
+ */
+ for (i = 0; i < pg->pg_upmap_items.len; i++) {
+ int from = pg->pg_upmap_items.from_to[i][0];
+ int to = pg->pg_upmap_items.from_to[i][1];
+ int pos = -1;
+ bool exists = false;
- if (from == raw->osds[i]) {
- if (!(to != CRUSH_ITEM_NONE &&
- to < osdmap->max_osd &&
- osdmap->osd_weight[to] == 0))
- raw->osds[i] = to;
+ /* make sure replacement doesn't already appear */
+ for (j = 0; j < raw->size; j++) {
+ int osd = raw->osds[j];
+
+ if (osd == to) {
+ exists = true;
break;
}
+ /* ignore mapping if target is marked out */
+ if (osd == from && pos < 0 &&
+ !(to != CRUSH_ITEM_NONE &&
+ to < osdmap->max_osd &&
+ osdmap->osd_weight[to] == 0)) {
+ pos = j;
+ }
}
+ if (!exists && pos >= 0)
+ raw->osds[pos] = to;
}
}
}
Powered by blists - more mailing lists