linux-kernel - [V2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating and, accessing ring pages


Message-ID: <531D4E0F.8060907@cn.fujitsu.com>
Date:	Mon, 10 Mar 2014 13:30:55 +0800
From:	Tang Chen <tangchen@...fujitsu.com>
To:	viro@...iv.linux.org.uk, bcrl@...ck.org, jmoyer@...hat.com,
	kosaki.motohiro@...il.com, kosaki.motohiro@...fujitsu.com,
	isimatu.yasuaki@...fujitsu.com, guz.fnst@...fujitsu.com
CC:	linux-fsdevel@...r.kernel.org, linux-aio@...ck.org,
	linux-kernel@...r.kernel.org, Miao Xie <miaox@...fujitsu.com>
Subject: [V2 PATCH 1/2] aio, memory-hotplug: Fix confliction when migrating
 and, accessing ring pages

AIO ring page migration has been implemented by the following patch:


https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/aio.c?id=36bc08cc01709b4a9bb563b35aa530241ddc63e3

In that patch, ctx->completion_lock is used to prevent other processes
from accessing the ring page while it is being migrated.

However, aio_setup_ring(), ioctx_add_table() and aio_read_events_ring()
write to the ring page without taking ctx->completion_lock.

As a result, the following race can occur, for example:

             thread 1                      |              thread 2
                                           |
aio_migratepage()                          |
  |-> take ctx->completion_lock            |
  |-> migrate_page_copy(new, old)          |
  |   *NOW*, ctx->ring_pages[idx] == old   |
                                           |
                                           |    *NOW*, ctx->ring_pages[idx] == old
                                           |    aio_read_events_ring()
                                           |     |-> ring = kmap_atomic(ctx->ring_pages[0])
                                           |     |-> ring->head = head;   *HERE, write to the old ring page*
                                           |     |-> kunmap_atomic(ring);
                                           |
  |-> ctx->ring_pages[idx] = new           |
  |   *BUT NOW*, the content of            |
  |    ring_pages[idx] is old.             |
  |-> release ctx->completion_lock         |

As shown above, thread 2's write lands on the old page after its content
has already been copied, so the new ring page never receives the update.
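
The interleaving above can be replayed step by step in user space. The
following is only a minimal sketch, not kernel code: struct ring_page,
struct ctx_model and replay_unlocked_race() are hypothetical stand-ins
introduced for illustration, and memcpy() stands in for
migrate_page_copy(). It runs the three steps in the order shown in the
diagram, in a single thread, and returns the head value that is visible
through ring_pages[0] afterwards.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * needed to replay the interleaving are modelled. */
struct ring_page {
	unsigned head;
	unsigned tail;
};

struct ctx_model {
	struct ring_page *ring_pages[1];
};

/*
 * Replays the unlocked interleaving from the diagram, one step at a time.
 * Returns the head value visible through ctx.ring_pages[0] at the end.
 */
unsigned replay_unlocked_race(unsigned old_head, unsigned new_head)
{
	struct ring_page old_page = { .head = old_head, .tail = 0 };
	struct ring_page new_page = { .head = 0, .tail = 0 };
	struct ctx_model ctx = { .ring_pages = { &old_page } };

	/* thread 1: copy the old page into the new one
	 * (memcpy stands in for migrate_page_copy(new, old)) */
	memcpy(&new_page, ctx.ring_pages[0], sizeof(new_page));

	/* thread 2: ring->head = head, still through the old pointer */
	ctx.ring_pages[0]->head = new_head;

	/* thread 1: ctx->ring_pages[idx] = new */
	ctx.ring_pages[0] = &new_page;

	/* The context now points at the new page, which was copied
	 * before thread 2 wrote, so the write is lost. */
	assert(ctx.ring_pages[0] == &new_page);
	return ctx.ring_pages[0]->head;
}
```

Calling replay_unlocked_race(3, 7) returns 3 in this model: the write of
7 went to the old page only.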

The solution is to take ctx->completion_lock in thread 2 as well, that
is, in aio_setup_ring(), ioctx_add_table() and aio_read_events_ring(),
whenever they write to ring pages.
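
Once both sides hold the same lock, the writer's critical section runs
either entirely before or entirely after the migration's critical
section. The sketch below is a user-space model under that assumption,
not kernel code: it does not take a real lock, it only enumerates the two
serialized orders the lock permits. All names in it (struct ring_page,
struct ctx_model, migrate_section(), write_head_section(),
replay_serialized()) are hypothetical and introduced for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures. */
struct ring_page {
	unsigned head;
};

struct ctx_model {
	struct ring_page *ring_pages[1];
};

/* Models what the migration path does while holding the lock:
 * copy the page content, then switch the pointer. */
static void migrate_section(struct ctx_model *ctx, struct ring_page *new_page)
{
	memcpy(new_page, ctx->ring_pages[0], sizeof(*new_page));
	ctx->ring_pages[0] = new_page;
}

/* Models what the writer does while holding the lock after the patch. */
static void write_head_section(struct ctx_model *ctx, unsigned head)
{
	ctx->ring_pages[0]->head = head;
}

enum order { WRITER_FIRST, MIGRATION_FIRST };

/*
 * Runs the two critical sections in one of the two orders that mutual
 * exclusion allows and returns the head visible through ring_pages[0].
 */
unsigned replay_serialized(enum order order, unsigned old_head,
			   unsigned new_head)
{
	struct ring_page old_page = { .head = old_head };
	struct ring_page new_page = { .head = 0 };
	struct ctx_model ctx = { .ring_pages = { &old_page } };

	if (order == WRITER_FIRST) {
		/* The write reaches the old page before it is copied. */
		write_head_section(&ctx, new_head);
		migrate_section(&ctx, &new_page);
	} else {
		/* The write goes through the already-updated pointer. */
		migrate_section(&ctx, &new_page);
		write_head_section(&ctx, new_head);
	}

	assert(ctx.ring_pages[0] == &new_page);
	return ctx.ring_pages[0]->head;
}
```

In this model both replay_serialized(WRITER_FIRST, 3, 7) and
replay_serialized(MIGRATION_FIRST, 3, 7) return 7, so the update survives
the migration in either order.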

v2:
   Use spin_lock_irq rather than spin_lock_irqsave as Jeff suggested.

Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@...fujitsu.com>
Reviewed-by: Jeff Moyer <jmoyer@...hat.com>
Signed-off-by: Tang Chen <tangchen@...fujitsu.com>
---
  fs/aio.c |   28 ++++++++++++++++++++++++++++
  1 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 062a5f6..dc70246 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -437,6 +437,14 @@ static int aio_setup_ring(struct kioctx *ctx)
  	ctx->user_id = ctx->mmap_base;
  	ctx->nr_events = nr_events; /* trusted copy */

+	/*
+	 * The aio ring pages are user space pages, so they can be migrated.
+	 * When writing to an aio ring page, we should ensure the page is not
+	 * being migrated. Aio page migration procedure is protected by
+	 * ctx->completion_lock, so we add this lock here.
+	 */
+	spin_lock_irq(&ctx->completion_lock);
+
  	ring = kmap_atomic(ctx->ring_pages[0]);
  	ring->nr = nr_events;	/* user copy */
  	ring->id = ~0U;
@@ -448,6 +456,8 @@ static int aio_setup_ring(struct kioctx *ctx)
  	kunmap_atomic(ring);
  	flush_dcache_page(ctx->ring_pages[0]);

+	spin_unlock_irq(&ctx->completion_lock);
+
  	return 0;
  }

@@ -556,9 +566,17 @@ static int ioctx_add_table(struct kioctx *ctx, struct mm_struct *mm)
  					rcu_read_unlock();
  					spin_unlock(&mm->ioctx_lock);

+					/*
+					 * Accessing ring pages must be done
+					 * holding ctx->completion_lock to
+					 * prevent aio ring page migration
+					 * procedure from migrating ring pages.
+					 */
+					spin_lock_irq(&ctx->completion_lock);
  					ring = kmap_atomic(ctx->ring_pages[0]);
  					ring->id = ctx->id;
  					kunmap_atomic(ring);
+					spin_unlock_irq(&ctx->completion_lock);
  					return 0;
  				}

@@ -1066,11 +1084,21 @@ static long aio_read_events_ring(struct kioctx *ctx,
  		head %= ctx->nr_events;
  	}

+	/*
+	 * The aio ring pages are user space pages, so they can be migrated.
+	 * When writing to an aio ring page, we should ensure the page is not
+	 * being migrated. Aio page migration procedure is protected by
+	 * ctx->completion_lock, so we add this lock here.
+	 */
+	spin_lock_irq(&ctx->completion_lock);
+
  	ring = kmap_atomic(ctx->ring_pages[0]);
  	ring->head = head;
  	kunmap_atomic(ring);
  	flush_dcache_page(ctx->ring_pages[0]);

+	spin_unlock_irq(&ctx->completion_lock);
+
  	pr_debug("%li  h%u t%u\n", ret, head, tail);

  	put_reqs_available(ctx, ret);
-- 
1.7.7

