lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 2 Aug 2010 15:02:25 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc:	Hugh Dickins <hughd@...gle.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Ondrej Zary <linux@...nbow-software.org>,
	Kernel development list <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Balbir Singh <balbir@...ibm.com>,
	Andrea Arcangeli <aarcange@...hat.com>
Subject: [RFC][PATCH -mm] hibernation: freeze swap at hibernation (Was Re:
 Memory corruption during hibernation since 2.6.31

On Fri, 30 Jul 2010 13:14:32 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> wrote:

> On Thu, 29 Jul 2010 21:10:10 -0700
> Hugh Dickins <hughd@...gle.com> wrote:
> 
> > On Thu, Jul 29, 2010 at 5:01 PM, KAMEZAWA Hiroyuki
> > <kamezawa.hiroyu@...fujitsu.com> wrote:
> > 
> > > I think the best way is kexec(). But maybe rollback from hibernation failure
> > > will be difficult. Considering how crash-dump works well and under maintainance
> > > by many enterprise guys, hibernation-by-kexec is a choice. I think. It can make
> > > reuse of kdump code, ...or, hibernation-resume code can eat kdump image
> > > directly. Maybe the problem will be the speed of dump.
> > 
> > I've no appetite for a total rework of hibernation, and I don't see
> > how that would
> > address the issue: I'm just looking for some protection against swap
> > reuse danger.
> > 
> Okay ;) (And I forget that kexec has to prepare memory at boot time for 2nd kernel.
> It will be harmful for small device guys.)
> 
> I'll prepare a routine not-quick-fix.

Ok, here. Passed easy tests as
# echo disk > /sys/power/state


Looks like a big hammer ? But I think following kind of patch is required.
About swap-reuse, it's only possible when a page is added to swap cache
but try_to_unmap() fails and the page remains in memory. IIUC, most of this
kind of pages will be backed to swap by shrink_all_memory(). So, reuse-swap
happens only when the user unlucky. This patch ignores reuse-swap but freeze
swap_map[] for saving swap_map[] to the disk in consistent way.

==
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>

swap_map[] is a one of objects which can be update while hibernation. I.E,
usage counter in swap_map[] is updated while hibernation is making a copy
of memory image to the disk.

At resume, hibenation code doesn't call swap_free() against the swaps they
used...So, the swap_map[] will turns to be an initial state.

With small consideration, the question is how swap_map[] updated before
dumping to the disk is treated. In swap-system view, there are no guarantee
that swap_map[] are properly reloaded and there is no leak in swap count.

This patch tries to freeze swap_map[] during hibernation.
By this, no updates will be happen to swap_map[] among save_image().
At load_image(), the swap_map[] has no record for swap entries used by
save_image(), we can simply forget it.

Note: I'm not a specialist of hibernation...so, I'm not sure the hooks
      to kernel/power/user.c is appropriate or not.
      And this disables swap-out once hibernation starts saving.
      Should we afraid of OOM ?

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
---
 include/linux/swap.h |    7 +++
 kernel/power/swap.c  |   15 ++++++--
 kernel/power/user.c  |    1 
 mm/swapfile.c        |   93 ++++++++++++++++++++++++++++++++++++++-------------
 4 files changed, 90 insertions(+), 26 deletions(-)

Index: mmotm-0727/include/linux/swap.h
===================================================================
--- mmotm-0727.orig/include/linux/swap.h
+++ mmotm-0727/include/linux/swap.h
@@ -316,7 +316,6 @@ extern long nr_swap_pages;
 extern long total_swap_pages;
 extern void si_swapinfo(struct sysinfo *);
 extern swp_entry_t get_swap_page(void);
-extern swp_entry_t get_swap_page_of_type(int);
 extern int valid_swaphandles(swp_entry_t, unsigned long *);
 extern int add_swap_count_continuation(swp_entry_t, gfp_t);
 extern void swap_shmem_alloc(swp_entry_t);
@@ -333,6 +332,12 @@ extern int reuse_swap_page(struct page *
 extern int try_to_free_swap(struct page *);
 struct backing_dev_info;
 
+#ifdef CONFIG_HIBERNATION
+void hibernation_thaw_swap(void);
+swp_entry_t get_swap_for_hibernation(int type);
+void swap_free_for_hibernation(swp_entry_t val);
+#endif
+
 /* linux/mm/thrash.c */
 extern struct mm_struct *swap_token_mm;
 extern void grab_swap_token(struct mm_struct *);
Index: mmotm-0727/mm/swapfile.c
===================================================================
--- mmotm-0727.orig/mm/swapfile.c
+++ mmotm-0727/mm/swapfile.c
@@ -47,6 +47,8 @@ long nr_swap_pages;
 long total_swap_pages;
 static int least_priority;
 
+static bool swap_for_hibernation;
+
 static const char Bad_file[] = "Bad swap file entry ";
 static const char Unused_file[] = "Unused swap file entry ";
 static const char Bad_offset[] = "Bad swap offset entry ";
@@ -449,6 +451,8 @@ swp_entry_t get_swap_page(void)
 	spin_lock(&swap_lock);
 	if (nr_swap_pages <= 0)
 		goto noswap;
+	if (swap_for_hibernation)
+		goto noswap;
 	nr_swap_pages--;
 
 	for (type = swap_list.next; type >= 0 && wrapped < 2; type = next) {
@@ -481,28 +485,6 @@ noswap:
 	return (swp_entry_t) {0};
 }
 
-/* The only caller of this function is now susupend routine */
-swp_entry_t get_swap_page_of_type(int type)
-{
-	struct swap_info_struct *si;
-	pgoff_t offset;
-
-	spin_lock(&swap_lock);
-	si = swap_info[type];
-	if (si && (si->flags & SWP_WRITEOK)) {
-		nr_swap_pages--;
-		/* This is called for allocating swap entry, not cache */
-		offset = scan_swap_map(si, 1);
-		if (offset) {
-			spin_unlock(&swap_lock);
-			return swp_entry(type, offset);
-		}
-		nr_swap_pages++;
-	}
-	spin_unlock(&swap_lock);
-	return (swp_entry_t) {0};
-}
-
 static struct swap_info_struct *swap_info_get(swp_entry_t entry)
 {
 	struct swap_info_struct *p;
@@ -762,6 +744,73 @@ int mem_cgroup_count_swap_user(swp_entry
 #endif
 
 #ifdef CONFIG_HIBERNATION
+
+static pgoff_t hibernation_offset[MAX_SWAPFILES];
+static void hibernation_freeze_swap(void)
+{
+	int i;
+
+	printk(KERN_INFO "PM: Freeze Swap\n");
+	swap_for_hibernation = true;
+	for (i = 0; i < MAX_SWAPFILES; i++)
+		hibernation_offset[i] = 1;
+}
+
+void hibernation_thaw_swap(void)
+{
+	spin_lock(&swap_lock);
+	if (swap_for_hibernation) {
+		printk(KERN_INFO "PM: Thaw Swap\n");
+		swap_for_hibernation = false;
+	}
+	spin_unlock(&swap_lock);
+}
+
+/*
+ * Because updateing swap_map[] can make not-saved-status-change,
+ * we use our own easy allocator.
+ * Please see kernel/power/swap.c, Used swaps are recorded into
+ * RB-tree.
+ */
+swp_entry_t get_swap_for_hibernation(int type)
+{
+	pgoff_t off;
+	swp_entry_t val = {0};
+	struct swap_info_struct *si;
+
+	spin_lock(&swap_lock);
+	/*
+	 * Once hibernation starts to use swap, we freeze swap_map[]. Otherwise,
+	 * saved swap_map[] image to the disk will be an incomplete because it's
+	 * changing without synchronization with hibernation snap shot.
+	 * At resume, we just make swap_for_hibernation=false. We can forget
+	 * used maps easily.
+	 */
+	if (!swap_for_hibernation)
+		hibernation_freeze_swap();
+
+	si = swap_info[type];
+	if (!si || !(si->flags & SWP_WRITEOK))
+		goto done;
+
+	for (off = hibernation_offset[type]; off < si->max; ++off) {
+		if (!si->swap_map[off])
+			break;
+	}
+	if (off < si->max) {
+		val = swp_entry(type, off);
+		hibernation_offset[type] = off + 1;
+	}
+done:
+	spin_unlock(&swap_lock);
+	return val;
+}
+
+void swap_free_for_hibernation(swp_entry_t ent)
+{
+	/* Nothing to do */
+}
+
 /*
  * Find the swap type that corresponds to given device (if any).
  *
Index: mmotm-0727/kernel/power/swap.c
===================================================================
--- mmotm-0727.orig/kernel/power/swap.c
+++ mmotm-0727/kernel/power/swap.c
@@ -135,10 +135,10 @@ sector_t alloc_swapdev_block(int swap)
 {
 	unsigned long offset;
 
-	offset = swp_offset(get_swap_page_of_type(swap));
+	offset = swp_offset(get_swap_for_hibernation(swap));
 	if (offset) {
 		if (swsusp_extents_insert(offset))
-			swap_free(swp_entry(swap, offset));
+			swap_free_for_hibernation(swp_entry(swap, offset));
 		else
 			return swapdev_block(swap, offset);
 	}
@@ -162,10 +162,11 @@ void free_all_swap_pages(int swap)
 		ext = container_of(node, struct swsusp_extent, node);
 		rb_erase(node, &swsusp_extents);
 		for (offset = ext->start; offset <= ext->end; offset++)
-			swap_free(swp_entry(swap, offset));
+			swap_free_for_hibernation(swp_entry(swap, offset));
 
 		kfree(ext);
 	}
+	hibernation_thaw_swap();
 }
 
 int swsusp_swap_in_use(void)
@@ -440,10 +441,12 @@ int swsusp_write(unsigned int flags)
 	error = get_swap_writer(&handle);
 	if (error) {
 		printk(KERN_ERR "PM: Cannot get swap writer\n");
+		hibernation_thaw_swap();
 		return error;
 	}
 	if (!enough_swap(pages)) {
 		printk(KERN_ERR "PM: Not enough free swap\n");
+		hibernation_thaw_swap();
 		error = -ENOSPC;
 		goto out_finish;
 	}
@@ -461,6 +464,8 @@ int swsusp_write(unsigned int flags)
 		error = save_image(&handle, &snapshot, pages - 1);
 out_finish:
 	error = swap_writer_finish(&handle, flags, error);
+	if (error)
+		hibernation_thaw_swap();
 	return error;
 }
 
@@ -614,6 +619,10 @@ int swsusp_read(unsigned int *flags_p)
 	if (!error)
 		error = load_image(&handle, &snapshot, header->pages - 1);
 	swap_reader_finish(&handle);
+	/*
+	 * When we saved kernel image, swap is freezed, thaw it.
+	 */
+	hibernation_thaw_swap();
 end:
 	if (!error)
 		pr_debug("PM: Image successfully loaded\n");
Index: mmotm-0727/kernel/power/user.c
===================================================================
--- mmotm-0727.orig/kernel/power/user.c
+++ mmotm-0727/kernel/power/user.c
@@ -135,6 +135,7 @@ static int snapshot_release(struct inode
 	free_basic_memory_bitmaps();
 	data = filp->private_data;
 	free_all_swap_pages(data->swap);
+	hibernation_thaw_swap();
 	if (data->frozen)
 		thaw_processes();
 	pm_notifier_call_chain(data->mode == O_WRONLY ?


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ