linux-kernel - Re: [PATCH]: Compress hibernation image with LZO (in-kernel)

Message-ID: <4C58D25F.5030705@tuxonice.net>
Date:	Wed, 04 Aug 2010 12:37:19 +1000
From:	Nigel Cunningham <nigel@...onice.net>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
CC:	Bojan Smojver <bojan@...ursive.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH]: Compress hibernation image with LZO (in-kernel)

Hi.

On 04/08/10 12:18, KAMEZAWA Hiroyuki wrote:
> On Wed, 04 Aug 2010 12:14:19 +1000
> Bojan Smojver<bojan@...ursive.com>  wrote:
>
>> On Wed, 2010-08-04 at 11:02 +0900, KAMEZAWA Hiroyuki wrote:
>>> Then, after resume, all vmalloc() area is resumed as "allocated".
>>>
>>> Wrong ?
>>
>> I actually tried remembering vmalloc() returned pointers into a global
>> variable as you suggested. On resume, they were always set to NULL,
>> which would suggest that what has gotten into the image was the state
>> before vmalloc() was called in save_image(). See:
>> http://lkml.org/lkml/2010/8/2/537.
>>
>> Anyone else wants to comment here?
>>
> Hmm, ok. let's see the result.
>
> The reason I mention about the race is my patch corrupts saved image
> by changing swap_map[] status and swap-cache radix-tree during save_image().
>
> Maybe I don't understand something important.

That's a different issue.

Remember that the snapshot includes more than just the running programs.
It also includes the structs that record filesystem info and the state
of swap. This is why we say you can't safely hibernate, use your
filesystem from another kernel or OS, and then resume: using the
filesystem from another kernel/OS makes the state on disk inconsistent
with the state in memory that we saved in our image. (I'm assuming the
filesystem is written to, or at least that its journal is replayed.)

I'm not 100% sure, but it sounds like your issue is the same thing, only
with swap. If you free a swap page post-snapshot and it gets used for
(say) saving a page of the image, then you have a problem post-resume.
The resumed kernel will think the swap state is still as it was when the
snapshot was taken, and might try to swap back in the page of memory
that was freed and then reused for the image, corrupting memory.
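
To make that failure mode concrete, here is a toy model in Python. It is
only a sketch: ToySwap, its in_use list (a stand-in for swap_map[]) and
simulate_stale_swap() are hypothetical names invented for illustration,
not kernel code, and the real allocator is far more involved.

```python
class ToySwap:
    """Toy swap device: on-disk slots plus an in-memory usage map."""

    def __init__(self, nslots):
        self.disk = [b""] * nslots       # what is physically on the device
        self.in_use = [False] * nslots   # in-memory map (stand-in for swap_map[])

    def alloc(self, data):
        slot = self.in_use.index(False)  # first free slot; ValueError if full
        self.in_use[slot] = True
        self.disk[slot] = data
        return slot

    def free(self, slot):
        self.in_use[slot] = False        # the data on disk is left in place


def simulate_stale_swap():
    swap = ToySwap(4)

    # Before the snapshot: a process page is swapped out.
    proc_slot = swap.alloc(b"process page")
    page_table = {"proc_page": proc_slot}

    # The snapshot captures the in-memory state only, not the disk.
    snap_in_use = list(swap.in_use)
    snap_page_table = dict(page_table)

    # Post-snapshot: the swap page is freed, then reused for the image.
    swap.free(proc_slot)
    image_slot = swap.alloc(b"image page")

    # Resume: memory goes back to the snapshot state, the disk does not.
    swap.in_use = snap_in_use
    page_table = snap_page_table

    # The resumed kernel still believes proc_page lives in its old slot.
    swapped_in = swap.disk[page_table["proc_page"]]
    return proc_slot, image_slot, swapped_in
```

In this model the freed slot is handed straight back out for the image,
so the swap-in after resume returns image data where the process page
was expected.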

One solution is to allocate the swap for the image before the snapshot.
This is what TuxOnIce does: it freezes processes, calculates the image
statistics and uses them to allocate storage and free memory as
necessary, writes the first part of the image, then does the snapshot
and writes the remainder. By doing things in this order, the only
unknown is the amount of memory needed for drivers, and that can be
handled pretty easily.
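
A rough sketch of why that ordering helps, again as a toy Python model
rather than anything resembling the actual TuxOnIce code. The function
name hibernate_preallocated() and its parameters are made up for
illustration, and the model is simplified: it does not write part of the
image before the snapshot, and it does not release the reserved slots
after resume.

```python
def hibernate_preallocated(nslots=4, image_pages=2):
    in_use = [False] * nslots   # in-memory usage map (stand-in for swap state)
    disk = [b""] * nslots       # what is physically on the swap device

    # A process page was swapped out some time before hibernation.
    in_use[0] = True
    disk[0] = b"process page"

    # Step 1: processes are frozen (not modelled here).
    # Step 2: size the image and reserve its storage BEFORE the snapshot.
    reserved = [i for i, used in enumerate(in_use) if not used][:image_pages]
    for slot in reserved:
        in_use[slot] = True

    # Step 3: take the snapshot; it already records the reserved slots.
    snapshot = list(in_use)

    # Step 4: write image pages only into the slots reserved in step 2.
    for slot in reserved:
        disk[slot] = b"image page"

    # Step 5: resume restores the in-memory state from the snapshot.
    in_use = list(snapshot)

    return disk[0], reserved, in_use
```

Because no swap allocation or free happens after the snapshot, the
restored usage map agrees with what is on disk, and the process page in
slot 0 is never overwritten.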

Regards,

Nigel