lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <200709270652.58205.nigel@nigel.suspend2.net>
Date:	Thu, 27 Sep 2007 06:52:57 +1000
From:	Nigel Cunningham <nigel@...el.suspend2.net>
To:	Joseph Fannin <jfannin@...il.com>
Cc:	Pavel Machek <pavel@....cz>,
	"Eric W. Biederman" <ebiederm@...ssion.com>, nigel@...pend2.net,
	"Huang, Ying" <ying.huang@...el.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jeremy Maitin-Shepard <jbms@....edu>,
	linux-kernel@...r.kernel.org, linux-pm@...ts.linux-foundation.org,
	Kexec Mailing List <kexec@...ts.infradead.org>
Subject: Re: [RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

Hi.

On Thursday 27 September 2007 06:30:36 Joseph Fannin wrote:
> On Fri, Sep 21, 2007 at 11:45:12AM +0200, Pavel Machek wrote:
> > Hi!
> > > >
> > > > Sounds doable, as long as you can cope with long command lines (which
> > > > shouldn't be a biggie). (If you've got a swapfile or parts of a swap
> > > > partition already in use, it can be quite fragmented).
> > >
> > > Hmm.  This is an interesting problem.  Sharing a swap file or a swap
> > > partition with the actual swap of user space pages does seem to be
> > > a limitation of this approach.
> > >
> > > Although the fact that it is simple to write to a separate file may
> > > be a reasonable compensation.
> >
> > I'm not sure how you'd write it to a separate file. Notice that kjump
> > kernel may not mount journalling filesystems, not even
> > read-only. (Ext3 replays journal in that case). You could pass block
> > numbers from the original kernel...
> 
>     The ext3 thing is a bug, the case for which I don't think has been
> adequately explained to the ext[34] folks.  There should be at least a
> no_replay mount flag available, or something.  It has ramifications
> for more than just hibernation.
> 
>     And yeah, I'm gonna bring up the swap files thing again.  If you
> can hibernate to a swap file, you can hibernate to a dedicated
> hibernation file, and vice versa.
> 
>     If you can't hibernate to a swap file, then swap files are
> effectively unsupported for any system you might want to hibernate.
> <handwave> I wonder what embedded folks would think about that
> </handwave>.
> 
>     But, in my ignorance, I'm not sure even fixing the ext3 bug will
> guarantee you consistent metadata so that you can handle a
> swap/hibernate file.  You can do a sync(), but how do you make that
> not race against running processes without the freezer, or blkdev
> snapshots?
> 
>     I guess uswsusp and the-patch-previously-known-as-suspend2 handle
> this somehow, though.
> 
>    (It's that same ignorance that has me waiting for someone with
> established credit with kernel people to make that argument for the
> ext3 bug, so I can hang my own reasons for thinking that it's bad off
> of theirs).

I haven't looked at swsusp support, but TuxOnIce handles all storage (swap 
partitions, swap files and ordinary files) by first allocating swap (if we're 
using swap), then bmapping the storage we're going to use. After that, we can 
freeze filesystems and processes with impunity. The allocated storage is then 
viewed as just a collection of bdevs, each with an ordered chain of extents 
defining which blocks we're going to read/write - a series of tapes if you 
like. In the image header, we store dev_ts and the block chains, together 
with the configuration information. As long as the same bdevs are configured 
at boot time prior to the echo > /sys/power/resume, we're in business. 
Filesystems don't need to be mounted because we don't use filesystem code 
anyway. (LVM etc does though in so far as it's needed to make the dev_t match 
the device again).

This matches with what you said above about hibernating to swap files and 
dedicated hibernation files - TuxOnIce uses exactly the same code to do the 
i/o to both; the variation is in the code to recognise the image header and 
allocate/free/bmap storage.

<not a filesystem expert> Personally, I don't think ext[34] is broken. If 
there's data being left in the journal that will need replaying, then 
mounting without replaying the journal sounds wrong. Perhaps you should 
instead be arguing that nothing should be left in the journal after a 
filesystem freeze. But, of course, current code isn't doing a filesystem 
freeze (just a process freeze) and the kexec guys want to take even that 
away. </not a filesystem expert>

In short, I agree. AFAICS, you need both the process freezer and filesystem 
freezing to make this thing fly properly.

Nigel
-- 
See http://www.tuxonice.net for Howtos, FAQs, mailing
lists, wiki and bugzilla info.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ