linux-ext4 - Re: [PATCH] ext4/jbd2: don't wait (forever) for stale tid caused by wraparound

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20130318143156.GA14430@thunk.org>
Date:	Mon, 18 Mar 2013 10:31:56 -0400
From:	Theodore Ts'o <tytso@....edu>
To:	Ben Hutchings <ben@...adent.org.uk>
Cc:	George Barnett <gbarnett@...assian.com>,
	linux-ext4@...r.kernel.org,
	Debian kernel maintainers <debian-kernel@...ts.debian.org>
Subject: Re: [PATCH] ext4/jbd2: don't wait (forever) for stale tid caused by
 wraparound

On Mon, Mar 18, 2013 at 04:53:15AM +0000, Ben Hutchings wrote:
> 
> We need you to verify that this fix works first.  If it does, it should
> get included in the various 3.x.y stable branches and in Debian kernel
> packages.

I suspect it will be hard for George to verify this, since it requires
a tid wrap, which by definition takes a long time.

Also, this will probably not hit the stable kernels until after the
next merge window, since it's already post -rc3 and I really want to
make sure this gets a lot of testing and review.

I'll also note that I managed to trigger a BUG when incrementing by a
factor of (1 << 24), but we don't see a BUG_ON when incrementing by
((1 << 24) + 1).  (Neither of these testing changes were in the patch
that I sent out; so the patch is "safe" in that I very much doubt it
will make things worse --- those changes were to stress test the patch
so that I wouldn't have to wait several months until the tid wrapped
to test whether we had finally fixed all of the potential problems.)
So there is something we probably do want to look at a bit more
closely before we formally push this fix into mainline.

As far as the Debian servers are concerned, I'm pretty sure the patch
should be safe in that it won't make things worse than they were
before --- however, if you are looking for the lowest risk approach,
it's probably better to simply schedule downtime every few months and
force a reboot at a time that is minimizes developer inconvenience.
You can use "dumpe2fs -h /dev/XXX" to get the current sequence number
of the journal, if you measure the sequence number separated by 24 or
48 hours, you should be able to calculate when the the sequence number
will have incremented by 2**31, and thus calculate the frequency of
scheduled reboots for your workload.

Regards,

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html