Date:	Tue, 6 Dec 2011 15:26:31 -0600
From:	"Moffett, Kyle D" <Kyle.D.Moffett@...ing.com>
To:	Jan Kara <jack@...e.cz>
CC:	Sean Ryle <seanbo@...il.com>, "Ted Ts'o" <tytso@....edu>,
	"615998@...s.debian.org" <615998@...s.debian.org>,
	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
	Sachin Sant <sachinp@...ibm.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
Subject: Re: Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel
 BUG at fs/jbd2/commit.c:534" from Postfix on ext4

Hello again!

I know it's been ages, but I finally got some time to get that patch
tested out and try additional debugging.

On Sep 01, 2011, at 11:17, Jan Kara wrote:
> On Tue 30-08-11 19:26:22, Moffett, Kyle D wrote:
>> On Aug 30, 2011, at 18:12, Jan Kara wrote:
>>>> I can still trigger it on my VM snapshot very easily, so if you have anything
>>>> you think I should test I would be very happy to give it a shot.
>>> 
>>> OK, so in the meantime I found a bug in data=journal code which could be
>>> related to your problem. It is fixed by commit
>>> 2d859db3e4a82a365572592d57624a5f996ed0ec which is in 3.1-rc1. Have you
>>> tried that or newer kernel as well?
>>> 
>>> If the problem still is not fixed, I can provide some debugging patch to
>>> you. We spoke with Josef Bacik how errors like yours could happen so I have
>>> some places to watch...
>> 
>> I have not tried anything more recent; I'm actually a bit reluctant to move
>> away from the Debian squeeze official kernels since I do need the security
>> updates.
>> 
>> I took a quick look and I can't find that function in 2.6.32, so I assume it
>> would be a rather nontrivial back-port.  It looks like the relevant code
>> used to be in ext4_clear_inode somewhere?
> It's not that hard - untested patch attached.

So this applied mostly cleanly (with one minor context-only conflict in
the 2.6.32.17 patch), but unfortunately it didn't resolve the problem.
Just as a sanity check, I upgraded to the Debian 3.1.0-1-amd64 kernel,
based on kernel version 3.1.1, and the problem still occurs there too
(additional info at the end of the email).

Looking at the issue again, I don't think it has anything to do with
file deletion at all.

Specifically, there are a grand total of 4 files in that filesystem
(alongside an empty "lost+found" directory):
  master.lock
  prng_exch
  smtpd_scache.db
  smtp_scache.db

As far as I can tell, none of those is ever deleted during normal
operation.

The crash occurs very quickly after starting postfix.  It connects to
the external email server (using TLS) and begins to flush queued mail.

At that point, the "tlsmgr" daemon tries to update the "smtp_scache.db"
file, which is a Berkeley DB about 40k in size.  Somewhere in there,
the Berkeley DB does an fdatasync().

The "fdatasync()" apparently triggers the bad behavior from the "jbd2"
thread, which then oopses in fs/jbd2/commit.c:485 (which appears to be
the same BUG_ON() as before).
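
For anyone who wants to poke at the same I/O pattern without running
postfix, here is a minimal sketch of what is described above: rewrite a
small file in place and call fdatasync() after each write. This is only
an approximation and not the actual Berkeley DB code path. The function
name, the number of rounds and the fill bytes are made up for
illustration, and the 40k size just mirrors the size of smtp_scache.db.
Whether this alone is enough to trigger the BUG_ON() is not known.

```python
import os
import tempfile


def rewrite_and_fdatasync(path, size=40 * 1024, rounds=3):
    """Overwrite `path` in place `rounds` times, calling fdatasync() each time.

    Hypothetical helper: it mimics "update a ~40k file, then fdatasync()",
    not the real Berkeley DB write pattern.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        for i in range(rounds):
            # Rewrite the file from offset 0, as an in-place update would.
            os.lseek(fd, 0, os.SEEK_SET)
            payload = bytes([i % 256]) * size
            written = 0
            while written < size:
                # os.write() may write fewer bytes than requested.
                written += os.write(fd, payload[written:])
            # Flush the file data (and the metadata needed to read it back).
            os.fdatasync(fd)
        return os.fstat(fd).st_size
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Assumption: run this with the working directory on the filesystem
    # under test; here it just uses a temporary directory.
    with tempfile.TemporaryDirectory() as d:
        print(rewrite_and_fdatasync(os.path.join(d, "scache_test.db")))
```

To exercise a particular filesystem, point the path at a file on that
mount instead of the temporary directory.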

The stack looks something like this:
  jbd2_journal_commit_transaction+0x4ea/0x1053 [jbd2]
  kjournald2+0xc0/0x20a [jbd2]
  add_wait_queue+0x3c/0x3c
  commit_timeout+0x5/0x5 [jbd2]
  kthread+0x76/0x7e

Cheers,
Kyle Moffett

--
Curious about my work on the Debian powerpcspe port?
I'm keeping a blog here: http://pureperl.blogspot.com/