Date:	Fri, 29 Sep 2006 12:20:26 -0700
From:	Andrew Morton <akpm@...l.org>
To:	Badari Pulavarty <pbadari@...ibm.com>, Ingo Molnar <mingo@...e.hu>
Cc:	Jan Kara <jack@...e.cz>, torvalds@...l.org, stable@...nel.org,
	ext4 <linux-ext4@...r.kernel.org>
Subject: Re: [patch 003/152] jbd: fix commit of ordered data buffers

On Fri, 29 Sep 2006 09:11:46 -0700
Badari Pulavarty <pbadari@...ibm.com> wrote:

> On Fri, 2006-09-29 at 11:02 +0200, Jan Kara wrote:
> ...
> > > >+		}
> > > >+		/* Someone already cleaned up the buffer? */
> > > >+		if (!buffer_jbd(bh)
> > > >+			|| jh->b_transaction != commit_transaction
> > > >+			|| jh->b_jlist != BJ_SyncData) {
> > > >+			jbd_unlock_bh_state(bh);
> > > >+			if (locked)
> > > >+				unlock_buffer(bh);
> > > >+			BUFFER_TRACE(bh, "already cleaned up");
> > > >+			put_bh(bh);
> > > >+			continue;
> >    ---> Here the buffer was refiled by someone else
> 
> I am a little concerned about this particular code. We know that
> someone else will do the unfile/remove, but won't we keep
> spinning on it until that happens? Why don't we
> skip it and move to the next one?

That check looks OK to me.  Either

a) the buffer has been taken off the journal altogether or

b) it's been moved to the running transaction or

c) it's journalled, it's on the committing transaction but it's not on
   t_sync_datalist any more.

So it shouldn't be possible for that buffer to be on
commit_transaction->t_sync_datalist.
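
To spell that out, here is a stand-alone sketch of how the three cases map
onto the quoted check.  It uses stub types only; every name below is a
stand-in for illustration, not the actual jbd structures:

#include <stdbool.h>
#include <stddef.h>

enum jlist { BJ_None, BJ_SyncData /* other lists elided */ };

struct transaction;                          /* opaque here */

struct journal_head {
	struct transaction *b_transaction;   /* owning transaction */
	enum jlist b_jlist;                  /* which of its lists jh is on */
};

struct buffer_head {
	struct journal_head *b_private;      /* NULL once unjournalled */
};

static bool buffer_jbd(struct buffer_head *bh)
{
	return bh->b_private != NULL;
}

/* true when bh can no longer be on t_sync_datalist */
static bool already_cleaned_up(struct buffer_head *bh,
			       struct transaction *commit_transaction)
{
	struct journal_head *jh = bh->b_private;

	if (!buffer_jbd(bh))                         /* case a): unjournalled */
		return true;
	if (jh->b_transaction != commit_transaction) /* case b): refiled to the
							running transaction */
		return true;
	if (jh->b_jlist != BJ_SyncData)              /* case c): still committing,
							but off t_sync_datalist */
		return true;
	return false;
}

Whichever of the three fired, the buffer is no longer on
commit_transaction->t_sync_datalist, so dropping our reference and
continuing is correct; nothing sits there spinning on it.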

> We are seeing a few of the following messages while running tests and
> wondering if your patch is causing them ..
> 
> BUG: spinlock lockup on CPU#1, scp/30189, c00000000fb503d8 (Not tainted)
> Call Trace:
> [C000000018FDB320] [C0000000000102E0] .show_stack+0x68/0x1b0 (unreliable)
> [C000000018FDB3C0] [C0000000001734F4] ._raw_spin_lock+0x138/0x184
> [C000000018FDB460] [C00000000025AD24] ._spin_lock+0x10/0x24
> [C000000018FDB4E0] [D000000000172E14] .journal_dirty_data+0xa4/0x2c0 [jbd]
> [C000000018FDB580] [D000000000205BAC] .ext3_journal_dirty_data+0x28/0x70 [ext3]
> [C000000018FDB610] [D0000000002048BC] .walk_page_buffers+0xb0/0x134 [ext3]
> [C000000018FDB6D0] [D000000000208280] .ext3_ordered_commit_write+0x74/0x114 

Presumably j_list_lock got stuck.  What we really need to see here is
the backtrace from other CPUs.

gad, there have been so many all-CPU-backtrace patches over the years.

<optimistically cc's Ingo>

Ingo, do you think that's something which we should have in the spinlock
debugging code?  A trace to let us see which CPU is holding that lock,
and where from?  I guess if the other CPU is stuck in spin_lock_irqsave()
then we'll get stuck delivering the IPI, so it'd need to be async.
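
FWIW, one possible shape for that, as a toy sketch in plain C rather than
the kernel's spinlock-debug code (all names below are made up for
illustration): the lock remembers which CPU took it and from where, and the
spinner prints that once it decides it is stuck, so no IPI to the wedged
CPU is needed:

#include <stdio.h>
#include <stdatomic.h>

struct debug_spinlock {
	atomic_flag locked;     /* initialise with ATOMIC_FLAG_INIT */
	int owner_cpu;          /* CPU currently holding the lock */
	void *owner_ip;         /* call site that took it */
};

static void debug_spin_lock(struct debug_spinlock *lock, int this_cpu)
{
	unsigned long spins = 0;

	while (atomic_flag_test_and_set_explicit(&lock->locked,
						 memory_order_acquire)) {
		if (++spins == 100000000UL)     /* arbitrary lockup threshold */
			fprintf(stderr,
				"lockup: lock held by CPU#%d from %p\n",
				lock->owner_cpu, lock->owner_ip);
	}
	/* small window where the owner info is stale; fine for debugging */
	lock->owner_cpu = this_cpu;
	lock->owner_ip = __builtin_return_address(0);
}

static void debug_spin_unlock(struct debug_spinlock *lock)
{
	lock->owner_cpu = -1;
	lock->owner_ip = NULL;
	atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}

The only interesting bit is that the owner information lives in the lock
itself, so whichever CPU detects the lockup can report the holder without
having to interrupt it.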

