Message-Id: <1154615261.32264.6.camel@localhost.localdomain>
Date:	Thu, 03 Aug 2006 10:27:41 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Robert Crocombe <rcrocomb@...il.com>
Cc:	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	Bill Huey <billh@...ppy.monkey.org>
Subject: Re: Problems with 2.6.17-rt8

Please don't trim the Cc list.  LKML is too high-volume for anyone to read every mail.

On Thu, 2006-08-03 at 04:48 -0700, Robert Crocombe wrote: 
> On 8/2/06, Steven Rostedt <rostedt@...dmis.org> wrote:
> > You mention problems but I don't see you listing what exactly the
> > problems are.  Just saying "the problems exist" doesn't tell us
> > anything.
> >
> > Don't assume that we will go to some web site to figure out what you're
> > talking about. Please list the problems you are facing.
> 
> The machine dies (no alt-sysrq, no keyboard LEDs of any kind: dead in
> the water).  I thought the log would provide more useful information
> without potentially erroneous editorialization by myself.  Here are
> some highlights:
> 
> kjournald/1105[CPU#3]: BUG in debug_rt_mutex_unlock at kernel/rtmutex-debug.c:471

Ouch, that looks like kjournald is unlocking a lock that it doesn't own?
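
For the record, an rt_mutex tracks its owner, and the debug unlock path
complains when the task doing the unlock isn't that owner.  Here's a
userspace illustration of the same error class, using an error-checking
pthread mutex.  This is not the kernel code, just the analogous check:

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t m;

static void *locker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&m);		/* this thread becomes the owner */
	return NULL;			/* ...and exits still holding it */
}

int main(void)
{
	pthread_mutexattr_t attr;
	pthread_t t;
	int err;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&m, &attr);

	pthread_create(&t, NULL, locker, NULL);
	pthread_join(t, NULL);

	err = pthread_mutex_unlock(&m);	/* main() is not the owner */
	printf("unlock by non-owner: %s\n", strerror(err));	/* EPERM */
	return 0;
}

debug_rt_mutex_unlock() is catching the in-kernel version of exactly
that misuse.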

> 
> Call Trace:
>        <ffffffff8047655a>{_raw_spin_lock_irqsave+24}
>        <ffffffff8022b272>{__WARN_ON+100}
>        <ffffffff802457e4>{debug_rt_mutex_unlock+199}
>        <ffffffff804757b7>{rt_lock_slowunlock+25}
>        <ffffffff80476301>{__lock_text_start+9}

Hmm, here we are probably having trouble with the per-CPU slab locks;
that code is somewhat of a hack to get the slab allocator working on a
per-CPU basis.
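
To spell out the hazard (a minimal sketch with made-up names, not the
actual -rt mm/slab.c): mainline pins the task with preempt_disable()
while it touches this CPU's slab cache, but under -rt the section stays
preemptible and is protected by a per-CPU lock instead.  Get the
bookkeeping wrong and you release a lock you never took:

static DEFINE_PER_CPU(spinlock_t, cache_lock);	/* illustrative name */

static void broken_percpu_section(void)
{
	int cpu = raw_smp_processor_id();	/* preemption stays enabled */

	spin_lock(&per_cpu(cache_lock, cpu));
	/* ... touch per-CPU data; under -rt this can sleep and migrate ... */
	cpu = raw_smp_processor_id();		/* wrong CPU if we migrated */
	spin_unlock(&per_cpu(cache_lock, cpu));	/* unlocks a lock we don't own */
}

The cure in such code is to remember the lock actually taken instead of
recomputing the CPU, and that's the kind of detail this hack has to get
right everywhere.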

>        <ffffffff80271e93>{kmem_cache_alloc+202}

It would also be nice to know exactly where ffffffff80271e93 is.
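
Assuming you built with CONFIG_DEBUG_INFO, you can resolve that against
your vmlinux, e.g.:

	addr2line -e vmlinux -f -i ffffffff80271e93

or in gdb:

	gdb vmlinux
	(gdb) list *0xffffffff80271e93

so we can see which line of kmem_cache_alloc() that offset lands on.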

>        <ffffffff8025493b>{mempool_alloc_slab+17}
>        <ffffffff80254d07>{mempool_alloc+75}
>        <ffffffff802f2f8c>{generic_make_request+375}
>        <ffffffff8027b914>{bio_alloc_bioset+35}
>        <ffffffff8027ba2a>{bio_alloc+16}
>        <ffffffff802781d1>{submit_bh+137}
>        <ffffffff80279377>{ll_rw_block+122}
>        <ffffffff8027939e>{ll_rw_block+161}
>        <ffffffff802c85dc>{journal_commit_transaction+1011}
>        <ffffffff80476a5f>{_raw_spin_unlock_irqrestore+56}
>        <ffffffff804769ac>{_raw_spin_unlock+46}
>        <ffffffff804757df>{rt_lock_slowunlock+65}
>        <ffffffff80476301>{__lock_text_start+9}
>        <ffffffff802339b0>{try_to_del_timer_sync+85}
>        <ffffffff802cca63>{kjournald+202}
>        <ffffffff8023db60>{autoremove_wake_function+0}
>        <ffffffff802cc999>{kjournald+0}
>        <ffffffff8023d739>{keventd_create_kthread+0}
>        <ffffffff8023da2f>{kthread+219}
>        <ffffffff80225a23>{schedule_tail+188}
>        <ffffffff8020aaca>{child_rip+8}
>        <ffffffff8023d739>{keventd_create_kthread+0}
>        <ffffffff8023d954>{kthread+0}
>        <ffffffff8020aac2>{child_rip+0}
> ---------------------------
> | preempt count: 00000002 ]
> | 2-level deep critical section nesting:
> ----------------------------------------
> .. [<ffffffff80476499>] .... _raw_spin_lock+0x16/0x23
> .....[<ffffffff804757af>] ..   ( <= rt_lock_slowunlock+0x11/0x6b)
> .. [<ffffffff8047655a>] .... _raw_spin_lock_irqsave+0x18/0x29
> .....[<ffffffff8022b22d>] ..   ( <= __WARN_ON+0x1f/0x82)
> 
> 
> Somewhat later:
> 
> Kernel BUG at kernel/rtmutex.c:639

The rest was probably caused as a side effect of the above.  The above
is already broken!

You have NUMA configured too, so this is also something to look at
(booting with numa=off would be a quick way to rule it out).

I still wouldn't ignore the first bug message you got:

----
BUG: scheduling while atomic: udev_run_devd/0x00000001/1568

Call Trace:
       <ffffffff8045c693>{__schedule+155}
       <ffffffff8045f156>{_raw_spin_unlock_irqrestore+53}
       <ffffffff80242241>{task_blocks_on_rt_mutex+518}
       <ffffffff80252da0>{free_pages_bulk+39}
       <ffffffff80252da0>{free_pages_bulk+39}
...
----

This could also have a side effect that messes things up.
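
That message comes from the sanity check at the top of the scheduler:
entering schedule() with a nonzero preempt count.  Paraphrasing from
memory (not the exact 2.6.17-rt source), the check is roughly:

	if (unlikely(in_atomic())) {
		printk(KERN_ERR "BUG: scheduling while atomic: "
			"%s/0x%08x/%d\n",
			current->comm, preempt_count(), current->pid);
		dump_stack();
	}

which is where the udev_run_devd/0x00000001/1568 triple in your log
comes from: comm/preempt_count/pid.  A preempt count of 1 means
something blocked while still inside one atomic section.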

Unfortunately, right now I'm assigned to other tasks and I can't spend
much more time on this at the moment.  So hopefully Ingo, Thomas, Bill,
or someone else can help you find the reason for this problem.

-- Steve


