linux-kernel - Re: [next] unix stream crashes

Message-ID: <CA+icZUVuJWpLquLk4kHJzmMKtPw24oxud=u3Na0U0HhSYqwV1w@mail.gmail.com>
Date:	Sat, 3 Sep 2011 07:54:47 +0200
From:	Sedat Dilek <sedat.dilek@...glemail.com>
To:	Valdis.Kletnieks@...edu
Cc:	Tim Chen <tim.c.chen@...ux.intel.com>,
	Jiri Slaby <jirislaby@...il.com>,
	"David S. Miller" <davem@...emloft.net>,
	ML netdev <netdev@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Stephen Rothwell <sfr@...b.auug.org.au>
Subject: Re: [next] unix stream crashes

On Sat, Sep 3, 2011 at 7:35 AM,  <Valdis.Kletnieks@...edu> wrote:
> On Fri, 02 Sep 2011 16:55:03 PDT, Tim Chen said:
>
>> I'll like to isolate the problem to either the send path or receive
>> path. My suspicion is the error handling portion of the send path is not
>> quite right but I haven't yet found any issues after reviewing the
>> patch.
>
> Took a while, because it took a few tries to get netconsole working,
> and then I was seeing odd results, but here we go:
>
> next-20110831 - crashes 100% consistent.
> next-20110831 + revert 0856a30409 - OK.
> revert + scm_recv.patch - OK.
> revert + scm_send.patch - crashes 100% consistent.
>

YES, I can confirm this with next-20110826.

> Now the odd part - although I was seeing crashes 100% of the time, I saw a
> number of different tracebacks (but I never actually saw the same traceback
> that Jiri had). Also, the system died at different points - most of the time it
> would live long enough for GDM to prompt for a userid/password and then die,
> but sometimes it didn't get as far as the GDM screen. Hopefully the variety of
> crashes will tell you something useful.
>
> I'll be able to test patches for go/nogo over the weekend, but probably won't
> have a second machine to catch netconsole until I'm back in the office Monday.
>
> Example 1:
>
> [  142.316258] Kernel panic - not syncing: CRED: put_cred_rcu() sees ffff88010d1ff300 with usage -41
> [  142.316260]
> [  142.316275] Pid: 2264, comm: gdm-simple-slav Tainted: G        W   3.1.0-rc4-next-20110831-dirty #17
> [  142.316279] Call Trace:
> [  142.316283]  <IRQ>  [<ffffffff81577a6c>] panic+0x96/0x1a2
> [  142.316300]  [<ffffffff8105cb54>] put_cred_rcu+0x32/0x91
> [  142.316306]  [<ffffffff8157a44f>] rcu_do_batch+0xcb/0x1e4
> [  142.316313]  [<ffffffff81092967>] invoke_rcu_callbacks+0x6c/0xc7
> [  142.316319]  [<ffffffff810932f8>] __rcu_process_callbacks+0x118/0x124
> [  142.316325]  [<ffffffff810934f0>] rcu_process_callbacks+0x64/0x72
> [  142.316331]  [<ffffffff8103f8c4>] __do_softirq+0x110/0x278
> [  142.316338]  [<ffffffff815a23ac>] call_softirq+0x1c/0x30
> [  142.316342]  <EOI>  [<ffffffff81003647>] do_softirq+0x44/0xf1
> [  142.316352]  [<ffffffff8103f485>] _local_bh_enable_ip+0x12a/0x178
> [  142.316358]  [<ffffffff8103f4dc>] local_bh_enable_ip+0x9/0xb
> [  142.316364]  [<ffffffff8159a2f3>] _raw_write_unlock_bh+0x36/0x3a
> [  142.316372]  [<ffffffff814c1ac3>] unix_release_sock+0x86/0x1ff
> [  142.316378]  [<ffffffff8105b548>] ? up_read+0x1b/0x32
> [  142.316383]  [<ffffffff814c1c5d>] unix_release+0x21/0x23
> [  142.316390]  [<ffffffff81423d02>] sock_release+0x1a/0x6f
> [  142.316395]  [<ffffffff81424a30>] sock_close+0x22/0x26
> [  142.316401]  [<ffffffff810fcacb>] __fput+0x140/0x1fe
> [  142.316407]  [<ffffffff810f97cb>] ? sys_close+0xe6/0x158
> [  142.316412]  [<ffffffff810fcb9e>] fput+0x15/0x17
> [  142.316417]  [<ffffffff810f8ef2>] filp_close+0x87/0x93
> [  142.316422]  [<ffffffff810f97d6>] sys_close+0xf1/0x158
> [  142.316429]  [<ffffffff815a0ffb>] system_call_fastpath+0x16/0x1b
>

I saw similar call traces with put_cred_rcu(), as well as some with
kmem_cache_alloc_trace().
My post-it says:
Kernel panic - not syncing: CRED: put_cred_rcu sees f67ac0c0 with usage -43
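
A negative usage count like the -41 and -43 above suggests that more
references were dropped than were ever taken. The following is only a toy
model of that failure mode, not kernel code: ToyCred, run() and the
"extra_puts" parameter are hypothetical names made up for illustration, and
the real cred handling is considerably more involved.

```python
class ToyCred:
    """Hypothetical stand-in for a reference-counted credential object."""

    def __init__(self):
        self.usage = 1             # the creator holds the first reference
        self.free_pending = False  # set once a deferred free is queued

    def get(self):
        self.usage += 1
        return self

    def put(self):
        self.usage -= 1
        if self.usage == 0:
            # In this toy model the free is deferred, not done immediately.
            self.free_pending = True

    def deferred_free(self):
        # Sanity check in the spirit of the panic text: by the time the
        # deferred free runs, the count is expected to be exactly zero.
        if self.usage != 0:
            raise RuntimeError(f"toy cred freed with usage {self.usage}")


def run(extra_puts):
    """Take and drop references, then apply `extra_puts` unbalanced drops."""
    cred = ToyCred()
    cred.get()
    cred.put()
    cred.put()  # balanced so far: usage is 0 and the free is pending
    for _ in range(extra_puts):
        cred.put()  # unbalanced drop, e.g. from a faulty error path
    cred.deferred_free()
    return cred.usage
```

With run(0) the deferred free passes; with run(2) it fails and reports a
usage of -2, which is the same shape of complaint as the panic line, just
with a smaller number.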

BTW, systemd (which uses dbus/sockets) is more sensitive to this than
Debian's standard sysvinit.

- Sedat -