Date:	Sun, 14 Sep 2008 18:59:08 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Suresh Siddha <suresh.b.siddha@...el.com>,
	"Pallipadi, Venkatesh" <venkatesh.pallipadi@...el.com>
Cc:	hpa@...or.com, tglx@...utronix.de, arjan@...ux.intel.com,
	linux-kernel@...r.kernel.org, Jeremy Fitzhardinge <jeremy@...p.org>
Subject: Re: [patch 0/7] x86, cpa: cpa related changes to be inline with
	TLB Application note


* Ingo Molnar <mingo@...e.hu> wrote:

> > Signed-off-by: Suresh Siddha <suresh.b.siddha@...el.com>
> 
> applied to tip/x86/pat, thanks Suresh.

hm, -tip testing found sporadic lockups on a testbox today:

[   17.465685] Freeing unused kernel memory: 1544k freed
[   17.468035] Write protecting the kernel read-only data: 2684k
[   17.476049] Testing CPA: undo c0814000-c0ab3000
[   17.480369] Testing CPA: write protecting again
[   33.700054] CPA self-test:
[   33.703648]  4k 3070 large 219 gb 0 x 3289[c0000000-f77fd000] miss 0
[   98.788039] BUG: soft lockup - CPU#0 stuck for 61s! [pageattr-test:287]
[   98.788039] Modules linked in:
[   98.788039] 
[   98.788039] Pid: 287, comm: pageattr-test Not tainted (2.6.27-rc6-tip-00431-g506f75d-dirty #32098)
[   98.788039] EIP: 0060:[<c0157599>] EFLAGS: 00000202 CPU: 0
[   98.788039] EIP is at csd_flag_wait+0x19/0x30
[   98.788039] EAX: f7158da0 EBX: c22854e0 ECX: 00000000 EDX: 014ba000
[   98.788039] ESI: f7158da0 EDI: c22854e8 EBP: f7158d70 ESP: f7158d70
[   98.788039]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[   98.788039] CR0: 8005003b CR2: b7e6c130 CR3: 00ca7000 CR4: 000006d0
[   98.788039] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[   98.788039] DR6: ffff0ff0 DR7: 00000400
[   98.788039]  [<c01576a5>] generic_exec_single+0x65/0x80
[   98.788039]  [<c0157a1e>] smp_call_function_single+0xee/0x110
[   98.788039]  [<c011df70>] ? __global_flush_tlb+0x0/0x30
[   98.788039]  [<c011df70>] ? __global_flush_tlb+0x0/0x30
[   98.788039]  [<c0157ba4>] smp_call_function_mask+0x164/0x1a0
[   98.788039]  [<c011df70>] ? __global_flush_tlb+0x0/0x30
[   98.788039]  [<c0151a07>] ? __lock_acquire+0x1b7/0x690
[   98.788039]  [<c011df70>] ? __global_flush_tlb+0x0/0x30
[   98.788039]  [<c011df70>] ? __global_flush_tlb+0x0/0x30
[   98.788039]  [<c0157bfc>] smp_call_function+0x1c/0x20
[   98.788039]  [<c01347af>] on_each_cpu+0x1f/0x60
[   98.788039]  [<c011e22f>] __change_page_attr_set_clr+0x25f/0x610
[   98.788039]  [<c03a6136>] ? _raw_spin_unlock+0x46/0x80
[   98.788039]  [<c011e683>] change_page_attr_set_clr+0xa3/0x310
[   98.788039]  [<c011f3bb>] do_pageattr_test+0x39b/0x4f0
[   98.788039]  [<c011f020>] ? do_pageattr_test+0x0/0x4f0
[   98.788039]  [<c0143b77>] kthread+0x47/0x80
[   98.788039]  [<c0143b30>] ? kthread+0x0/0x80
[   98.788039]  [<c01043ab>] kernel_thread_helper+0x7/0x10
[   98.788039]  =======================
[   98.788039] Kernel panic - not syncing: softlockup: hung tasks

it's not specific enough to bisect it precisely, but excluding all these 
commits from x86/pat:

6b5b551: x86: handle error returns in set_memory_*()
5f25f5b: x86: track memtype for RAM in page struct
3196625: x86, cpa: global flush tlb after splitting large page and before doing cpa
79abc89: x86, cpa: remove cpa pool code
e96d59b: x86, cpa: fix taking the pgd_lock with interrupts off
888fdd9: x86, cpa: dont use large pages for kernel identity mapping with DEBUG_PAGEALLOC
e579af6: x86, cpa: make the kernel physical mapping initialization a two pass sequence
c86eefc: x86, cpa: remove USER permission from the very early identity mapping attribute
e8729a5: x86, cpa: rename PTE attribute macros for kernel direct mapping in early boot

makes the lockups go away.

i've pushed out the broken tree into tip/tmp.x86.pat.broken 
(2.6.27-rc6-tip-00431-g506f75d-dirty), you should be able to test the 
attached config with that specific tree. (i'll send the full crashlog in 
private mail, it's too large for lkml)

	Ingo

View attachment "config" of type "text/plain" (58983 bytes)
