linux-kernel - Re: inode->i_wb


Message-Id: <201203152208.04168.petr@tesarici.cz>
Date:	Thu, 15 Mar 2012 22:08:03 +0800
From:	Petr Tesařík <petr@...arici.cz>
To:	Dave Jones <davej@...hat.com>
Cc:	Yang Bai <hamo.by@...il.com>,
	Fengguang Wu <fengguang.wu@...el.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Fedora Kernel Team <kernel-team@...oraproject.org>,
	kernel@...arici.cz
Subject: Re: inode->i_wb_list corruption.

On Sat, 10 March 2012 02:00:15, Dave Jones wrote:
> (trimmed cc)
> 
> On Sat, Mar 10, 2012 at 12:14:37AM +0800, Yang Bai wrote:
>  > On Fri, Mar 9, 2012 at 11:19 PM, Dave Jones <davej@...hat.com> wrote:
>  > > And with that, this arrived..
>  > > https://bugzilla.redhat.com/show_bug.cgi?id=788433#c3
>  > > 
>  > > I'm leaning strongly towards believing this is yet another case of
>  > > i915 corrupting memory on resume.
>  > 
>  > Nice catch. I am wondering
>  > 1) why all lists being affected and
>  > 2) why all list_head's prev being set to NULL.
>  > 
>  > Any ideas?
> 
> This is probably the same bug:
> https://bugzilla.kernel.org/show_bug.cgi?id=37142 Petr noticed that the
> corruption is 32 bytes getting zeroed at the beginning of a page.
> 
> I think this may be responsible for a lot of different bugs that we've
> had reported.
> 
> i915_drm_thaw is a deep nest of functions though, so this is going to be
> hard to track down where that write is coming from. Because the corruption
> seems to happen to pages that are already allocated, we probably can't
> even rely on DEBUG_PAGEALLOC, though it might be worth trying.

If you believe it could be written by the CPU, I can try to catch the 
instruction that writes to this memory. My plan is as follows:

Set up all the hardware debug registers to trap writes to the pages that are 
likely to get corrupted. Remember, I've always seen the corruption happen in 
roughly the same physical memory area.
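
For illustration, here is a rough sketch of the control value such a setup
needs. It assumes the standard x86 DR7 layout: one enable bit pair per slot in
the low byte, and a 2-bit R/W field plus a 2-bit LEN field per slot starting at
bit 16. Each debug register watches at most 8 aligned bytes (on 64-bit capable
CPUs) at a linear address, not a whole page, but since the corruption reported
above zeroes the start of a page, watching the first 8 bytes of each suspect
page should be enough. The function name is hypothetical, and Python is used
only to show the bit arithmetic; the real setup would be done in kernel code.

```python
# Assumed x86 DR7 field encodings (per-slot R/W and LEN fields).
RW_WRITE = 0b01   # break on data writes only
LEN_8 = 0b10      # watch 8 bytes (64-bit capable CPUs)


def dr7_for_write_watchpoints(slots, length_bits=LEN_8):
    """Build a DR7 value that arms write watchpoints in the given slots.

    `slots` is an iterable of debug register indices (0..3 for DR0..DR3).
    Reserved bits of DR7 are not handled here.
    """
    dr7 = 0
    for n in slots:
        if not 0 <= n <= 3:
            raise ValueError("x86 has only four address debug registers")
        dr7 |= 1 << (2 * n + 1)              # Gn: global enable for slot n
        dr7 |= RW_WRITE << (16 + 4 * n)      # R/Wn: trap on writes
        dr7 |= length_bits << (18 + 4 * n)   # LENn: size of watched region
    return dr7


if __name__ == "__main__":
    # Arm all four slots, as in the plan above.
    print(hex(dr7_for_write_watchpoints(range(4))))
```

DR0..DR3 would then each hold the linear address of the first bytes of one
suspect page.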

I know there are only 4 registers I can use, and the potential corruption 
area is much larger than 4 pages, but with enough reboots, the chance is quite 
high that I'll get lucky.
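
To put a rough number on "enough reboots", here is a back-of-the-envelope
sketch. It assumes (and this is only an assumption) that each boot corrupts
one page chosen uniformly at random from the candidate area, independently of
earlier boots. The candidate count of 64 pages is a made-up figure for
illustration, not a measurement, and the function names are hypothetical.

```python
import math


def catch_probability(candidate_pages, watched_pages, reboots):
    """Chance that at least one boot corrupts a watched page."""
    miss_per_boot = 1.0 - watched_pages / candidate_pages
    return 1.0 - miss_per_boot ** reboots


def reboots_needed(candidate_pages, watched_pages, target):
    """Smallest number of reboots whose catch probability reaches `target`."""
    if watched_pages >= candidate_pages:
        return 1  # every candidate page is watched, one boot suffices
    miss_per_boot = 1.0 - watched_pages / candidate_pages
    return math.ceil(math.log(1.0 - target) / math.log(miss_per_boot))


if __name__ == "__main__":
    pages = 64  # hypothetical size of the suspect area, in pages
    k = reboots_needed(pages, 4, 0.9)
    print(k, catch_probability(pages, 4, k))
```

The required number of reboots grows roughly linearly with the size of the
suspect area, so narrowing the area down first pays off quickly.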

I haven't gone for that plan yet, because I thought the area was in fact 
written to by someone else on the PCI bus, not the CPU. If nothing else, I can 
verify that. ;-)

Dave, do you think the result of such testing would help you resolve the bug?

Petr