linux-kernel - Re: Panic in multiple kernels: IA64 SBA IOMMU: Culprit commit on Mar 28, 2008

Message-ID: <49125DFF.5080900@cse.unsw.edu.au>
Date:	Thu, 06 Nov 2008 14:01:19 +1100
From:	Shehjar Tikoo <shehjart@....unsw.edu.au>
To:	"Luck, Tony" <tony.luck@...el.com>
CC:	"fujita.tomonori@....ntt.co.jp" <fujita.tomonori@....ntt.co.jp>,
	"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-ia64@...r.kernel.org" <linux-ia64@...r.kernel.org>,
	linux-parisc@...r.kernel.org
Subject: Re: Panic in multiple kernels: IA64 SBA IOMMU: Culprit commit on
 Mar 28, 2008

Luck, Tony wrote:
> Added Cc: linux-ia64 ... more likely to attract attention of HP
> ia64 experts there.
> 
>> arch/ia64/hp/common/sba_iommu.c: I/O MMU is out of mapping resources
> 
> Odd ... the code (back to the dawn of git time in 2.6.12-rc1) looks like
> 
>         panic(__FILE__ ": I/O MMU @ %p is out of mapping resources\n",
>                 ioc->ioc_hpa);
> 
> I wonder why you don't see the "@ HEXADDRESS"?

I had typed that message from memory rather than copying it. You're
right, there is a hex address. I've included a full message at the end
of this email.

> 
>> Using git-bisect, I've zeroed in on the commit that introduced this.
>> Please see the attached file for the commit.
> 
> Did you confirm that reverting this commit on a recent kernel
> fixes the problem (once in a while git bisect can point to
> the wrong commit ... it seems very likely that it got the
> right one here, but it is always good to check).  When I
> tried to use "patch -R" to revert this it got confused on
> the Kconfig file because the lines that were added were
> subsequently changed ... so you may need to revert that
> by hand ... the sba_iommu.c apparently reverted ok).


Yes, reverting this commit in 2.6.27 prevents the kernel panic on both
workloads.

> 
>> Other info:
>> System is HP RX6600 (16 GB RAM, 16 processors w/ dual cores and HT)
>> 20 SATA disks under software RAID0 with 6 TB capacity.
>> Silicon Image 3124 controller.
>> File system is XFS.
> 
> My HP test system is way too small to attempt to recreate
> this (just 2 cpus & 1 disk).  How long does each of your
> tests take to hit the problems ... a few minutes? Or hours?

The point at which the panic occurs varies for both tests, but
generally the panics seemed to occur nearer the end of the
750G to 1TB writes.

> 
>> I'd much appreciate some help in fixing this because this panic has
>> basically stalled my own work. I'd be willing to run more tests on my
>> setup to test any patches that possibly fix this issue.
> 
> Adding some printk() before the panic might give a clue as to what
> is going wrong.  Either a bogus call is trying to allocate far
> too much space, or the bitmap is leaking, or we have a totally
> messed up "ioc" structure.
> 
> Printing "pages_needed", the address of "ioc", and some interesting
> fields from ioc (at least ioc->res_size) would help.  I assume
> the return value from sba_search_bitmap() is ~0x0 ... but
> you should print "pide" just to be sure.


Here's some more info from a printk:

Kernel panic - not syncing: arch/ia64/hp/common/sba_iommu.c: I/O MMU @ c0000000fed01000 is out of mapping resources: pide: 18446744073709551615, pages_needed: 5, iocres_size: 8192
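
As a sanity check on those numbers, here is a small userspace C sketch
(not kernel code) that interprets the printed values. The decimal pide
is the all-ones 64-bit pattern, which matches the ~0x0 return value Tony
expected from sba_search_bitmap(). The names PIDE_FAILURE,
pide_is_failure(), bitmap_entries() and explain() are made up for
illustration, and the entry count rests on the assumption that res_size
is a bitmap size in bytes with one bit per I/O page.

```c
#include <stdint.h>
#include <stdio.h>

/* All-ones 64-bit value, i.e. ~0x0 on a 64-bit machine.
 * Hypothetical name; used here only to label the failure marker. */
#define PIDE_FAILURE (~(uint64_t)0)

/* Returns 1 if the printed pide equals the all-ones failure marker. */
static int pide_is_failure(uint64_t pide)
{
    return pide == PIDE_FAILURE;
}

/* Number of entries a bitmap of res_size_bytes can track.
 * ASSUMPTION: res_size is in bytes and each bit covers one I/O page. */
static uint64_t bitmap_entries(uint64_t res_size_bytes)
{
    return res_size_bytes * 8;
}

/* Prints a short interpretation of the three values from the panic line. */
static void explain(uint64_t pide, uint64_t pages_needed,
                    uint64_t res_size_bytes)
{
    printf("pide          = 0x%016llx (%s)\n",
           (unsigned long long)pide,
           pide_is_failure(pide) ? "all ones: search failed"
                                 : "valid index");
    printf("pages_needed  = %llu\n", (unsigned long long)pages_needed);
    printf("bitmap bits   = %llu (assuming one bit per I/O page)\n",
           (unsigned long long)bitmap_entries(res_size_bytes));
}
```

Calling explain(18446744073709551615ULL, 5, 8192) with the values above
would flag the pide as the failure marker. Under the stated assumption,
a request for only 5 pages failing against a map of that size looks more
like a leaking or fragmented bitmap than an oversized request, but that
is only a guess from these three numbers.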

> 
> -Tony
