Date:	Sat, 23 Aug 2008 13:49:49 +0300
From:	Jari Aalto <jari.aalto@...te.net>
To:	linux-kernel@...r.kernel.org
Subject:  2.6.25 DMA: Out of SW-IOMMU space - Asus M2N32 AMD 8GB memory


Message from /etc/syslog:

        [1] Aug 21 11:01:19 jondo kernel: [174628.275859] DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:00:0d.0

My AMD system running kernel 2.6.25 freezes regularly, so hard that
only the power button can take the system down. This is alarming,
because the system stays up only a few days at a time.

I've spent countless hours reading documents (found via Google) related
to "Out of SW-IOMMU space". For some people the suggested fixes have
worked, for some they haven't, and there has not been any clear
explanation of which options should be used with which chipsets/MBs
and why.

I've gone through various combinations of kernel boot options, but
nothing seems to completely solve the problem:

iommu=soft swiotlb=65536

        Freezing continued, but the disk corruption did not happen any more.
        Increasing the swiotlb value has not helped.

iommu=soft,memaper=3 swiotlb=65536

        Adding memaper did not help. "Out of SW-IOMMU space" messages [see
        1] crept in and I'm preparing to see another freeze eventually.

iommu=noaperture

        Same as above. No progress.

iommu=noagp,noaperture swiotlb=512M

        Current options that I use. They were giving hope for 2 days,
        but then a single "Out of SW-IOMMU space" message appeared. I'm
        afraid the freeze is about to come.

Should I try the following options next, or just "iommu=off"?

        iommu=noagp,noaperture,off swiotlb=512M
                               ===
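
To put the numeric swiotlb values above into perspective, here is a rough
sketch of the arithmetic. It assumes the number given to swiotlb= is a
count of 2 KiB slabs (IO_TLB_SHIFT = 11); that is an assumption about this
kernel, not something the logs confirm, and whether a suffixed value such
as "512M" is accepted as a size is likewise not confirmed here. The
function names are made up for illustration.

```python
# Assumption: one swiotlb slab is 2 KiB (IO_TLB_SHIFT = 11).
IO_TLB_SHIFT = 11


def swiotlb_slabs_to_bytes(nslabs: int) -> int:
    """Return the bounce-buffer pool size in bytes for a slab count."""
    return nslabs << IO_TLB_SHIFT


def slabs_needed(request_bytes: int) -> int:
    """Return how many slabs a single mapping of this size occupies."""
    slab_size = 1 << IO_TLB_SHIFT
    return (request_bytes + slab_size - 1) // slab_size


if __name__ == "__main__":
    pool = swiotlb_slabs_to_bytes(65536)
    print(f"swiotlb=65536 -> {pool // (1024 * 1024)} MiB pool")
    print(f"65536-byte mapping -> {slabs_needed(65536)} slabs")
```

Under that assumption swiotlb=65536 reserves 128 MiB, and the failing
65536-byte request from [1] needs 32 slabs at once.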

I don't understand well enough how this relates to the MCP55 SATA
controller, which seems to be the target of these IOMMU messages [see 1;
based on device id "00:0d.0"]. Only the plain SATA connectors, not the
onboard RAID SATA connectors, are in use for the hard disk.
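
The mapping from the device id in the kernel message to a controller can
be sketched as below: split the address into domain, bus, device and
function, then look for the same slot in "lspci -nn" output. The helper
names are hypothetical; the sample lines are the ones quoted under
"Hardware details".

```python
import re

# [domain:]bus:device.function, all hexadecimal; the domain is optional
# because lspci omits it when it is 0000.
PCI_RE = re.compile(
    r"^(?:(?P<domain>[0-9a-f]{4}):)?"
    r"(?P<bus>[0-9a-f]{2}):(?P<dev>[0-9a-f]{2})\.(?P<fn>[0-7])$"
)


def parse_pci_address(addr):
    """Return (domain, bus, device, function) as integers."""
    m = PCI_RE.match(addr.strip().lower())
    if m is None:
        raise ValueError(f"not a PCI address: {addr!r}")
    domain = int(m.group("domain") or "0", 16)
    return (domain, int(m.group("bus"), 16),
            int(m.group("dev"), 16), int(m.group("fn")))


def find_device(addr, lspci_lines):
    """Return the lspci description for addr, or None if it is not listed."""
    target = parse_pci_address(addr)
    for line in lspci_lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or PCI_RE.match(parts[0].lower()) is None:
            continue
        if parse_pci_address(parts[0]) == target:
            return parts[1]
    return None


LSPCI_NN = [
    "00:0d.0 IDE interface [0101]: nVidia Corporation MCP55 SATA Controller [10de:037f] (rev a2)",
    "01:00.0 VGA compatible controller [0300]: nVidia Corporation G70 [GeForce 7600 GS] [10de:0392] (rev a1)",
    "03:00.0 Mass storage controller [0180]: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller [1095:3132] (rev 01)",
]

if __name__ == "__main__":
    print(find_device("0000:00:0d.0", LSPCI_NN))
```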

To the best of my knowledge, having gone through this motherboard:

- The Asus Award BIOS does not have any setting related to IOMMU. I'm
  using the latest BIOS, 2001, from www.asus.com
- The BIOS has no aperture setting.
- The board has no AGP, only PCI and PCIe slots.

My knowledge is exhausted, so please, if you have any insight into what
could be examined further or what could be done to solve the IOMMU
problem, let me know.

Jari

Some of the links and threads I've read
---------------------------------------

        "Appendix L. Known Issues" > The X86-64 platform (AMD64/EM64T) and 2.6 kernels
        ftp://download.nvidia.com/XFree86/Linux-x86/1.0-8174/README/32bit_html/appendix-l.html

        "What is AGP Aperture size?"
        http://www.techpowerup.com/articles/overclocking/vidcard/43

        "PCI-DMA: high address but no IOMMU"
        http://article.gmane.org/gmane.linux.kernel/342411

        "Out of IOMMU space"
        http://www.x86-64.org/pipermail/discuss/2005-September/006490.html

        "Your BIOS doesn't leave a aperture memory hole"
        http://www.linuxquestions.org/questions/linux-hardware-18/your-bios-doesnt-leave-a-aperture-memory-hole-624088/

Hardware details
----------------
 OS
        $ cat /etc/debian_version
        lenny/sid               (pinning: that's 90% testing + 10% unstable packages)

 Kernel
        $ uname -a
        2.6.25-2-amd64 #1 SMP Mon Jul 14 11:05:23 UTC 2008 x86_64 GNU/Linux

 CPU
        $ cat /proc/cpuinfo
        model name      : AMD Athlon(tm) X2 Dual Core Processor BE-2400
        stepping        : 2
        cpu MHz         : 2310.518
        cache size      : 512 KB
        ...

        $ cat /proc/meminfo
        MemTotal:      8266632 kB
        MemFree:        110212 kB
        Buffers:        237132 kB
        Cached:        3803660 kB
        SwapCached:          0 kB
        ...

 HD
        $ hdparm -I /dev/sda

        ATA device, with non-removable media
                Model Number:       ST31000340AS
                Serial Number:      5QJ01MS4
                Firmware Revision:  SD01

        http://www.seagate.com/ww/v/index.jsp?vgnextoid=0732f141e7f43110VgnVCM100000f5ee0a0aRCRD

 MB

        Asus M2N32-SLI Deluxe/Wireless Edition
        - nvidia nForce 590 SLI chipset MCP
        - 2 x PCIe (SLI x16), 1 x PCIe (x4), 1 x PCIe (x1), 2 x PCI 2.2
        - Socket AM2

        http://www.asus.com/products.aspx?l1=3&l2=101&l3=300&model=1163&modelmenu=1

        $ lspci -nn
        00:0d.0 IDE interface [0101]: nVidia Corporation MCP55 SATA Controller [10de:037f] (rev a2)
        01:00.0 VGA compatible controller [0300]: nVidia Corporation G70 [GeForce 7600 GS] [10de:0392] (rev a1)
        02:0b.0 FireWire (IEEE 1394) [0c00]: Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link) [104c:8023]
        03:00.0 Mass storage controller [0180]: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller [1095:3132] (rev 01)
        ...

lspci -vv
----------------------------

00:16.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a2)
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 32 bytes
	Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
	I/O behind bridge: 00009000-00009fff
	Memory behind bridge: fde00000-fdefffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [40] Subsystem: nVidia Corporation Device 0000
	Capabilities: [48] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable+
		Address: 00000000fee0300c  Data: 4151
	Capabilities: [60] HyperTransport: MSI Mapping Enable+ Fixed-
		Mapping Address Base: 00000000fee00000
	Capabilities: [80] Express (v1) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 256 bytes, PhantFunc 0, Latency L0s <512ns, L1 <4us
			ExtTag- RBE+ FLReset-
		DevCtl:	Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #1, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <512ns, L1 <4us
			ClockPM- Suprise- LLActRep+ BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surpise-
			Slot #  0, PowerLimit 0.000000; Interlock- NoCompl-
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Off, PwrInd On, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
			Changed: MRL- PresDet+ LinkState+
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
	Capabilities: [100] Virtual Channel <?>
	Kernel driver in use: pcieport-driver
	Kernel modules: shpchp


[1] Full message from syslog
-----------------------------

Aug 21 11:01:19 jondo kernel: [174628.275859] DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:00:0d.0
Aug 21 11:01:19 jondo kernel: [174628.279020] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Aug 21 11:01:19 jondo kernel: [174628.279020] ata3.00: cmd 35/00:00:9f:b9:fd/00:04:71:00:00/e0 tag 0 dma 524288 out
Aug 21 11:01:19 jondo kernel: [174628.279020]          res 50/00:00:96:b9:fd/00:00:71:00:00/e0 Emask 0x40 (internal error)
Aug 21 11:01:19 jondo kernel: [174628.279020] ata3.00: status: { DRDY }
Aug 21 11:01:19 jondo kernel: [174628.322932] ata3.00: configured for UDMA/133
Aug 21 11:01:19 jondo kernel: [174628.322932] ata3: EH complete
Aug 21 11:01:19 jondo kernel: [174628.330761] sd 2:0:0:0: [sda] 1953525168 512-byte hardware sectors (1000205 MB)
Aug 21 11:01:19 jondo kernel: [174628.340876] sd 2:0:0:0: [sda] Write Protect is off
Aug 21 11:01:19 jondo kernel: [174628.340876] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Aug 21 11:01:19 jondo kernel: [174628.351250] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
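
To see how often the message recurs and for which device, the syslog can
be scanned with a short script like this. It is only a sketch: the
regular expression is written against the exact wording of the line
above, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the kernel message exactly as it appears in the syslog excerpt.
OUT_OF_SPACE = re.compile(
    r"DMA: Out of SW-IOMMU space for (?P<size>\d+) bytes "
    r"at device (?P<dev>\S+)"
)


def summarize_out_of_space(lines):
    """Count "Out of SW-IOMMU space" events per (device, request size)."""
    counts = Counter()
    for line in lines:
        m = OUT_OF_SPACE.search(line)
        if m:
            counts[(m.group("dev"), int(m.group("size")))] += 1
    return counts


SAMPLE = [
    "Aug 21 11:01:19 jondo kernel: [174628.275859] DMA: Out of SW-IOMMU space for 65536 bytes at device 0000:00:0d.0",
    "Aug 21 11:01:19 jondo kernel: [174628.279020] ata3.00: status: { DRDY }",
]

if __name__ == "__main__":
    for (dev, size), n in summarize_out_of_space(SAMPLE).items():
        print(f"{dev}: {n} event(s) of {size} bytes")
```

On a real system the lines would come from the syslog file instead of
SAMPLE, for example by iterating over an open file object.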

dmesg
-------------------------------

[    0.914265] Linux agpgart interface v0.103
...
[    3.687719] ata1: SATA link down (SStatus 0 SControl 0)
[    5.770299] ata2: SATA link down (SStatus 0 SControl 0)
[    5.582800] ACPI: PCI Interrupt Link [APC3] enabled at IRQ 18
[    5.582811] ACPI: PCI Interrupt 0000:02:08.1[A] -> Link [APC3] -> GSI 18 (level, low) -> IRQ 18
[    5.584163] NFORCE-MCP55: 0000:00:0c.0 (rev a1) UDMA133 controller
[    5.584167] NFORCE-MCP55: IDE controller (0x10de:0x036e rev 0xa1) at  PCI slot 0000:00:0c.0
[    5.584187] NFORCE-MCP55: not 100% native mode: will probe irqs later
[    5.584194] NFORCE-MCP55: IDE port disabled
[    5.584198]     ide0: BM-DMA at 0xf400-0xf407, BIOS settings: hda:DMA, hdb:DMA
[    5.584208] Probing IDE interface ide0...
[    5.661667] firewire_ohci: Added fw-ohci device 0000:02:08.1, OHCI version 1.10
[    5.661706] ACPI: PCI Interrupt Link [APC1] enabled at IRQ 16
[    5.661706] ACPI: PCI Interrupt 0000:02:0b.0[A] -> Link [APC1] -> GSI 16 (level, low) -> IRQ 16
[    5.732701] firewire_ohci: Added fw-ohci device 0000:02:0b.0, OHCI version 1.10
[    6.345280] ACPI: PCI Interrupt Link [APCL] enabled at IRQ 20
[    6.345280] ACPI: PCI Interrupt 0000:00:0a.1[B] -> Link [APCL] -> GSI 20 (level, low) -> IRQ 20
[    6.345280] PCI: Setting latency timer of device 0000:00:0a.1 to 64
[    6.345280] ehci_hcd 0000:00:0a.1: EHCI Host Controller
[    6.345280] ehci_hcd 0000:00:0a.1: new USB bus registered, assigned bus number 2
[    6.345280] ehci_hcd 0000:00:0a.1: debug port 1
[    6.345280] PCI: cache line size of 64 is not supported by device 0000:00:0a.1
[    6.345280] ehci_hcd 0000:00:0a.1: irq 20, io mem 0xfe02e000

