Message-ID: <450EEA1A.90806@kautzy.com>
Date:	Mon, 18 Sep 2006 20:48:58 +0200
From:	kautzy <kautzy@...tzy.com>
To:	Jon Mason <jdmason@...zu.us>
CC:	linux-kernel@...r.kernel.org
Subject: Re: Dual Core Opteron hangs, iommu Entries (x86_64)

Jon Mason wrote:
> On Mon, Sep 18, 2006 at 03:12:41PM +0200, kautzy wrote:
>   
>> Since this is my first post on this list, I would like to say hello to 
>> everyone!
>>
>> I am experiencing problems with a 2x dual-core Opteron server. every 
>> 5-7 days the system hangs. while it still pings, it does not respond to 
>> console input, and i can't log in via ssh either. when that happens, the 
>> only thing one can do is reset the machine. there aren't any errors 
>> logged.
>>
>> i have checked the memory for errors, but it looks like it is ok.
>>
>> I found a post on this list describing a problem which looks similar to 
>> mine:
>>
>> http://www.gatago.com/linux/kernel/13699679.html
>>
>> as mentioned in the above post, a dmesg on my server also shows 
>> the following entries:
>>
>> Allocating PCI resources starting at fb800000 (gap: fb000000:4780000)
>> Checking aperture...
>> CPU 0: aperture @ cc24000000 size 32 MB
>> Aperture from northbridge cpu 0 too small (32 MB)
>> No AGP bridge found
>> Your BIOS doesn't leave a aperture memory hole
>> Please enable the IOMMU option in the BIOS setup
>> This costs you 64 MB of RAM
>> Mapping aperture over 65536 KB of RAM @ 8000000
>> Built 1 zonelists
>>
>> can those entries have anything to do with the system crashes, and if so, 
>> can booting with iommu=memaper=3 help to solve the problem?
>>
>> i am running kernel 2.6.17.11, sarge amd64 , the system has 6GB RAM
>>
>> i appreciate any suggestions :)
>>     
>
> Your problem is that you have more than 4GB of RAM and not enough room
> in your IOMMU aperture to handle all of the pending DMA requests.
> Dmesg suggests you go into your BIOS and increase your AGP aperture
> from 32M to 64M, did you try that?  
>
> Thanks,
> Jon
>
>   
thanks a lot for your reply, jon!

i should have mentioned that the mainboard has neither an AGP slot nor 
AGP aperture settings in the bios :(
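
since the bios offers no aperture setting, the remaining knob seems to be the boot option. as far as i understand the kernel's x86_64 boot-options documentation, iommu=memaper=<order> allocates the aperture over RAM with a size of 32MB << order, so the 64 MB in my dmesg would correspond to order 1. a small python sketch of that arithmetic (the function name is only for illustration, and the 32MB << order rule is my reading of the documentation, not something i have verified in the source):

```python
# Assumption: iommu=memaper=<order> maps an aperture of 32MB << order over RAM,
# as described in the kernel's x86_64 boot-options documentation.
BASE_MB = 32


def memaper_size_mb(order: int) -> int:
    """Return the aperture size in MB for a given memaper order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return BASE_MB << order


if __name__ == "__main__":
    for order in range(4):
        print(f"iommu=memaper={order} -> {memaper_size_mb(order)} MB")
```

if that reading is right, iommu=memaper=3 would take 256 MB of RAM for the aperture instead of the current 64 MB.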

the biggest problem i am facing is that i always have to wait 5-7 days 
after a reboot to see whether the changes i made have had a positive 
effect (which unfortunately has never been the case so far) ;-)
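
on the "more than 4GB" point: a rough python sketch that sums the usable ranges from the BIOS-e820 lines in the dmesg quoted below and shows how much of that RAM sits above the 4GB boundary. the ranges are copied from that output; the helper name is made up for illustration, and this only counts address ranges, it says nothing about how the kernel actually uses them:

```python
# Usable [start, end) ranges copied from the BIOS-e820 lines in the quoted dmesg.
USABLE = [
    (0x0000000000000000, 0x000000000009F400),
    (0x0000000000100000, 0x00000000FAFF0000),
    (0x0000000100000000, 0x0000000180000000),
]
FOUR_GB = 1 << 32
MB = 1 << 20


def bytes_above(ranges, boundary):
    """Sum the part of each [start, end) range that lies at or above boundary."""
    return sum(max(0, end - max(start, boundary)) for start, end in ranges)


total = sum(end - start for start, end in USABLE)
high = bytes_above(USABLE, FOUR_GB)

if __name__ == "__main__":
    print(f"usable RAM: {total // MB} MB, of which {high // MB} MB is above 4GB")
```

so only the last e820 range lies above 4GB, which is presumably the memory whose DMA has to go through the IOMMU aperture for devices limited to 32-bit addresses.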

regards

chris

>>
>> chris
>>
>> the full output of dmesg:
>>
>> Bootdata ok (command line is root=/dev/sda8 ro console=tty0 )
>> Linux version 2.6.17.11-mli1-opteron-v2 (root@...1) (gcc version 3.3.5 
>> (Debian 1:3.3.5-13)) #1 SMP Mon Sep 11 12:29:02 CEST 2006
>> BIOS-provided physical RAM map:
>> BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
>> BIOS-e820: 000000000009f400 - 00000000000a0000 (reserved)
>> BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
>> BIOS-e820: 0000000000100000 - 00000000faff0000 (usable)
>> BIOS-e820: 00000000faff0000 - 00000000fafff000 (ACPI data)
>> BIOS-e820: 00000000fafff000 - 00000000fb000000 (ACPI NVS)
>> BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
>> BIOS-e820: 0000000100000000 - 0000000180000000 (usable)
>> DMI 2.3 present.
>> On node 0 totalpages: 1529283
>>  DMA zone: 2459 pages, LIFO batch:0
>>  DMA32 zone: 1009704 pages, LIFO batch:31
>>  Normal zone: 517120 pages, LIFO batch:31
>> Intel MultiProcessor Specification v1.1
>>    Virtual Wire compatibility mode.
>> OEM ID: TYAN     Product ID: S2882        APIC at: 0xFEE00000
>> Processor #0 15:1 APIC version 16
>> Processor #1 15:1 APIC version 16
>> Processor #2 15:1 APIC version 16
>> Processor #3 15:1 APIC version 16
>> I/O APIC #4 Version 17 at 0xFEC00000.
>> I/O APIC #5 Version 17 at 0xFEBFF000.
>> I/O APIC #6 Version 17 at 0xFEBFE000.
>> Setting APIC routing to flat
>> Processors: 4
>> Allocating PCI resources starting at fb800000 (gap: fb000000:4780000)
>> Checking aperture...
>> CPU 0: aperture @ cc24000000 size 32 MB
>> Aperture from northbridge cpu 0 too small (32 MB)
>> No AGP bridge found
>> Your BIOS doesn't leave a aperture memory hole
>> Please enable the IOMMU option in the BIOS setup
>> This costs you 64 MB of RAM
>> Mapping aperture over 65536 KB of RAM @ 8000000
>> Built 1 zonelists
>> Kernel command line: root=/dev/sda8 ro console=tty0
>> Initializing CPU#0
>> PID hash table entries: 4096 (order: 12, 32768 bytes)
>> time.c: Using 1.193182 MHz WALL PIT GTOD PIT/TSC timer.
>> time.c: Detected 2190.816 MHz processor.
>> Console: colour VGA+ 80x25
>> Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
>> Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
>> Memory: 6038612k/6291456k available (3002k kernel code, 170092k 
>> reserved, 1269k data, 168k init)
>> Calibrating delay using timer specific routine.. 4390.66 BogoMIPS 
>> (lpj=8781339)
>> Mount-cache hash table entries: 256
>> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
>> CPU: L2 Cache: 1024K (64 bytes/line)
>> Using IO-APIC 4
>> Using IO-APIC 5
>> Using IO-APIC 6
>> GSI 18 sharing vector 0x89 and IRQ 18
>> GSI 19 sharing vector 0x91 and IRQ 19
>> GSI 24 sharing vector 0x99 and IRQ 24
>> GSI 25 sharing vector 0xA1 and IRQ 25
>> GSI 29 sharing vector 0xA9 and IRQ 29
>> Using local APIC timer interrupts.
>> result 12447820
>> Detected 12.447 MHz APIC timer.
>> Booting processor 1/4 APIC 0x1
>> Initializing CPU#1
>> Calibrating delay using timer specific routine.. 4381.80 BogoMIPS 
>> (lpj=8763613)
>> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
>> CPU: L2 Cache: 1024K (64 bytes/line)
>> Dual Core AMD Opteron(tm) Processor 275 stepping 02
>> CPU 1: Syncing TSC to CPU 0.
>> CPU 1: synchronized TSC with CPU 0 (last diff 6 cycles, maxerr 627 cycles)
>> Booting processor 2/4 APIC 0x2
>> Initializing CPU#2
>> Calibrating delay using timer specific routine.. 4381.88 BogoMIPS 
>> (lpj=8763771)
>> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
>> CPU: L2 Cache: 1024K (64 bytes/line)
>> Dual Core AMD Opteron(tm) Processor 275 stepping 02
>> CPU 2: Syncing TSC to CPU 0.
>> CPU 2: synchronized TSC with CPU 0 (last diff 1 cycles, maxerr 876 cycles)
>> Booting processor 3/4 APIC 0x3
>> Initializing CPU#3
>> Calibrating delay using timer specific routine.. 4381.92 BogoMIPS 
>> (lpj=8763852)
>> CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
>> CPU: L2 Cache: 1024K (64 bytes/line)
>> Dual Core AMD Opteron(tm) Processor 275 stepping 02
>> CPU 3: Syncing TSC to CPU 0.
>> CPU 3: synchronized TSC with CPU 0 (last diff 7 cycles, maxerr 864 cycles)
>> Brought up 4 CPUs
>> testing NMI watchdog ... OK.
>> migration_cost=460
>> NET: Registered protocol family 16
>> PCI: Using configuration type 1
>> SCSI subsystem initialized
>> PCI: Probing PCI hardware
>> PCI: Probing PCI hardware (bus 00)
>> Boot video device is 0000:03:06.0
>> PCI: Using IRQ router default [1022/746b] at 0000:00:07.3
>> PCI->APIC IRQ transform: 0000:00:07.2[D] -> IRQ 19
>> PCI->APIC IRQ transform: 0000:03:06.0[A] -> IRQ 18
>> PCI->APIC IRQ transform: 0000:03:08.0[A] -> IRQ 18
>> PCI->APIC IRQ transform: 0000:02:09.0[A] -> IRQ 24
>> PCI->APIC IRQ transform: 0000:02:09.1[B] -> IRQ 25
>> PCI->APIC IRQ transform: 0000:01:04.0[A] -> IRQ 29
>> PCI-DMA: Disabling AGP.
>> PCI-DMA: aperture base @ 8000000 size 65536 KB
>> PCI-DMA: using GART IOMMU.
>> PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
>> PCI: Bridge: 0000:00:06.0
>>  IO window: 9000-bfff
>>  MEM window: fca00000-feafffff
>>  PREFETCH window: disabled.
>> PCI: Bridge: 0000:00:0a.0
>>  IO window: disabled.
>>  MEM window: fc900000-fc9fffff
>>  PREFETCH window: fc600000-fc6fffff
>> PCI: Bridge: 0000:00:0b.0
>>  IO window: 8000-8fff
>>  MEM window: fc800000-fc8fffff
>>  PREFETCH window: fb500000-fc5fffff
>> NET: Registered protocol family 2
>> IP route cache hash table entries: 262144 (order: 9, 2097152 bytes)
>> TCP established hash table entries: 262144 (order: 10, 4194304 bytes)
>> TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
>> TCP: Hash tables configured (established 262144 bind 65536)
>> TCP reno registered
>> IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $
>> Installing knfsd (copyright (C) 1996 okir@...ad.swb.de).
>> Initializing Cryptographic API
>> io scheduler noop registered
>> io scheduler anticipatory registered
>> io scheduler deadline registered (default)
>> io scheduler cfq registered
>> PCI: MSI quirk detected. PCI_BUS_FLAGS_NO_MSI set for subordinate bus.
>> PCI: MSI quirk detected. PCI_BUS_FLAGS_NO_MSI set for subordinate bus.
>> Real Time Clock Driver v1.12ac
>> Linux agpgart interface v0.101 (c) Dave Jones
>> Floppy drive(s): fd0 is 1.44M
>> FDC 0 is a post-1991 82077
>> loop: loaded (max 8 devices)
>> Intel(R) PRO/1000 Network Driver - version 7.0.33-k2
>> Copyright (c) 1999-2005 Intel Corporation.
>> eepro100.c:v1.09j-t 9/29/99 Donald Becker 
>> http://www.scyld.com/network/eepro100.html
>> eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin 
>> <saw@....sw.com.sg> and others
>> eth0: 0000:03:08.0, 00:E0:81:32:F6:36, IRQ 18.
>>  Board assembly 567812-052, Physical connectors present: RJ45
>>  Primary interface chip i82555 PHY #1.
>>  General self-test: passed.
>>  Serial sub-system self-test: passed.
>>  Internal registers self-test: passed.
>>  ROM checksum self-test: passed (0xd0a6c714).
>> e100: Intel(R) PRO/100 Network Driver, 3.5.10-k2-NAPI
>> e100: Copyright(c) 1999-2005 Intel Corporation
>> tg3.c:v3.59 (June 8, 2006)
>> eth1: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] 
>> (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:32:f7:ac
>> eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] 
>> TSOcap[1]
>> eth1: dma_rwctrl[769f4000] dma_mask[64-bit]
>> eth2: Tigon3 [partno(BCM95704A7) rev 2003 PHY(5704)] 
>> (PCIX:100MHz:64-bit) 10/100/1000BaseT Ethernet 00:e0:81:32:f7:ad
>> eth2: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] Split[0] WireSpeed[1] 
>> TSOcap[1]
>> eth2: dma_rwctrl[769f4000] dma_mask[64-bit]
>> Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
>> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
>> 3ware 9000 Storage Controller device driver for Linux v2.26.02.007.
>> 3w-9xxx: scsi0: AEN: INFO (0x04:0x0055): Battery charging started:.
>> 3w-9xxx: scsi0: AEN: INFO (0x04:0x0053): Battery capacity test is overdue:.
>> scsi0 : 3ware 9000 Storage Controller
>> 3w-9xxx: scsi0: Found a 3ware 9000 Storage Controller at 0xfc8ffc00, 
>> IRQ: 29.
>> 3w-9xxx: scsi0: Firmware FE9X 2.08.00.005, BIOS BE9X 2.03.01.052, Ports: 8.
>>  Vendor: AMCC      Model: 9500S-8    DISK   Rev: 2.08
>>  Type:   Direct-Access                      ANSI SCSI revision: 03
>> SCSI device sda: 956884992 512-byte hdwr sectors (489925 MB)
>> sda: Write Protect is off
>> sda: Mode Sense: 23 00 00 00
>> SCSI device sda: drive cache: write back, no read (daft)
>> SCSI device sda: 956884992 512-byte hdwr sectors (489925 MB)
>> sda: Write Protect is off
>> sda: Mode Sense: 23 00 00 00
>> SCSI device sda: drive cache: write back, no read (daft)
>> sda: sda1 < sda5 sda6 sda7 sda8 sda9 sda10 > sda2 sda3
>> sd 0:0:0:0: Attached scsi disk sda
>> serio: i8042 AUX port at 0x60,0x64 irq 12
>> serio: i8042 KBD port at 0x60,0x64 irq 1
>> mice: PS/2 mouse device common for all mice
>> TCP bic registered
>> NET: Registered protocol family 1
>> NET: Registered protocol family 10
>> IPv6 over IPv4 tunneling driver
>> NET: Registered protocol family 17
>> NET: Registered protocol family 15
>> 802.1Q VLAN Support v1.8 Ben Greear <greearb@...delatech.com>
>> All bugs added by David S. Miller <davem@...hat.com>
>> ReiserFS: sda8: found reiserfs format "3.6" with standard journal
>> ReiserFS: sda8: using ordered data mode
>> ReiserFS: sda8: journal params: device sda8, size 8192, journal first 
>> block 18, max trans len 1024, max batch 900, max commit age 30, max 
>> trans age 30
>> ReiserFS: sda8: checking transaction log (sda8)
>> input: AT Translated Set 2 keyboard as /class/input/input0
>> ReiserFS: sda8: replayed 15 transactions in 1 seconds
>> ReiserFS: sda8: Using r5 hash to sort names
>> VFS: Mounted root (reiserfs filesystem) readonly.
>> Freeing unused kernel memory: 168k freed
>> Adding 1951856k swap on /dev/sda5.  Priority:-1 extents:1 across:1951856k
>> Adding 1951856k swap on /dev/sda6.  Priority:-2 extents:1 across:1951856k
>> Adding 1951792k swap on /dev/sda7.  Priority:-3 extents:1 across:1951792k
>> ReiserFS: sda10: found reiserfs format "3.6" with standard journal
>> ReiserFS: sda10: using ordered data mode
>> ReiserFS: sda10: journal params: device sda10, size 8192, journal first 
>> block 18, max trans len 1024, max batch 900, max commit age 30, max 
>> trans age 30
>> ReiserFS: sda10: checking transaction log (sda10)
>> ReiserFS: sda10: Using r5 hash to sort names
>> ReiserFS: sda10: Removing [30 40588 0x0 SD]..done
>> ReiserFS: sda10: Removing [3 40583 0x0 SD]..done
>> ReiserFS: sda10: Removing [3 40582 0x0 SD]..done
>> ReiserFS: sda10: Removing [3 40579 0x0 SD]..done
>> ReiserFS: sda10: There were 4 uncompleted unlinks/truncates. Completed
>> ReiserFS: sda2: found reiserfs format "3.6" with standard journal
>> ReiserFS: sda2: using ordered data mode
>> ReiserFS: sda2: journal params: device sda2, size 8192, journal first 
>> block 18, max trans len 1024, max batch 900, max commit age 30, max 
>> trans age 30
>> ReiserFS: sda2: checking transaction log (sda2)
>> ReiserFS: sda2: Using r5 hash to sort names
>> ReiserFS: sda2: Removing [1306 51393 0x0 SD]..done
>> ReiserFS: sda2: Removing [1306 51193 0x0 SD]..done
>> ReiserFS: sda2: There were 2 uncompleted unlinks/truncates. Completed
>> ReiserFS: sda3: found reiserfs format "3.6" with standard journal
>> ReiserFS: sda3: using ordered data mode
>> ReiserFS: sda3: journal params: device sda3, size 8192, journal first 
>> block 18, max trans len 1024, max batch 900, max commit age 30, max 
>> trans age 30
>> ReiserFS: sda3: checking transaction log (sda3)
>> ReiserFS: sda3: Using r5 hash to sort names
>> PM: Writing back config space on device 0000:02:09.1 at offset b (was 
>> 164814e4, writing 164414e4)
>> PM: Writing back config space on device 0000:02:09.1 at offset 3 (was 
>> 804000, writing 804010)
>> PM: Writing back config space on device 0000:02:09.1 at offset 2 (was 
>> 2000000, writing 2000003)
>> PM: Writing back config space on device 0000:02:09.1 at offset 1 (was 
>> 2b00000, writing 2b00106)
>> ADDRCONF(NETDEV_UP): eth2: link is not ready
>> tg3: eth2: Link is up at 1000 Mbps, full duplex.
>> tg3: eth2: Flow control is off for TX and off for RX.
>> ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
>> eth2: no IPv6 routers present
>> 3w-9xxx: scsi0: AEN: INFO (0x04:0x0056): Battery charging completed:.
>>
>> -
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@...r.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at  http://www.tux.org/lkml/
>>
>>     
>
>   
