Message-ID: <087b53e5-b23b-d3c2-6b8e-980bdcbf75c1@gmx.de>
Date: Tue, 4 Oct 2016 21:55:58 +0200
From: Johannes Bauer <dfnsonfsduifb@....de>
To: Andrey Korolyov <andrey@...l.ru>
Cc: Jan Kara <jack@...e.cz>, linux-ext4@...r.kernel.org,
linux-mm@...ck.org
Subject: Re: Frequent ext4 oopses with 4.4.0 on Intel NUC6i3SYB
On 04.10.2016 20:45, Andrey Korolyov wrote:
>> Damn bad idea to build on the instable target. Lots of gcc segfaults and
>> weird stuff, even without a kernel panic. The system appears to be
>> instable as hell. Wonder how it can even run and how much of the root fs
>> is already corrupted :-(
>>
>> Rebuilding 4.8 on a different host.
>
> Looks like a platform itself is somewhat faulty: [1]. Also please bear
> in mind that standalone memory testers would rather not expose certain
> classes of memory failures, I'd suggest to test allocator's work
> against gcc runs on tmpfs, almost same as you did before. Frequency of
> crashes due to wrong pointer contents of an fs cache is most probably
> a direct outcome from its relative memory footprint.
So there are some interesting new data points that I can't make sense
of. Maybe you can.
First off, 4.8.0 shows the same symptoms. When I try to build 4.8.0 in
/usr/src/linux using make -j4, I get bus errors and segfaults in gcc
pretty soon.
Doing the same thing in /dev/shm, however, works like a charm. I built
three kernels there and all of them ran through perfectly. Not a single
build in /usr/src completed; every attempt failed.
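One way to narrow this down without a full kernel build might be a plain write/read-back comparison on both locations. The following is only a rough sketch under some assumptions: the function name write_and_verify and its parameters are made up for illustration, and the posix_fadvise call is merely a hint to the kernel, so the read-back may still be served from the page cache unless the file is larger than RAM or the caches are dropped beforehand.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, size_mib=64, rounds=3, seed=b"readback"):
    """Write deterministic data into `directory`, read it back and
    return the number of rounds whose checksum did not match.

    Hypothetical helper for illustration, not an established tool.
    """
    mismatches = 0
    for i in range(rounds):
        # 32-byte digest repeated 32768 times gives a 1 MiB block.
        block = hashlib.sha256(seed + str(i).encode()).digest() * 32768
        expected = hashlib.sha256()
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "wb") as f:
                for _ in range(size_mib):
                    f.write(block)
                    expected.update(block)
                f.flush()
                os.fsync(f.fileno())  # push the data to the backing store
                if hasattr(os, "posix_fadvise"):
                    # Advisory only: ask the kernel to drop cached pages so
                    # the read below is more likely to hit the device.
                    os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
            actual = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    actual.update(chunk)
            if actual.digest() != expected.digest():
                mismatches += 1
        finally:
            os.unlink(path)
    return mismatches
```

Calling write_and_verify("/usr/src") and write_and_verify("/dev/shm") and comparing the two counts would show whether data that goes through the disk path comes back altered, while the tmpfs copy stays intact.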
What could cause this? Faulty hard drive? It's brand new:
Model Family: Western Digital Red
Device Model: WDC WD10JFCX-68N6GN0
Firmware Version: 82.00A82
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   182   181   021    Pre-fail  Always       -       1858
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       17
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       178
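For reading such a table mechanically, a small sketch like the one below could flag attributes whose normalized VALUE has dropped to or below THRESH, which is the condition under which smartctl reports an attribute as failing. The function name failing_attributes is made up for illustration, and the code assumes one attribute per line, as smartctl -A prints it, rather than the wrapped form an archive may show.

```python
def failing_attributes(smart_text):
    """Return the names of SMART attributes whose normalized VALUE is
    at or below THRESH. Hypothetical helper for illustration.

    Expects one attribute per line in the column order
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE.
    """
    failing = []
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name = fields[1]
        value = int(fields[3])
        thresh = int(fields[5])
        # A threshold of 0 means the attribute can never be reported as failed.
        if thresh > 0 and value <= thresh:
            failing.append(name)
    return failing
```

Fed with the six rows above, this returns an empty list, so none of these attributes is at or below its threshold.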
Or a faulty AHCI controller or driver?
[ 9.746277] ahci 0000:00:17.0: version 3.0
[ 9.746499] ahci 0000:00:17.0: AHCI 0001.0301 32 slots 1 ports 6 Gbps 0x1 impl SATA mode
[ 9.746501] ahci 0000:00:17.0: flags: 64bit ncq pm led clo only pio slum part deso sadm sds apst
[ 9.753844] scsi host0: ahci
[ 9.754648] ata1: SATA max UDMA/133 abar m2048@...f14d000 port 0xdf14d100 irq 275
I'm super puzzled right now :-(
Cheers,
Johannes