linux-ext4 - Re: Frequent ext4 oopses with 4.4.0 on Intel NUC6i3SYB

Message-ID: <087b53e5-b23b-d3c2-6b8e-980bdcbf75c1@gmx.de>
Date:   Tue, 4 Oct 2016 21:55:58 +0200
From:   Johannes Bauer <dfnsonfsduifb@....de>
To:     Andrey Korolyov <andrey@...l.ru>
Cc:     Jan Kara <jack@...e.cz>, linux-ext4@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: Frequent ext4 oopses with 4.4.0 on Intel NUC6i3SYB

On 04.10.2016 20:45, Andrey Korolyov wrote:
>> Damn bad idea to build on the unstable target. Lots of gcc segfaults and
>> weird stuff, even without a kernel panic. The system appears to be
>> unstable as hell. I wonder how it can even run and how much of the root fs
>> is already corrupted :-(
>>
>> Rebuilding 4.8 on a different host.
> 
> Looks like the platform itself is somewhat faulty: [1]. Also please bear
> in mind that standalone memory testers tend not to expose certain
> classes of memory failures. I'd suggest testing the allocator's work
> with gcc runs on tmpfs, almost the same as you did before. The frequency of
> crashes due to wrong pointer contents in the fs cache is most probably
> a direct result of its relative memory footprint.

So there are some interesting new data points that I can't make sense
of. Maybe you can.

First off, 4.8.0 shows the same symptoms. When I try to build 4.8.0 in
/usr/src/linux using make -j4, I get bus errors and segfaults in gcc
pretty soon.

Doing the same thing in /dev/shm, however, works like a charm. I built
three kernels there and all of them ran through perfectly. Not a single
attempt in /usr/src succeeded; every one of them failed.
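
To take gcc out of the picture, the same tmpfs-versus-disk comparison could be reduced to a plain write/read-back check. The following is only a rough sketch under my own assumptions: the helper name write_and_verify, the 8 MiB size and the three rounds are made up for illustration, and the posix_fadvise call is merely a hint to the kernel, so a read may still be served from the page cache instead of the disk.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, size_mib=8, rounds=3):
    """Write random data under `directory`, re-read it `rounds` times and
    return how many re-reads produced a different SHA-256 digest.

    Hypothetical helper for illustration; not part of any existing tool.
    """
    data = os.urandom(size_mib * 1024 * 1024)
    expected = hashlib.sha256(data).hexdigest()

    fd, path = tempfile.mkstemp(dir=directory)
    mismatches = 0
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data down the I/O stack

        for _ in range(rounds):
            with open(path, "rb") as f:
                # Ask the kernel to drop cached pages so the read is more
                # likely to reach the device. This is only advice and may
                # be ignored or unsupported, hence the guards.
                if hasattr(os, "posix_fadvise"):
                    try:
                        os.posix_fadvise(f.fileno(), 0, 0,
                                         os.POSIX_FADV_DONTNEED)
                    except OSError:
                        pass
                actual = hashlib.sha256(f.read()).hexdigest()
            if actual != expected:
                mismatches += 1
    finally:
        os.remove(path)
    return mismatches


if __name__ == "__main__":
    # The two locations compared in this mail; skipped if not writable.
    for target in ("/dev/shm", "/usr/src"):
        if os.path.isdir(target) and os.access(target, os.W_OK):
            print(target, "mismatches:", write_and_verify(target))
```

If /dev/shm stays at zero mismatches while /usr/src does not, that would point at the path between page cache and disk rather than at RAM in general. A clean run on both proves little, since the failures may need the load of a parallel build to show up.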

What could cause this? Faulty hard drive? It's brand new:

Model Family:     Western Digital Red
Device Model:     WDC WD10JFCX-68N6GN0
Firmware Version: 82.00A82

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   182   181   021    Pre-fail  Always       -       1858
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       17
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       178
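
For what it's worth, the attributes above can also be checked mechanically. This is a small sketch, assuming the usual smartctl convention that an attribute counts as failed once its normalized VALUE (or WORST) drops to or below THRESH, and that a THRESH of 000 means the attribute never trips. The function name failing_attributes is made up, and the sample rows are just the ones quoted in this mail, joined onto one line each.

```python
# Attribute rows from this mail, one attribute per line.
SAMPLE = """\
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   182   181   021    Pre-fail  Always       -       1858
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       17
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       178
"""


def failing_attributes(table):
    """Return names of attributes whose VALUE or WORST is at or below THRESH.

    Hypothetical helper; expects one whitespace-separated row per attribute.
    """
    failing = []
    for line in table.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a full attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name = fields[1]
        value, worst, thresh = int(fields[3]), int(fields[4]), int(fields[5])
        # A threshold of 0 is treated as "never fails" (assumption above).
        if thresh > 0 and min(value, worst) <= thresh:
            failing.append(name)
    return failing


print(failing_attributes(SAMPLE))  # prints []
```

Nothing is flagged, which fits a drive with 178 power-on hours. It only covers the six attributes quoted here, though, so it says nothing about the rest of the table.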

Or faulty AHCI controller or driver?

[    9.746277] ahci 0000:00:17.0: version 3.0
[    9.746499] ahci 0000:00:17.0: AHCI 0001.0301 32 slots 1 ports 6 Gbps 0x1 impl SATA mode
[    9.746501] ahci 0000:00:17.0: flags: 64bit ncq pm led clo only pio slum part deso sadm sds apst
[    9.753844] scsi host0: ahci
[    9.754648] ata1: SATA max UDMA/133 abar m2048@...f14d000 port 0xdf14d100 irq 275

I'm super puzzled right now :-(

Cheers,
Johannes