lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9cb08428-66ed-2306-d2f2-ae734863c68d@gmail.com>
Date:   Wed, 18 Apr 2018 10:32:21 +0200
From:   Pavlos Parissis <pavlos.parissis@...il.com>
To:     Jan Kara <jack@...e.cz>
Cc:     Guillaume Morin <guillaume@...infr.org>, stable@...r.kernel.org,
        decui@...rosoft.com, jack@...e.com, linux-fsdevel@...r.kernel.org,
        linux-kernel@...r.kernel.org, mszeredi@...hat.com
Subject: Re: kernel panics with 4.14.X versions

On 17/04/2018 02:12 μμ, Jan Kara wrote:
> On Tue 17-04-18 01:31:24, Pavlos Parissis wrote:
>> On 16/04/2018 04:40 μμ, Jan Kara wrote:
> 
> <snip>
> 
>>> How easily can you hit this?
>>
>> Very easily, I only need to wait 1-2 days for a crash to occur.
> 
> I wouldn't call that very easily but opinions may differ :). Anyway it's
> good (at least for debugging) that it's reproducible.
> 

Unfortunately, I can't reproduce it, so waiting 1-2 days is the only option I have.

>>> Are you able to run debug kernels
>>
>> Well, I was under the impression I do as I have:
>>   grep -E 'DEBUG_KERNEL|DEBUG_INFO' /boot/config-4.14.32-1.el7.x86_64
>>   CONFIG_DEBUG_INFO=y
>>   # CONFIG_DEBUG_INFO_REDUCED is not set
>>   # CONFIG_DEBUG_INFO_SPLIT is not set
>>   # CONFIG_DEBUG_INFO_DWARF4 is not set
>>   CONFIG_DEBUG_KERNEL=y
>>
>> Do you think that my kernel doesn't produce a proper crash dump?
>> I have a production cluster where I can run any kernel we need, so if I need
>> to compile again with different settings I can certainly do that.
> 
> OK, good. So please try running 4.16 as you mention below to verify whether
> this is just a -stable regression or also a problem in the current upstream
> kernel. Based on your results with 4.16 I'll prepare a debug patch for you to
> apply on top of 4.14.32 so that we can debug this further.
> 
>>> / inspect
>>> crash dumps when the issue occurs?
>>
>> I can't do that as the server isn't responsive and I can only power cycle it.
> 
> Well, kernel crash dumps work in that situation as well - when the kernel
> panics, it will kexec into a new kernel and dump memory of the old kernel
> to disk. It can then be investigated with the 'crash' utility. But
> obviously you don't have this set up and don't have experience with this so
> let's go via a standard 'debug patch' route.
> 
>>> Also testing with the latest mainline
>>> kernel (4.16) would be welcome whether this isn't just an issue with the
>>> backport of fsnotify fixes from Miklos.
>>
>> I can try the kernel-ml-4.16.2 from elrepo (we use CentOS 7).
> 
> Yes, that would be good.
> 

I have production server running 4.16.2 and no kernel crash dumps yet.
Let's wait another day before we say anything.

Cheers,
Pavlos



Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ