Message-ID: <9cb08428-66ed-2306-d2f2-ae734863c68d@gmail.com>
Date: Wed, 18 Apr 2018 10:32:21 +0200
From: Pavlos Parissis <pavlos.parissis@...il.com>
To: Jan Kara <jack@...e.cz>
Cc: Guillaume Morin <guillaume@...infr.org>, stable@...r.kernel.org,
decui@...rosoft.com, jack@...e.com, linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org, mszeredi@...hat.com
Subject: Re: kernel panics with 4.14.X versions
On 17/04/2018 02:12 PM, Jan Kara wrote:
> On Tue 17-04-18 01:31:24, Pavlos Parissis wrote:
>> On 16/04/2018 04:40 PM, Jan Kara wrote:
>
> <snip>
>
>>> How easily can you hit this?
>>
>> Very easily, I only need to wait 1-2 days for a crash to occur.
>
> I wouldn't call that very easily but opinions may differ :). Anyway it's
> good (at least for debugging) that it's reproducible.
>
Unfortunately, I can't reproduce it on demand, so waiting 1-2 days for a crash is the only option I have.
>>> Are you able to run debug kernels
>>
>> Well, I was under the impression I do as I have:
>> grep -E 'DEBUG_KERNEL|DEBUG_INFO' /boot/config-4.14.32-1.el7.x86_64
>> CONFIG_DEBUG_INFO=y
>> # CONFIG_DEBUG_INFO_REDUCED is not set
>> # CONFIG_DEBUG_INFO_SPLIT is not set
>> # CONFIG_DEBUG_INFO_DWARF4 is not set
>> CONFIG_DEBUG_KERNEL=y
>>
>> Do you think that my kernel doesn't produce a proper crash dump?
>> I have a production cluster where I can run any kernel we need, so if I need
>> to compile again with different settings I can certainly do that.
>
> OK, good. So please try running 4.16 as you mention below to verify whether
> this is just a -stable regression or also a problem in the current upstream
> kernel. Based on your results with 4.16 I'll prepare a debug patch for you to
> apply on top of 4.14.32 so that we can debug this further.
>
>>> / inspect
>>> crash dumps when the issue occurs?
>>
>> I can't do that as the server isn't responsive and I can only power cycle it.
>
> Well, kernel crash dumps work in that situation as well - when the kernel
> panics, it will kexec into a new kernel and dump memory of the old kernel
> to disk. It can then be investigated with the 'crash' utility. But
> obviously you don't have this set up and don't have experience with this so
> let's go via a standard 'debug patch' route.
>
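As a rough sketch of a first step towards the crash dump setup described above, the kernel config can be checked for the options kdump relies on. The option list (CONFIG_KEXEC, CONFIG_CRASH_DUMP, CONFIG_PROC_VMCORE, plus CONFIG_DEBUG_INFO so that 'crash' has symbols) is an assumption based on the kernel's kdump documentation, and check_kdump_config is a hypothetical helper name, not an existing tool:

```shell
# Hypothetical helper: report whether a kernel config file enables the
# options that kdump and the 'crash' utility are assumed to need.
# Returns 0 if all are set to =y, 1 otherwise.
check_kdump_config() {
    config_file="$1"
    missing=0
    for opt in CONFIG_KEXEC CONFIG_CRASH_DUMP CONFIG_PROC_VMCORE CONFIG_DEBUG_INFO; do
        if grep -q "^${opt}=y" "$config_file"; then
            echo "${opt}: enabled"
        else
            echo "${opt}: missing"
            missing=1
        fi
    done
    return "$missing"
}

# Example (path taken from the grep command quoted earlier in this thread):
#   check_kdump_config /boot/config-4.14.32-1.el7.x86_64
```

Even with all options enabled, the running kernel still needs memory reserved for the capture kernel through the crashkernel= boot parameter, and a kdump service to load that kernel (on CentOS 7 this comes with the kexec-tools package).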
>>> Also testing with the latest mainline
>>> kernel (4.16) would be welcome whether this isn't just an issue with the
>>> backport of fsnotify fixes from Miklos.
>>
>> I can try the kernel-ml-4.16.2 from elrepo (we use CentOS 7).
>
> Yes, that would be good.
>
I have a production server running 4.16.2, and there have been no kernel crashes so far.
Let's wait another day before drawing any conclusions.
Cheers,
Pavlos