Message-ID: <CAM_iQpXzUB7rLARnWx31cg9+Wp=YGSRMPy48W9pSZDe_-23c5g@mail.gmail.com>
Date:   Mon, 8 Apr 2019 20:41:04 -0700
From:   Cong Wang <xiyou.wangcong@...il.com>
To:     Reindl Harald <h.reindl@...lounge.net>
Cc:     Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: random crashes, kdump and so on

On Mon, Apr 8, 2019 at 7:22 PM Reindl Harald <h.reindl@...lounge.net> wrote:
>
>
>
> Am 25.03.19 um 23:10 schrieb Reindl Harald:
> >>> fact is that around 4.19.x the kernel had a ton of issues, starting with
> >>> conncount being broken for months (again: with a simple method to get the
> >>> stack trace it would have been easily discovered), the scheduler issue in
> >>> 4.19.x eating people's data and so on
> >>
> >> If kexec-tools doesn't work for you, try something else like netconsole
> >> to save the stack traces. Again, depending on the type of crash, just a
> >> stack trace may not even be enough to debug it. Of course, having a
> >> stack trace is still much better than having nothing.
> >
> > for now it looks like tonight's 5.0.4 F29 build works without the
> > random crashes, kdump this time also didn't refuse to start, and
> > /var/crash is now a dedicated virtual disk with 3 GB
> >
> > fingers crossed; after the last days this looks good at first sight,
> > but on the other hand there were days up to weeks with no panic, so god knows
>
> after two weeks and 27 million accepted connections, 5.0.4 crashed too
>
> "vmcore-dmesg" piped through "sort | uniq" is reduced to 399 lines
> containing just rate-limited "-j LOG" iptables events and nothing else
> repeatet 32487 times until the dedicated virtual disk was full
>
> what a mess.....
>
> -rw------- 1 harry verwaltung    0 2019-04-09 03:01 vmcore-incomplete
> -rw-r----- 1 harry verwaltung  93K 2019-04-09 03:09 filtered.txt
> -rw-r----- 1 harry verwaltung 2,9G 2019-04-09 03:01 vmcore-dmesg-incomplete.txt
>
> cat vmcore-dmesg-incomplete.txt | grep "1248098\.543887" | wc -l
> 32487

Not surprised; we have seen TB-sized vmcore dmesg files in our data center
due to a flood of disk errors.

I haven't looked into it, but it looks like a bug somewhere. Even with the
default printk buffer size, the dmesg should not be this huge.
A blind guess would be that something is wrong in the /proc/vmcore notes.
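
For reference, the printk ring buffer is 2^CONFIG_LOG_BUF_SHIFT bytes unless
log_buf_len= overrides it, so on a default setup it is well under a few MiB;
a 2,9G vmcore-dmesg cannot come from the log buffer itself. A quick sanity
check, assuming a Fedora-style config file under /boot (adjust the path for
your distro):

  # compiled-in size of the kernel log buffer
  grep CONFIG_LOG_BUF_SHIFT /boot/config-$(uname -r)
  # check for a log_buf_len= override on the kernel command line
  cat /proc/cmdline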

Did your kernel crash happen before or after the flood of iptables log
messages? The kernel is supposed to jump to the crash kernel immediately
after a crash, so if it doesn't, it could be a kernel kexec bug.
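
If kexec keeps misbehaving, the netconsole route mentioned earlier is roughly
the following sketch (addresses, ports, interface and MAC below are
placeholders for your network; see the kernel's netconsole documentation for
the exact parameter format):

  # sender: forward kernel console messages over UDP to a log host
  modprobe netconsole netconsole=6665@192.0.2.10/eth0,6666@192.0.2.1/00:11:22:33:44:55
  # receiver: anything listening on that UDP port, e.g.
  nc -u -l 6666

That way an oops reaches the log host as it is printed, even if kdump or the
local disk never get a chance to save it.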

Thanks.
