linux-kernel - Re: Major KVM issues with kernel 4.5 on the host

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160421165106.GK28821@pd.tnic>
Date:	Thu, 21 Apr 2016 18:51:06 +0200
From:	Borislav Petkov <bp@...en8.de>
To:	Marc Haber <mh+linux-kernel@...schlus.de>
Cc:	Paolo Bonzini <pbonzini@...hat.com>, linux-kernel@...r.kernel.org,
	kvm ML <kvm@...r.kernel.org>
Subject: Re: Major KVM issues with kernel 4.5 on the host

On Thu, Apr 21, 2016 at 04:50:05PM +0200, Marc Haber wrote:
> What bothers me is that since I ended up with a "suspect" commit that
> actually results in a "good" kernel (running for 22 hours now), I must
> have said "bad" to an actually "good" kernel, which means that I had
> an unrelated crash or corruption. Is that reasoning correct?

Hmm, did that "unrelated crash or corruption" have the same symptoms as
the original one?

> That one qualified as "good" six days ago. I'll retry, maybe I just
> didn't wait long enough.

So if the trigger time is varying so much, I'd try to double that to
make sure I'm fairly certain about each commit I'm testing.

Also, this is a single box we're talking about, right? And you're sure
it hasn't had any corruption issues so far?

I see you have amd64_edac loading, so it must have ECC DIMMs. Have you
had any reports in the past of ECC errors in dmesg? Or other MCEs,
lockups, etc? Can you grep your logs for stuff like "hardware error",
"mce", "edac" etc? Do a case-insensitive search.

> "Trying" means make oldconfig, make deb-pkg in my case right? Does it
> matter what I answer to the numerous config questions that keep coming
> up during the oldconfig step?

What I do is:

$ git bisect <good|bad>

to mark the current commit after having tested it. Then I do

$ yes "" | make oldconfig

to set the new config options. Then

$ make -j7
$ make modules_install install

and reboot into the new kernel. Kernel name will possibly change each
time so I write down on paper which kernel I'm testing. You can verify
when booting it by doing:

$ dmesg | head
[    0.000000] Linux version 4.6.0-rc2+ (boris@pd) (gcc version 5.3.1 20160101 (Debian 5.3.1-5) ) #1 SMP PREEMPT Wed Apr 6 20:22:51 CEST 2016
...

that date at the end of the line and number "#1" should be current.
Number is also in .version and gets issued when you finish building:

Kernel: arch/x86/boot/bzImage is ready  (#1)

> Would it help to explicitly mark
> 0e749e54244eec87b2a3cd0a4314e60bc6781115 as good so that the knowledge
> gained during the last week is not completely lost?

I'd do the whole thing again, just to be sure.

I know, bisection is very time-consuming :-\ And it is particularly
annoying if it is done on the box I'm normally using daily.

> So I need to git log | grep 46896c73c1a4 and apply the patch again
> each time the commit is found?

I think you can let git do that for ya:

$ git branch --contains 46896c73c1a4
* (HEAD detached at 46896c73c1a4)

that lists that the current checked out HEAD contains that commit. If you do

$ git checkout 46896c73c1a4~1

then that "(HEAD detached..." line is not in the list of branches
containing it.

HTH.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.