linux-kernel - Re: [BUG 4.2-rc8] Interrupt occurs while apply

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150901062022.GA19002@redhat.com>
Date:	Tue, 1 Sep 2015 07:20:23 +0100
From:	"Richard W.M. Jones" <rjones@...hat.com>
To:	Chuck Ebbert <cebbert.lkml@...il.com>
Cc:	linux-kernel@...r.kernel.org, x86@...nel.org,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [BUG 4.2-rc8] Interrupt occurs while apply_alternatives() is
 patching the handler

On Sun, Aug 30, 2015 at 10:37:57PM -0400, Chuck Ebbert wrote:
> This is from https://bugzilla.redhat.com/show_bug.cgi?id=1258223
> 
> [    0.036000] BUG: unable to handle kernel paging request at 55501e06
[...]
> [    0.036000]  [<c0409c80>] ? add_nops+0x90/0xa0
> [    0.036000]  [<c040a054>] apply_alternatives+0x274/0x630
> [    0.036000]  [<c07f1cf0>] ? wait_for_xmitr+0xa0/0xa0
> [    0.036000]  [<c071a6fc>] ? sprintf+0x1c/0x20
> [    0.036000]  [<c0aae480>] ? irq_entries_start+0x698/0x698
> [    0.036000]  [<c071be4b>] ? memcpy+0xb/0x30
> [    0.036000]  [<c07f3950>] ? serial8250_set_termios+0x20/0x20
[...]
> Interrupt 0x30 occurred while the alternatives code was replacing the
> initial 0x90,0x90,0x90 NOPs (from the ASM_CLAC macro) with the optimized
> version, 0x8d,0x76,0x00. Only the first byte has been replaced so far,
> and it makes a mess out of the insn decoding.

Chuck, thanks for reporting this.

I have only been able to reproduce this so far using qemu and TCG (not
KVM) which of course raises a range of questions: could it be a qemu
bug or a TCG bug?  Could it be that an atomic op is not correctly
implemented by qemu?  I will keep trying on KVM.  Because I don't have
a convenient server with 32 bit kernel and a serial port that I can
reboot thousands of times, I have not tried to reproduce on baremetal yet.

Here's how to reproduce it.  (The host can be x86-64)

(1) Grab the 32 bit Fedora kernel we are using from
https://kojipkgs.fedoraproject.org//packages/kernel/4.2.0/1.fc24/i686/kernel-core-4.2.0-1.fc24.i686.rpm
(from http://koji.fedoraproject.org/koji/buildinfo?buildID=681723)

(2) Unpack it to extract vmlinuz:

cd /tmp
rpm2cpio /mnt/scratch/kernel-core-4.2.0-1.fc24.i686.rpm | cpio -id
cp ./lib/modules/4.2.0-1.fc24.i686/vmlinuz .

(3) Boot the kernel under qemu/KVM.  The following single line command
repeatedly boots the kernel until the bug is hit:

while qemu-system-x86_64 -nographic -no-reboot -M accel=kvm:tcg -kernel vmlinuz -append 'console=ttyS0 panic=1' -serial stdio -monitor none >& log; ! grep add_nops log; do echo -n .; done

It takes many iterations (100s with TCG) to hit the bug.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/