Message-ID: <20180104152516.3sql2ayoemlephig@toau>
Date:   Thu, 4 Jan 2018 16:25:16 +0100
From:   Thomas Zeitlhofer <thomas.zeitlhofer+lkml@...it.at>
To:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Hugh Dickins <hughd@...gle.com>, linux-kernel@...r.kernel.org
Subject: Re: "BUG: using smp_processor_id() in preemptible" with KPTI on
 4.14.11

On Thu, Jan 04, 2018 at 01:55:28PM +0100, Greg Kroah-Hartman wrote:
> On Thu, Jan 04, 2018 at 01:43:20PM +0100, Thomas Zeitlhofer wrote:
> > On Thu, Jan 04, 2018 at 11:51:11AM +0100, Greg Kroah-Hartman wrote:
> > > On Thu, Jan 04, 2018 at 11:20:29AM +0100, Thomas Zeitlhofer wrote:
> > > > On Thu, Jan 04, 2018 at 02:59:06AM +0100, Thomas Zeitlhofer wrote:
> > > > > Hello,
> > > > > 
> > > > > on an Ivybridge CPU, I get with 4.14.11:
> > > > > 
> > > > >    BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4510
> > > > >    caller is native_flush_tlb_single+0x57/0xc0
> > > > >    CPU: 3 PID: 4510 Comm: ovsdb-server Not tainted 4.14.11-kvm-00434-gcd0b8eb84f5c #3
> > > > >    Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
> > > > >    Call Trace:
> > > > >     dump_stack+0x5c/0x86
> > > > >     check_preemption_disabled+0xdd/0xe0
> > > > >     native_flush_tlb_single+0x57/0xc0
> > > > >     ? __set_pte_vaddr+0x2d/0x40
> > > > >     __set_pte_vaddr+0x2d/0x40
> > > > >     set_pte_vaddr+0x2f/0x40
> > > > >     cea_set_pte+0x30/0x40
> > > > >     ds_update_cea.constprop.4+0x4d/0x70
> > > > >     reserve_ds_buffers+0x159/0x410
> > > > >     ? wp_page_copy+0x36d/0x6a0
> > > > >     x86_reserve_hardware+0x150/0x160
> > > > >     x86_pmu_event_init+0x3e/0x1f0
> > > > >     perf_try_init_event+0x69/0x80
> > > > >     perf_event_alloc+0x652/0x740
> > > > >     SyS_perf_event_open+0x3f6/0xd60
> > > > >     do_syscall_64+0x5c/0x190
> > > > >     entry_SYSCALL64_slow_path+0x25/0x25
> > > > >    RIP: 0033:0x74a1d94580b9
> > > > >    RSP: 002b:00007fff0c01d5d8 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
> > > > >    RAX: ffffffffffffffda RBX: 00007fff0c01d7b0 RCX: 000074a1d94580b9
> > > > >    RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007fff0c01d5e0
> > > > >    RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
> > > > >    R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
> > > > >    R13: 0000000000000000 R14: 00007fff0c01d790 R15: 00005df43a799600
> > > > > 
> > > > > This does not show up when booting with pti=off.
> > > > > 
> > > > > Maybe it is related to the issue that is fixed for the upcoming 4.4.110
> > > > > release by https://lkml.org/lkml/2018/1/3/692
> > > 
> > > I don't understand this link.  
> > 
> > I found that link when trying to search for the error message. That
> > patch touches __native_flush_tlb_single() and mentions hardware
> > differences in Ivybridge and below:
> > 
> > 	"We have many machines (Westmere, Sandybridge, Ivybridge)
> > 	supporting PCID but not INVPCID..."
> > 
> > As I see the error message only on Ivybridge and not on Haswell, I came
> > up with the vague guess that this could be related.
> > 
> > > The 4.4 and 4.9 backports are much different than the 4.14 tree.
> > 
> > Yes, I have seen that.
> > 
> > > > JFYI, the very same kernel does not show this issue on a Haswell CPU.
> > > 
> > > I have now queued up a bunch of patches that are in Linus's tree, can
> > > you test these out as well:
> > > 	https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-4.14
> > 
> > Does not seem to make any difference - with those patches applied I
> > still get:
> > 
> >    BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4383
> >    caller is native_flush_tlb_single+0x57/0xc0
> >    CPU: 3 PID: 4383 Comm: ovsdb-server Not tainted 4.14.11-kvm-00435-g3138001170c9 #3
> >    Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
> >    Call Trace:
> >     dump_stack+0x5c/0x86
> >     check_preemption_disabled+0xdd/0xe0
> >     native_flush_tlb_single+0x57/0xc0
> >     ? __set_pte_vaddr+0x2d/0x40
> >     __set_pte_vaddr+0x2d/0x40
> >     set_pte_vaddr+0x2f/0x40
> >     cea_set_pte+0x30/0x40
> >     ds_update_cea.constprop.4+0x4d/0x70
> >     reserve_ds_buffers+0x159/0x410
> >     ? wp_page_copy+0x36d/0x6a0
> >     x86_reserve_hardware+0x150/0x160
> >     x86_pmu_event_init+0x3e/0x1f0
> >     perf_try_init_event+0x69/0x80
> >     perf_event_alloc+0x652/0x740
> >     SyS_perf_event_open+0x3f6/0xd60
> >     do_syscall_64+0x5c/0x190
> >     entry_SYSCALL64_slow_path+0x25/0x25
> >    RIP: 0033:0x755c0b8580b9
> >    RSP: 002b:00007fffc87cf9e8 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
> >    RAX: ffffffffffffffda RBX: 00007fffc87cfbc0 RCX: 0000755c0b8580b9
> >    RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007fffc87cf9f0
> >    RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
> >    R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
> >    R13: 0000000000000000 R14: 00007fffc87cfba0 R15: 000062ea2cbff600
> > 
> 
> Odd, does 4.15-rc6 also trigger the same error? 

Yes:

   BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498
   caller is native_flush_tlb_single+0x57/0xc0
   CPU: 2 PID: 4498 Comm: ovsdb-server Not tainted 4.15.0-rc6-kvm-00423-gea1908c252eb #3
   Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
   Call Trace:
    dump_stack+0x5c/0x86
    check_preemption_disabled+0xdd/0xe0
    native_flush_tlb_single+0x57/0xc0
    ? __set_pte_vaddr+0x2d/0x40
    __set_pte_vaddr+0x2d/0x40
    set_pte_vaddr+0x2f/0x40
    cea_set_pte+0x30/0x40
    ds_update_cea.constprop.4+0x4d/0x70
    reserve_ds_buffers+0x159/0x410
    ? wp_page_copy+0x370/0x6c0
    x86_reserve_hardware+0x150/0x160
    x86_pmu_event_init+0x3e/0x1f0
    perf_try_init_event+0x69/0x80
    perf_event_alloc+0x652/0x740
    SyS_perf_event_open+0x3f6/0xd60
    do_syscall_64+0x5c/0x190
    entry_SYSCALL64_slow_path+0x25/0x25
   RIP: 0033:0x72bff0a3c0b9
   RSP: 002b:00007ffed11c2f18 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
   RAX: ffffffffffffffda RBX: 00007ffed11c30f0 RCX: 000072bff0a3c0b9
   RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007ffed11c2f20
   RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
   R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
   R13: 0000000000000000 R14: 00007ffed11c30d0 R15: 000060986ecfb600
   device ovs-system entered promiscuous mode
   netlink: 'ovs-vswitchd': attribute type 5 has an invalid length.

In addition, with v4.15-rc6, netlink messages like the one in the last
line show up, but I guess this is a separate, openvswitch-related issue.
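
For reference, the register dump in the traces above can be decoded to
see which syscall was in flight and with which arguments. The following
is only a minimal sketch in Python: it assumes the x86_64 syscall ABI
(arguments in RDI, RSI, RDX, R10, R8, R9) and that syscall number 298 is
perf_event_open there, which matches SyS_perf_event_open in the call
trace. The helper names (parse_registers, as_s32) are made up for
illustration.

```python
import re

# Register dump copied from the 4.15-rc6 trace above.
DUMP = """\
RSP: 002b:00007ffed11c2f18 EFLAGS: 00000206 ORIG_RAX: 000000000000012a
RAX: ffffffffffffffda RBX: 00007ffed11c30f0 RCX: 000072bff0a3c0b9
RDX: 00000000ffffffff RSI: 0000000000000000 RDI: 00007ffed11c2f20
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000007000000000
R10: 00000000ffffffff R11: 0000000000000206 R12: 0000000000000008
"""


def parse_registers(dump):
    """Return {name: int} for every 'NAME: hex' pair in a register dump.

    An optional 4-digit segment prefix (as in 'RSP: 002b:...') is skipped.
    """
    pattern = r"\b([A-Z][A-Z0-9_]+): (?:[0-9a-f]{4}:)?([0-9a-f]+)"
    return {name: int(value, 16) for name, value in re.findall(pattern, dump)}


def as_s32(value):
    """Interpret the low 32 bits of value as a signed C int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


regs = parse_registers(DUMP)

# ORIG_RAX holds the syscall number as passed in by user space.
syscall_nr = regs["ORIG_RAX"]

# Assumed argument layout: perf_event_open(attr, pid, cpu, group_fd, flags)
# maps to RDI, RSI, RDX, R10, R8 under the x86_64 syscall ABI.
pid, cpu, group_fd = (as_s32(regs[r]) for r in ("RSI", "RDX", "R10"))

print(f"syscall {syscall_nr}: pid={pid} cpu={cpu} "
      f"group_fd={group_fd} flags={regs['R08']:#x}")
```

Under those assumptions, ovsdb-server opened an event for the calling
thread (pid 0) on any CPU (cpu -1) without a group leader. The contents
of the attr structure are not visible in the dump, so the event type
cannot be derived from it.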

> Thomas is working on an
> issue with KASLR (see lkml with:
> 	Subject: Re: "bad pmd" errors + oops with KPTI on 4.14.11 after loading X.509 certs
> )

Yes, I have also seen that thread, but I did not see any similarities to
my issue. Anyway, I also tried out the patch proposed in
https://lkml.org/lkml/2018/1/4/313 but it does not change anything here.

Thanks,

Thomas
