Message-ID: <947546736.71760.1454946810767.JavaMail.open-xchange@www.ud-mail.de>
Date: Mon, 8 Feb 2016 16:53:30 +0100 (CET)
From: Karol Herbst <nouveau@...olherbst.de>
To: linux-kernel <linux-kernel@...r.kernel.org>
Cc: rostedt <rostedt@...dmis.org>, mingo <mingo@...hat.com>,
nouveau <nouveau@...ts.freedesktop.org>
Subject: PROBLEM: mmiotracing issue with nvidia kernel module
Hi all,
Some nouveau users and developers have an issue with mmiotracing the nvidia
driver. After some digging I think it might have something to do with hugepages
and/or the tracing infrastructure in general, but I am still clueless about
what is causing this.
It can be triggered by starting an X server on the nvidia GPU with the nvidia
kernel module loaded and the mmiotracer active.
Below is the main part of the kernel log showing the issue. As you can see, the
small pages are handled nicely, but the hugepage causes issues: ffffc90010000000
to ffffc90010000fff might be mapped correctly, and an access to
0xffffc90010001070 then causes the hugepage for ffffc90010000000 to be loaded a
second time?
I am still guessing here, but this would explain the issue we are having.
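To make the address arithmetic behind this guess concrete, here is a small
sketch. It assumes 4 KiB small pages and 2 MiB huge pages (the usual x86_64
sizes; the log itself does not state the huge page size), and the helper and
variable names are made up for illustration.

```python
PAGE_SIZE = 4 * 1024               # 4 KiB small page
HUGE_PAGE_SIZE = 2 * 1024 * 1024   # assumption: 2 MiB huge page

def page_base(addr, size):
    """Round addr down to the start of the size-aligned page containing it."""
    return addr & ~(size - 1)

map_base = 0xffffc90010000000    # address returned by the 16 MiB ioremap
fault_addr = 0xffffc90010001070  # address of the "unexpected secondary hit"

# Does the faulting access fall into the same page as the start of the mapping?
same_small_page = page_base(fault_addr, PAGE_SIZE) == page_base(map_base, PAGE_SIZE)
same_huge_page = page_base(fault_addr, HUGE_PAGE_SIZE) == page_base(map_base, HUGE_PAGE_SIZE)

print(f"same 4 KiB page: {same_small_page}")
print(f"same 2 MiB page: {same_huge_page}")
```

Under these assumptions the access lies outside the first 4 KiB page but inside
the same 2 MiB page as the start of the mapping, which is the situation
described above.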
Also, this is quite urgent, because it really annoys a few devs and stops them
from reverse engineering the binary driver (myself included).
Many thanks
Karol Herbst
kernel version:
Linux version 4.4.1-gentoo (root@) (gcc version 5.3.0 (Gentoo 5.3.0 p1.0,
pie-0.6.5) ) #11 SMP PREEMPT Mon Feb 8 08:42:46 CET 2016
cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz
stepping : 3
microcode : 0x17
cpu MHz : 2400.281
cache size : 6144 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg
fma cx16 xtpr pdcm pcid sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes xsave
avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi
flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid
xsaveopt
bugs :
bogomips : 4789.40
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:
kernel log:
Feb 07 22:29:34 kernel: mmiotrace: ioremap_*(0xf6000000, 0x1000) =
ffffc90000044000
Feb 07 22:29:34 kernel: mmiotrace: Unmapping ffffc90000044000.
Feb 07 22:29:34 kernel: mmiotrace: ioremap_*(0xf6000000, 0x1000) =
ffffc90000046000
Feb 07 22:29:34 kernel: mmiotrace: Unmapping ffffc90000046000.
Feb 07 22:29:34 kernel: nvidia-nvlink: Nvlink Core is being initialized, major
device number 246
Feb 07 22:29:34 kernel: [drm] Initialized nvidia-drm 0.0.0 20150116 for
0000:01:00.0 on minor 1
Feb 07 22:29:34 kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 361.18
Sat Jan 9 21:27:18 PST 2016
Feb 07 22:29:34 kernel: mmiotrace: ioremap_*(0xf6000000, 0x1000000) =
ffffc90010000000
Feb 07 22:29:34 kernel: mmiotrace: unexpected secondary hit for address
0xffffc90010001070 on CPU 0.
Feb 07 22:29:34 kernel: BUG: unable to handle kernel paging request at
ffff8800f6000008
Feb 07 22:29:35 kernel: IP: [<ffffffff81082a3d>] vmalloc_fault+0x1bd/0x280
Feb 07 22:29:35 kernel: PGD 218b067 PUD 42fdfd067 PMD 0
Feb 07 22:29:35 kernel: Oops: 0000 [#1] PREEMPT SMP
Feb 07 22:29:35 kernel: Modules linked in: nvidia(POE) btusb btintel zram iwldvm
iwlwifi
Feb 07 22:29:35 kernel: CPU: 0 PID: 3433 Comm: nvidia-smi Tainted: P U W
OE 4.4.1-gentoo #6
Feb 07 22:29:35 kernel: Hardware name: Notebook P15SM
/P15SM , BIOS 1.03.04PM v2
03/12/2014
Feb 07 22:29:35 kernel: task: ffff88038668a340 ti: ffff880090414000 task.ti:
ffff880090414000
Feb 07 22:29:35 kernel: RIP: 0010:[<ffffffff81082a3d>] [<ffffffff81082a3d>]
vmalloc_fault+0x1bd/0x280
Feb 07 22:29:35 kernel: RSP: 0018:ffff880090417948 EFLAGS: 00010082
Feb 07 22:29:35 kernel: RAX: ffff880000000000 RBX: ffff880000000008 RCX:
80000000f60001f2
Feb 07 22:29:35 kernel: RDX: 00000000f6000000 RSI: 0000000003d80000 RDI:
00003ffffffff000
Feb 07 22:29:35 kernel: RBP: ffffc90010001070 R08: 0000000000000080 R09:
00003ffffffff000
Feb 07 22:29:35 kernel: R10: 0000000000000000 R11: 0000000000000000 R12:
ffff8800904179d8
Feb 07 22:29:35 kernel: R13: ffff88038668a340 R14: ffffffffa033f0ba R15:
ffff880419c67700
Feb 07 22:29:35 kernel: FS: 00007f613acdf700(0000) GS:ffff88042fa00000(0000)
knlGS:0000000000000000
Feb 07 22:29:35 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 07 22:29:35 kernel: CR2: ffff8800f6000008 CR3: 00000003c8fc3000 CR4:
00000000001406f0
Feb 07 22:29:35 kernel: Stack:
Feb 07 22:29:35 kernel: 0000000000000000 ffffffff81083b79 ffff880419c67768
0000000000000000
Feb 07 22:29:35 kernel: 0000000000000000 0000000000000000 0000000000000000
ffff8800904179d8
Feb 07 22:29:35 kernel: 0000000000000000 ffff8803c8f58cc8 ffffffffa033f0ba
0000000000000000
Feb 07 22:29:35 kernel: Call Trace:
Feb 07 22:29:35 kernel: [<ffffffff81083b79>] ? __do_page_fault+0x319/0x3b0
Feb 07 22:29:35 kernel: [<ffffffffa033f0ba>] ? _nv009391rm+0x177a/0x1a80
[nvidia]
Feb 07 22:29:35 kernel: [<ffffffff81083c4b>] ? do_page_fault+0x1b/0x60
Feb 07 22:29:35 kernel: [<ffffffff818cffa2>] ? page_fault+0x22/0x30
Feb 07 22:29:35 kernel: [<ffffffffa033f0ba>] ? _nv009391rm+0x177a/0x1a80
[nvidia]
Feb 07 22:29:35 kernel: [<ffffffffa033eff0>] ? _nv009391rm+0x16b0/0x1a80
[nvidia]
Feb 07 22:29:35 kernel: [<ffffffffa05aaa00>] ? _nv014076rm+0x10/0x40 [nvidia]
Feb 07 22:29:35 kernel: [<ffffffffa033f108>] ? _nv009391rm+0x17c8/0x1a80
[nvidia]
Feb 07 22:29:35 kernel: [<ffffffffa034b7d0>] ? _nv009390rm+0x30/0x50 [nvidia]
Feb 07 22:29:35 kernel: [<ffffffffa034b828>] ? _nv009361rm+0x18/0x30 [nvidia]
Feb 07 22:29:35 kernel: [<ffffffffa033cd33>] ? _nv004860rm+0x4e3/0x10c0
[nvidia]
Feb 07 22:29:35 kernel: [<ffffffffa0535b11>] ? _nv009636rm+0x151/0x2e0 [nvidia]
Feb 07 22:29:35 kernel: [<ffffffffa05b9494>] ? _nv014158rm+0x384/0x4f0 [nvidia]
Feb 07 22:29:35 kernel: [<ffffffffa05bb01a>] ? _nv000726rm+0xca/0x6b0 [nvidia]
Feb 07 22:29:35 kernel: [<ffffffffa05b04aa>] ? rm_init_adapter+0x6a/0x100
[nvidia]
Feb 07 22:29:35 kernel: [<ffffffff811035b0>] ? __setup_irq+0x400/0x580
Feb 07 22:29:35 kernel: [<ffffffffa0099079>] ? nv_open_device+0x109/0x5d0
[nvidia]
Feb 07 22:29:35 kernel: [<ffffffffa0099738>] ? nvidia_open+0x138/0x2c0 [nvidia]
Feb 07 22:29:35 kernel: [<ffffffffa00982d0>] ? nvidia_frontend_open+0x50/0xb0
[nvidia]
Feb 07 22:29:35 kernel: [<ffffffff811b8346>] ? chrdev_open+0x96/0x1b0
Feb 07 22:29:35 kernel: [<ffffffff811b82b0>] ? cdev_put+0x20/0x20
Feb 07 22:29:35 kernel: [<ffffffff811b261b>] ? do_dentry_open+0x1eb/0x2e0
Feb 07 22:29:35 kernel: [<ffffffff811c09b8>] ? path_openat+0x4b8/0x1010
Feb 07 22:29:35 kernel: [<ffffffff811c1d6e>] ? filename_lookup+0xae/0x110
Feb 07 22:29:35 kernel: [<ffffffff811c2729>] ? do_filp_open+0x79/0xd0
Feb 07 22:29:35 kernel: [<ffffffff818cdd2d>] ? _raw_write_unlock+0xd/0x20
Feb 07 22:29:35 kernel: [<ffffffff811ce0e6>] ? __alloc_fd+0xb6/0x180
Feb 07 22:29:35 kernel: [<ffffffff811b3a08>] ? do_sys_open+0x128/0x210
Feb 07 22:29:35 kernel: [<ffffffff818ce2d7>] ?
entry_SYSCALL_64_fastpath+0x12/0x6a
Feb 07 22:29:35 kernel: Code: 00 48 21 f2 48 89 d6 48 c1 ee 06 48 39 f0 0f 85 8d
00 00 00 48 b8 00 00 00 00 00 88 ff ff 48 c1 eb 09 81 e3 f8 0f 00 00 48 01 c3
<48> 8b 34 13 f7 c6 01 01 00 00 74 6c 48 b8 00 f0 ff ff ff 3f 00
Feb 07 22:29:35 kernel: RIP [<ffffffff81082a3d>] vmalloc_fault+0x1bd/0x280
Feb 07 22:29:36 kernel: RSP <ffff880090417948>
Feb 07 22:29:36 kernel: CR2: ffff8800f6000008
Feb 07 22:29:36 kernel: ---[ end trace f8c4ae57609eb500 ]---
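One observation about the numbers in the oops above, offered as a sketch and
not as a diagnosis: the faulting address in CR2 equals RAX plus the physical
address passed to ioremap plus 8, and 8 is the byte offset of the entry that
0xffffc90010001070 would select in a 4 KiB page table with 8-byte entries. That
would fit the hugepage entry being followed as if it pointed to a page table,
but this is only one possible reading. Treating the RAX value as the base of
the kernel's direct mapping of physical memory is an assumption, and the
variable names are illustrative.

```python
DIRECT_MAP_BASE = 0xffff880000000000  # assumption: RAX value from the register dump
phys_base = 0xf6000000                # physical address from the ioremap line (also RDX)
fault_addr = 0xffffc90010001070       # address of the "unexpected secondary hit" (also RBP)
cr2 = 0xffff8800f6000008              # address the kernel failed to read

# Index and byte offset of the entry a 4 KiB page table walk would use
# for fault_addr (512 entries per table, 8 bytes per entry).
pte_index = (fault_addr >> 12) & 0x1ff
pte_offset = pte_index * 8

reconstructed = DIRECT_MAP_BASE + phys_base + pte_offset

print(f"pte index:     {pte_index}")
print(f"reconstructed: {reconstructed:#x}")
print(f"matches CR2:   {reconstructed == cr2}")
```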