Message-ID: <CABXGCsNDYzcpDCM5P0fVWF30N+TMD62CXjv902z39mrCWULsjA@mail.gmail.com>
Date:   Mon, 5 Aug 2019 03:23:42 +0500
From:   Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
To:     amd-gfx list <amd-gfx@...ts.freedesktop.org>, linux-mm@...ck.org,
        Linux List Kernel Mailing <linux-kernel@...r.kernel.org>,
        dri-devel@...ts.freedesktop.org
Subject: The issue with page allocation 5.3 rc1-rc2 (seems drm culprit here)

Hi folks,
Two weeks ago, after commit 22051d9c4a57 landed on my system,
the following error started to occur at random:
"gnome-shell: page allocation failure: order:4,
mode:0x40cc0(GFP_KERNEL|__GFP_COMP),
nodemask=(null),cpuset=/,mems_allowed=0"
Symptoms:
The screen goes blank, as in power-saving mode,
and the computer cannot be woken up for a few minutes.
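
For reference, "order:4" in the message above means the allocator was
asked for 2^4 physically contiguous pages. A minimal sketch of that
arithmetic, assuming the usual 4 KiB base page size on x86_64 (the
helper name order_to_bytes is only illustrative):

```python
PAGE_SIZE = 4096  # bytes; assumed base page size on x86_64


def order_to_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size in bytes of a contiguous allocation of the given order."""
    return (1 << order) * page_size


print(order_to_bytes(4))  # 65536 bytes, i.e. 16 pages or 64 KiB
```

So the failed request was for a single contiguous 64 KiB block, which
can fail under memory fragmentation even when plenty of memory is free.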

I ran a bisect, and the first bad commit appears to be 476e955dd679.
The full bisect logs are here: https://mega.nz/#F!kgYFxAIb!v1tcHANPy2ns1lh4LQLeIg

I reported my findings to the amd-gfx mailing list, but nobody answered.
Until yesterday, I thought it was a bug in the amdgpu driver.
But yesterday, after the error occurred again, the system hung
completely, this time with a different error:

[ 3225.317560] BUG: unable to handle page fault for address: 000000000000c9f4
[ 3225.317562] #PF: supervisor read access in kernel mode
[ 3225.317563] #PF: error_code(0x0000) - not-present page
[ 3225.317565] PGD 0 P4D 0
[ 3225.317567] Oops: 0000 [#1] SMP NOPTI
[ 3225.317571] CPU: 2 PID: 12717 Comm: Xorg Tainted: G        W
 5.3.0-0.rc2.git4.1.fc31.x86_64 #1
[ 3225.317572] Hardware name: System manufacturer System Product
Name/ROG STRIX X470-I GAMING, BIOS 2406 06/21/2019
[ 3225.317625] RIP: 0010:dc_resource_state_copy_construct+0x18/0xf0 [amdgpu]
[ 3225.317627] Code: 00 49 83 c4 01 44 39 e0 7f b5 5b 5d 41 5c 41 5d
c3 c3 0f 1f 44 00 00 41 56 ba f8 c9 00 00 41 55 41 54 49 89 f4 55 4c
89 e5 53 <44> 8b ae f4 c9 00 00 48 89 fe 4c 89 e7 e8 16 86 48 f7 49 8d
84 24
[ 3225.317630] RSP: 0018:ffffb439c3e377d0 EFLAGS: 00010246
[ 3225.317631] RAX: ffff9b0ba19a0000 RBX: ffffffffc08380b0 RCX: 0000000000000006
[ 3225.317633] RDX: 000000000000c9f8 RSI: 0000000000000000 RDI: ffff9b0ab7fc0000
[ 3225.317635] RBP: 0000000000000000 R08: 000002eef3c694b7 R09: 0000000000000000
[ 3225.317636] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 3225.317638] R13: ffff9b0bb5381000 R14: ffff9b09acc68598 R15: ffff9b09acc68540
[ 3225.317640] FS:  00007fdde56cbf00(0000) GS:ffff9b0bba400000(0000)
knlGS:0000000000000000
[ 3225.317641] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3225.317643] CR2: 000000000000c9f4 CR3: 00000007382ee000 CR4: 00000000003406e0
[ 3225.317644] Call Trace:
[ 3225.317714]  amdgpu_dm_atomic_commit_tail.cold+0xad/0xe1 [amdgpu]
[ 3225.317719]  ? lockdep_hardirqs_on+0xf0/0x180
[ 3225.317723]  ? debug_check_no_obj_freed+0x107/0x1d8
[ 3225.317786]  ? dm_determine_update_type_for_commit+0x34c/0x420 [amdgpu]
[ 3225.317850]  ? dm_determine_update_type_for_commit+0x34c/0x420 [amdgpu]
[ 3225.317855]  ? kfree+0x1b6/0x3b0
[ 3225.317918]  ? dm_determine_update_type_for_commit+0x34c/0x420 [amdgpu]
[ 3225.317923]  ? __lock_acquire+0x247/0x1910
[ 3225.317928]  ? find_held_lock+0x32/0x90
[ 3225.317931]  ? mark_held_locks+0x50/0x80
[ 3225.317934]  ? _raw_spin_unlock_irq+0x29/0x40
[ 3225.317937]  ? lockdep_hardirqs_on+0xf0/0x180
[ 3225.317939]  ? _raw_spin_unlock_irq+0x29/0x40
[ 3225.317942]  ? wait_for_completion_timeout+0x75/0x190
[ 3225.317954]  ? commit_tail+0x3c/0x70 [drm_kms_helper]
[ 3225.317960]  commit_tail+0x3c/0x70 [drm_kms_helper]
[ 3225.317968]  drm_atomic_helper_commit+0xe3/0x150 [drm_kms_helper]
[ 3225.317975]  drm_atomic_helper_disable_plane+0x82/0xb0 [drm_kms_helper]
[ 3225.317994]  drm_mode_cursor_universal+0x12c/0x240 [drm]
[ 3225.318011]  drm_mode_cursor_common+0xd8/0x230 [drm]
[ 3225.318026]  ? drm_mode_setplane+0x1a0/0x1a0 [drm]
[ 3225.318038]  drm_mode_cursor_ioctl+0x4d/0x70 [drm]
[ 3225.318049]  drm_ioctl_kernel+0xaa/0xf0 [drm]
[ 3225.318061]  drm_ioctl+0x208/0x390 [drm]
[ 3225.318075]  ? drm_mode_setplane+0x1a0/0x1a0 [drm]
[ 3225.318079]  ? lockdep_hardirqs_on+0xf0/0x180
[ 3225.318145]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[ 3225.318164]  do_vfs_ioctl+0x411/0x750
[ 3225.318175]  ksys_ioctl+0x5e/0x90
[ 3225.318179]  __x64_sys_ioctl+0x16/0x20
[ 3225.318188]  do_syscall_64+0x5c/0xb0
[ 3225.318191]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 3225.318194] RIP: 0033:0x7fdde5b4007b
[ 3225.318203] Code: 0f 1e fa 48 8b 05 0d 9e 0c 00 64 c7 00 26 00 00
00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d dd 9d 0c 00 f7 d8 64 89
01 48
[ 3225.318209] RSP: 002b:00007ffec481a6d8 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[ 3225.318213] RAX: ffffffffffffffda RBX: 00007ffec481a710 RCX: 00007fdde5b4007b
[ 3225.318215] RDX: 00007ffec481a710 RSI: 00000000c01c64a3 RDI: 000000000000000e
[ 3225.318217] RBP: 00000000c01c64a3 R08: 0000000000000080 R09: 0000000000000000
[ 3225.318218] R10: 0000000000000004 R11: 0000000000000246 R12: 00000000000006f1
[ 3225.318220] R13: 000000000000000e R14: 000056201b5b5490 R15: 000056201bbe7820
[ 3225.318225] Modules linked in: macvtap macvlan tap rfcomm
xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp tun bridge stp
llc nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT
nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack
ebtable_nat ip6table_nat ip6table_mangle ip6table_raw
ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw
iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c
ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables
iptable_filter cmac bnep sunrpc vfat fat snd_hda_codec_realtek
edac_mce_amd snd_hda_codec_generic ledtrig_audio kvm_amd rtwpci
snd_hda_codec_hdmi rtw88 kvm snd_hda_intel snd_usb_audio snd_hda_codec
mac80211 snd_hda_core snd_usbmidi_lib irqbypass snd_rawmidi uvcvideo
snd_hwdep snd_seq videobuf2_vmalloc videobuf2_memops btusb
videobuf2_v4l2 crct10dif_pclmul snd_seq_device videobuf2_common btrtl
crc32_pclmul eeepc_wmi snd_pcm btbcm btintel asus_wmi xpad snd_timer
sparse_keymap
[ 3225.318261]  videodev ff_memless bluetooth joydev
ghash_clmulni_intel cfg80211 video snd mc k10temp wmi_bmof soundcore
ecdh_generic sp5100_tco ecc rfkill ccp i2c_piix4 libarc4 gpio_amdpt
gpio_generic acpi_cpufreq binfmt_misc ip_tables hid_logitech_hidpp
amdgpu amd_iommu_v2 gpu_sched ttm drm_kms_helper drm igb crc32c_intel
dca i2c_algo_bit hid_logitech_dj nvme nvme_core wmi pinctrl_amd
[ 3225.318283] CR2: 000000000000c9f4
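
One possible reading of the oops above, offered as a sketch and not as
a confirmed diagnosis: the faulting instruction (the bytes starting at
the <44> marker in the "Code:" line) decodes to a 32-bit load from
RSI + 0xc9f4, and RSI is 0 in the register dump, so the fault address
in CR2 is just a field offset applied to a NULL pointer. The decoder
below handles only this one instruction form; its name and structure
are illustrative.

```python
# Values copied from the oops above.
CR2 = 0x000000000000C9F4  # faulting address
RSI = 0x0000000000000000  # from the register dump
# Instruction bytes starting at the <44> marker in the "Code:" line.
insn = bytes.fromhex("44 8b ae f4 c9 00 00")

REG_NAMES = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
             "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_disp32(insn: bytes) -> dict:
    """Decode 'REX 8b /r' with mod=10 (register base + disp32) only."""
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    assert rex & 0xF0 == 0x40 and opcode == 0x8B
    # mod=10 means a 32-bit displacement; rm=100 would add a SIB byte.
    assert modrm >> 6 == 0b10 and modrm & 7 != 0b100
    reg = ((modrm >> 3) & 7) | ((rex & 0x4) << 1)  # REX.R extends reg
    rm = (modrm & 7) | ((rex & 0x1) << 3)          # REX.B extends rm
    disp = int.from_bytes(insn[3:7], "little", signed=True)
    return {"dst": REG_NAMES[reg], "base": REG_NAMES[rm], "disp": disp}


d = decode_mov_disp32(insn)
print(d["dst"], d["base"], hex(d["disp"]))  # r13 rsi 0xc9f4
print(RSI + d["disp"] == CR2)               # True
```

In the x86-64 System V calling convention RSI carries the second
function argument, so this is consistent with
dc_resource_state_copy_construct() having been called with a NULL
second argument. RDX holds 0xc9f8 at the time of the fault; if that is
the size of the structure being copied (an assumption), it would need
an order-4 allocation with 4 KiB pages, which matches the "order:4"
failures reported earlier in this message.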

Whenever I see "SMP NOPTI" in an oops, I suspect that something is
going wrong in memory management.
So I decided to ask for help on the linux-mm mailing list.
In any case, for reasons unknown to me, the AMD developers have ignored my report.

Thanks.

--
Best Regards,
Mike Gavrilov.

View attachment "dmesg.txt" of type "text/plain" (229742 bytes)
