linux-kernel - Re: [BUG] i2c_nvidia_gpu takes long time and makes system suspend & resume failed with NVIDIA cards

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20200402103447.GD1886416@kuha.fi.intel.com>
Date:   Thu, 2 Apr 2020 13:34:47 +0300
From:   Heikki Krogerus <heikki.krogerus@...ux.intel.com>
To:     Jian-Hong Pan <jian-hong@...lessm.com>,
        Ajay Gupta <ajayg@...dia.com>
Cc:     linux-i2c@...r.kernel.org,
        Linux Kernel <linux-kernel@...r.kernel.org>,
        linux-usb@...r.kernel.org,
        Linux Upstreaming Team <linux@...lessm.com>
Subject: Re: [BUG] i2c_nvidia_gpu takes long time and makes system suspend &
 resume failed with NVIDIA cards

Hi,

On Thu, Apr 02, 2020 at 06:22:14PM +0800, Jian-Hong Pan wrote:
> Hi,
> 
> We got some machines like Acer desktop equipped with NVIDIA GTX 1660
> card, Acer Predator PH315-52 equipped with NVIDIA GeForce RTX 2060
> Mobile and ASUS UX581LV equipped with NNVIDIA GeForce RTX 2060.
> We found them take long time (more than 50 seconds) to resume after
> suspend.  During the resuming time, the screen is blank.  And check
> the dmesg, found the error during resume:
> 
> [   28.060831] PM: suspend entry (deep)
> [   28.144260] Filesystems sync: 0.083 seconds
> [   28.150219] Freezing user space processes ...
> [   48.153282] Freezing of tasks failed after 20.003 seconds (1 tasks
> refusing to freeze, wq_busy=0):
> [   48.153447] systemd-udevd   D13440   382    330 0x80004124
> [   48.153457] Call Trace:
> [   48.153504]  ? __schedule+0x272/0x5a0
> [   48.153558]  ? hrtimer_start_range_ns+0x18c/0x2c0
> [   48.153622]  schedule+0x45/0xb0
> [   48.153668]  schedule_hrtimeout_range_clock+0x8f/0x100
> [   48.153738]  ? hrtimer_init_sleeper+0x80/0x80
> [   48.153798]  usleep_range+0x5a/0x80
> [   48.153850]  gpu_i2c_check_status.isra.0+0x3a/0xa0 [i2c_nvidia_gpu]
> [   48.153933]  gpu_i2c_master_xfer+0x155/0x20e [i2c_nvidia_gpu]
> [   48.154012]  __i2c_transfer+0x163/0x4c0
> [   48.154067]  i2c_transfer+0x6e/0xc0
> [   48.154120]  ccg_read+0x11f/0x170 [ucsi_ccg]
> [   48.154182]  get_fw_info+0x17/0x50 [ucsi_ccg]
> [   48.154242]  ucsi_ccg_probe+0xf4/0x200 [ucsi_ccg]
> [   48.154312]  ? ucsi_ccg_init+0xe0/0xe0 [ucsi_ccg]
> [   48.154377]  i2c_device_probe+0x113/0x210
> [   48.154435]  really_probe+0xdf/0x280
> [   48.154487]  driver_probe_device+0x4b/0xc0
> [   48.154545]  device_driver_attach+0x4e/0x60
> [   48.154604]  __driver_attach+0x44/0xb0
> [   48.154657]  ? device_driver_attach+0x60/0x60
> [   48.154717]  bus_for_each_dev+0x6c/0xb0
> [   48.154772]  bus_add_driver+0x172/0x1c0
> [   48.154824]  driver_register+0x67/0xb0
> [   48.154877]  i2c_register_driver+0x39/0x70
> [   48.154932]  ? 0xffffffffc00ac000
> [   48.154978]  do_one_initcall+0x3e/0x1d0
> [   48.155032]  ? free_vmap_area_noflush+0x8d/0xe0
> [   48.155093]  ? _cond_resched+0x10/0x20
> [   48.155145]  ? kmem_cache_alloc_trace+0x3a/0x1b0
> [   48.155208]  do_init_module+0x56/0x200
> [   48.155260]  load_module+0x21fe/0x24e0
> [   48.155322]  ? __do_sys_finit_module+0xbf/0xe0
> [   48.155381]  __do_sys_finit_module+0xbf/0xe0
> [   48.155441]  do_syscall_64+0x3d/0x130
> [   48.156841]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [   48.158074] RIP: 0033:0x7fba3b4bc2a9
> [   48.158707] Code: Bad RIP value.
> [   48.158990] RSP: 002b:00007ffe1da3a6d8 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000139
> [   48.159259] RAX: ffffffffffffffda RBX: 000055ca6922c470 RCX: 00007fba3b4bc2a9
> [   48.159566] RDX: 0000000000000000 RSI: 00007fba3b3c0cad RDI: 0000000000000010
> [   48.159842] RBP: 00007fba3b3c0cad R08: 0000000000000000 R09: 0000000000000000
> [   48.160117] R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000000
> [   48.160412] R13: 000055ca6922f940 R14: 0000000000020000 R15: 000055ca6922c470
> 
> I have filed this to bugzilla and more detail:
> https://bugzilla.kernel.org/show_bug.cgi?id=206653
> 
> Any comment will be appreciated.

You are using an outdated kernel, 5.4.0. Please make sure that you can
reproduce the issue with mainline, or at least with the longterm
5.4.x.

Ajay, based on the backtrace, the issue seems to be starting from your
I2C driver. Please take a look at this.


thanks,

-- 
heikki