linux-kernel - Re: mainline/master boot bisection: v4.20-rc5-79-gabb8d6ecbd8f on jetson-tk1

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20e8dbdc-a49e-8c2a-b4dd-fcbc3bbb9440@collabora.com>
Date:   Mon, 10 Dec 2018 18:47:30 +0000
From:   Guillaume Tucker <guillaume.tucker@...labora.com>
To:     Steven Rostedt <rostedt@...dmis.org>,
        Ravi Bangoria <ravi.bangoria@...ux.ibm.com>
Cc:     Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
        tomeu.vizoso@...labora.com, Oleg Nesterov <oleg@...hat.com>,
        broonie@...nel.org, matthew.hart@...aro.org, khilman@...libre.com,
        enric.balletbo@...labora.com, Namhyung Kim <namhyung@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Jiri Olsa <jolsa@...hat.com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>
Subject: Re: mainline/master boot bisection: v4.20-rc5-79-gabb8d6ecbd8f on
 jetson-tk1

On 10/12/2018 18:19, Steven Rostedt wrote:
> On Mon, 10 Dec 2018 16:23:19 +0530
> Ravi Bangoria <ravi.bangoria@...ux.ibm.com> wrote:
> 
>> Hi,
>>
>> Can you please provide more details. I don't understand how this patch
>> can cause boot failure.
>>
>> >From the log found at  
>> https://storage.kernelci.org/mainline/master/v4.20-rc5-79-gabb8d6ecbd8f/arm/multi_v7_defconfig+CONFIG_EFI=y+CONFIG_ARM_LPAE=y/lab-baylibre/boot-tegra124-jetson-tk1.html
>>
>> 23:21:06.680269  [    7.500733] Unable to handle kernel NULL pointer dereference at virtual address 00000064
>> 23:21:06.680455  [    7.508893] pgd = (ptrval)
>> 23:21:06.721940  [    7.511591] [00000064] *pgd=ad7d8003, *pmd=f9d5d003
>> 23:21:06.722241  [    7.516500] Internal error: Oops: 207 [#1] SMP ARM
>>  ...
>> 23:21:06.722724  [    7.546706] CPU: 0 PID: 122 Comm: udevd Not tainted 4.20.0-rc5 #1
>> 23:21:06.722911  [    7.552785] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
>> 23:21:06.765203  [    7.559045] PC is at drm_plane_register_all+0x18/0x50
>> 23:21:06.765493  [    7.564094] LR is at drm_modeset_register_all+0xc/0x6c
>> 23:21:06.765698  [    7.569217] pc : [<c09a8700>]    lr : [<c09ab240>]    psr: a0000013
>> 23:21:06.765882  [    7.575470] sp : c3451c70  ip : 2d827000  fp : c1804c48
>> 23:21:06.766053  [    7.580680] r10: 00000000  r9 : ec9cc300  r8 : 00000000
>> 23:21:06.766229  [    7.585893] r7 : bf193c80  r6 : 00000000  r5 : c3694224  r4 : fffffffc
>> 23:21:06.766403  [    7.592404] r3 : 00002000  r2 : 0002f000  r1 : eef92cf0  r0 : c3694000
>>  ...
>> 23:21:07.068237  [    7.880215] [<c09a8700>] (drm_plane_register_all) from [<c09ab240>] (drm_modeset_register_all+0xc/0x6c)
>> 23:21:07.068493  [    7.889603] [<c09ab240>] (drm_modeset_register_all) from [<c0992054>] (drm_dev_register+0x16c/0x1c4)
>> 23:21:07.109960  [    7.898915] [<c0992054>] (drm_dev_register) from [<bf0ec0d8>] (nouveau_platform_probe+0x54/0x8c [nouveau])
>> 23:21:07.110285  [    7.908750] [<bf0ec0d8>] (nouveau_platform_probe [nouveau]) from [<c0a45968>] (platform_drv_probe+0x48/0x98)
>> 23:21:07.110515  [    7.918572] [<c0a45968>] (platform_drv_probe) from [<c0a43bd8>] (really_probe+0x228/0x2d0)
>> 23:21:07.110706  [    7.926832] [<c0a43bd8>] (really_probe) from [<c0a43de4>] (driver_probe_device+0x60/0x174)
>> 23:21:07.110893  [    7.935093] [<c0a43de4>] (driver_probe_device) from [<c0a43fc8>] (__driver_attach+0xd0/0xd4)
>> 23:21:07.153794  [    7.943528] [<c0a43fc8>] (__driver_attach) from [<c0a41e8c>] (bus_for_each_dev+0x74/0xb4)
>> 23:21:07.154133  [    7.951688] [<c0a41e8c>] (bus_for_each_dev) from [<c0a42ff0>] (bus_add_driver+0x18c/0x210)
>> 23:21:07.154352  [    7.959946] [<c0a42ff0>] (bus_add_driver) from [<c0a44b24>] (driver_register+0x74/0x108)
>> 23:21:07.154544  [    7.968212] [<c0a44b24>] (driver_register) from [<bf1bb170>] (nouveau_drm_init+0x170/0x1000 [nouveau])
>> 23:21:07.154739  [    7.977692] [<bf1bb170>] (nouveau_drm_init [nouveau]) from [<c0402d6c>] (do_one_initcall+0x54/0x1fc)
>> 23:21:07.197008  [    7.986820] [<c0402d6c>] (do_one_initcall) from [<c04d276c>] (do_init_module+0x64/0x1f4)
>> 23:21:07.197344  [    7.994906] [<c04d276c>] (do_init_module) from [<c04d1980>] (load_module+0x1ee8/0x23c8)
>> 23:21:07.197553  [    8.002907] [<c04d1980>] (load_module) from [<c04d2080>] (sys_finit_module+0xac/0xd8)
>> 23:21:07.197751  [    8.010722] [<c04d2080>] (sys_finit_module) from [<c0401000>] (ret_fast_syscall+0x0/0x4c)
>> 23:21:07.197935  [    8.018884] Exception stack(0xc3451fa8 to 0xc3451ff0)
>>
>>
>> Both PC and LR are pointing to drm_* code. I don't see this anyway related to
>> uprobes. Did I miss anything?
>>
> 
> The bot sometimes gets confused during the bisect. This looks to be one
> of those times. I'd simply ignore it because the code path of the
> commit it points out is obviously never hit.
> 
> The bug may be a race condition that will cause havoc with automated
> bisects.

Update: It turns out this was in fact the result of some network
infrastructure issue in the test lab.  There are checks at the
end of the bisection, to verify that the "breaking" revision does
fail to boot 3 times in a row and then succeed to boot 3 times in
a row after reverting the change.  As unlikely as it sounds,
downloading the kernel binary failed 3 times for the "bad" checks
and succeeded 3 times for the "good" checks... (probably caused
by caching).  All the logs can be found here:

   http://lava.baylibre.com:10080/scheduler/alljobs?length=25&search=lava-bisect-11491#table

There's a fix coming to avoid this issue in the future and
discard lab infrastructure errors.  Sorry for the noise.

Guillaume