lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 10 Jun 2021 19:59:36 +0200
From:   Christian König <christian.koenig@....com>
To:     Ondrej Zary <linux@...y.sk>
Cc:     Ben Skeggs <bskeggs@...hat.com>, dri-devel@...ts.freedesktop.org,
        nouveau@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: nouveau broken on Riva TNT2 in 5.13.0-rc4: NULL pointer
 dereference in nouveau_bo_sync_for_device

Am 10.06.21 um 19:50 schrieb Ondrej Zary:
> On Thursday 10 June 2021 08:43:06 Christian König wrote:
>> Am 09.06.21 um 22:00 schrieb Ondrej Zary:
>>> On Wednesday 09 June 2021 11:21:05 Christian König wrote:
>>>> Am 09.06.21 um 09:10 schrieb Ondrej Zary:
>>>>> On Wednesday 09 June 2021, Christian König wrote:
>>>>>> Am 09.06.21 um 08:57 schrieb Ondrej Zary:
>>>>>>> [SNIP]
>>>>>>>> Thanks for the heads up. So the problem with my patch is already fixed,
>>>>>>>> isn't it?
>>>>>>> The NULL pointer dereference in nouveau_bo_wr16 introduced in
>>>>>>> 141b15e59175aa174ca1f7596188bd15a7ca17ba was fixed by
>>>>>>> aea656b0d05ec5b8ed5beb2f94c4dd42ea834e9d.
>>>>>>>
>>>>>>> That's the bug I hit when bisecting the original problem:
>>>>>>> NULL pointer dereference in nouveau_bo_sync_for_device
>>>>>>> It's caused by:
>>>>>>> # first bad commit: [e34b8feeaa4b65725b25f49c9b08a0f8707e8e86] drm/ttm: merge ttm_dma_tt back into ttm_tt
>>>>>> Good that I've asked :)
>>>>>>
>>>>>> Ok that's a bit strange. e34b8feeaa4b65725b25f49c9b08a0f8707e8e86 was
>>>>>> created mostly automated.
>>>>>>
>>>>>> Do you have the original backtrace of that NULL pointer deref once more?
>>>>> The original backtrace is here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml.org%2Flkml%2F2021%2F6%2F5%2F350&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C657222345e3242e7a6a608d92c383f66%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637589442963348551%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=ZkJs%2FR8MeQKUxwhJUC%2FG4Hi3T%2FMIftt%2FWRh%2B1%2BU5rUE%3D&amp;reserved=0
>>>> And the problem is that ttm_dma->dma_address is NULL, right? Mhm, I
>>>> don't see how that can happen since nouveau is using ttm_sg_tt_init().
>>>>
>>>> Apart from that what nouveau does here is rather questionable since you
>>>> need a coherent architecture for most things anyway, but that's not what
>>>> we are trying to fix here.
>>>>
>>>> Can you try to narrow down if ttm_sg_tt_init is called before calling
>>>> this function for the tt object in question?
>>> ttm_sg_tt_init is not called:
>>> [   12.150124] nouveau 0000:01:00.0: DRM: VRAM: 31 MiB
>>> [   12.150133] nouveau 0000:01:00.0: DRM: GART: 128 MiB
>>> [   12.150143] nouveau 0000:01:00.0: DRM: BMP version 5.6
>>> [   12.150151] nouveau 0000:01:00.0: DRM: No DCB data found in VBIOS
>>> [   12.151362] ttm_tt_init
>>> [   12.151370] ttm_tt_init_fields
>>> [   12.151374] ttm_tt_alloc_page_directory
>>> [   12.151615] BUG: kernel NULL pointer dereference, address: 00000000
>> Please add dump_stack(); to ttm_tt_init() and report back with the
>> backtrace.
>>
>> I can't see how this is called from the nouveau code, only possibility I
>> see is that it is maybe called through the AGP code somehow.
> Yes, you're right:
> [   13.192663] Call Trace:
> [   13.192678]  dump_stack+0x54/0x68
> [   13.192690]  ttm_tt_init+0x11/0x8a [ttm]
> [   13.192699]  ttm_agp_tt_create+0x39/0x51 [ttm]
> [   13.192840]  nouveau_ttm_tt_create+0x17/0x22 [nouveau]
> [   13.192856]  ttm_tt_create+0x78/0x8c [ttm]
> [   13.192864]  ttm_bo_handle_move_mem+0x7d/0xca [ttm]
> [   13.192873]  ttm_bo_validate+0x92/0xc8 [ttm]
> [   13.192883]  ttm_bo_init_reserved+0x216/0x243 [ttm]
> [   13.192892]  ttm_bo_init+0x45/0x65 [ttm]
> [   13.193018]  ? nouveau_bo_del_io_reserve_lru+0x48/0x48 [nouveau]
> [   13.193150]  nouveau_bo_init+0x8c/0x94 [nouveau]
> [   13.193273]  ? nouveau_bo_del_io_reserve_lru+0x48/0x48 [nouveau]
> [   13.193407]  nouveau_bo_new+0x44/0x57 [nouveau]
> [   13.193537]  nouveau_channel_prep+0xa3/0x269 [nouveau]
> [   13.193665]  nouveau_channel_new+0x3c/0x5f7 [nouveau]
> [   13.193679]  ? slab_free_freelist_hook+0x3b/0xa7
> [   13.193686]  ? kfree+0x9e/0x11a
> [   13.193781]  ? nvif_object_sclass_put+0xd/0x16 [nouveau]
> [   13.193908]  nouveau_drm_device_init+0x2e2/0x646 [nouveau]
> [   13.193924]  ? pci_enable_device_flags+0x1e/0xac
> [   13.194052]  nouveau_drm_probe+0xeb/0x188 [nouveau]
> [   13.194182]  ? nouveau_drm_device_init+0x646/0x646 [nouveau]
> [   13.194195]  pci_device_probe+0x89/0xe9
> [   13.194205]  really_probe+0x127/0x2a7
> [   13.194212]  driver_probe_device+0x5b/0x87
> [   13.194219]  device_driver_attach+0x2e/0x41
> [   13.194226]  __driver_attach+0x7c/0x83
> [   13.194232]  bus_for_each_dev+0x4c/0x66
> [   13.194238]  driver_attach+0x14/0x16
> [   13.194244]  ? device_driver_attach+0x41/0x41
> [   13.194251]  bus_add_driver+0xc5/0x16c
> [   13.194258]  driver_register+0x87/0xb9
> [   13.194265]  __pci_register_driver+0x38/0x3b
> [   13.194271]  ? 0xf0c0d000
> [   13.194362]  nouveau_drm_init+0x14c/0x1000 [nouveau]
>
> How is ttm_dma_tt->dma_address allocated?

Mhm, I need to double check how AGP is supposed to work.

Since barely anybody is using it these days it is something which breaks 
from time to time.

Thanks for the backtrace,
Christian.

>   I cannot find any assignment
> executed (in the working code):
>
> $ git grep dma_address\ = drivers/gpu/
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:       sg->sgl->dma_address = addr;
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:                dma_address = &dma->dma_address[offset >> PAGE_SHIFT];
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:                dma_address = (mm_node->start << PAGE_SHIFT) + offset;
> drivers/gpu/drm/i915/gvt/scheduler.c:   sg->dma_address = addr;
> drivers/gpu/drm/i915/i915_gpu_error.c:  sg->dma_address = it;
> drivers/gpu/drm/ttm/ttm_tt.c:   ttm->dma_address = (void *) (ttm->ttm.pages + ttm->ttm.num_pages);
> drivers/gpu/drm/ttm/ttm_tt.c:   ttm->dma_address = kvmalloc_array(ttm->ttm.num_pages,
> drivers/gpu/drm/ttm/ttm_tt.c:   ttm_dma->dma_address = NULL;
> drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c:             viter->dma_address = &__vmw_piter_phys_addr;
> drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c:             viter->dma_address = &__vmw_piter_dma_addr;
> drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c:             viter->dma_address = &__vmw_piter_sg_addr;
>
> The 2 cases in ttm_tt.c are in ttm_dma_tt_alloc_page_directory() and
> ttm_sg_tt_alloc_page_directory().
> Confirmed by adding printk()s that they're NOT called.
>
>

Powered by blists - more mailing lists