lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <f5b5ecc6-c0d5-5dac-314a-00799980068d@amd.com>
Date:   Mon, 14 Jun 2021 13:07:17 +0200
From:   Christian König <christian.koenig@....com>
To:     Ondrej Zary <linux@...y.sk>
Cc:     Ben Skeggs <bskeggs@...hat.com>, dri-devel@...ts.freedesktop.org,
        nouveau@...ts.freedesktop.org, linux-kernel@...r.kernel.org
Subject: Re: nouveau broken on Riva TNT2 in 5.13.0-rc4: NULL pointer
 dereference in nouveau_bo_sync_for_device



Am 11.06.21 um 20:23 schrieb Ondrej Zary:
> On Friday 11 June 2021 14:38:18 Christian König wrote:
>> Am 10.06.21 um 19:59 schrieb Christian König:
>>> Am 10.06.21 um 19:50 schrieb Ondrej Zary:
>>>> [SNIP]
>>>>> I can't see how this is called from the nouveau code, only
>>>>> possibility I
>>>>> see is that it is maybe called through the AGP code somehow.
>>>> Yes, you're right:
>>>> [   13.192663] Call Trace:
>>>> [   13.192678]  dump_stack+0x54/0x68
>>>> [   13.192690]  ttm_tt_init+0x11/0x8a [ttm]
>>>> [   13.192699]  ttm_agp_tt_create+0x39/0x51 [ttm]
>>>> [   13.192840]  nouveau_ttm_tt_create+0x17/0x22 [nouveau]
>>>> [   13.192856]  ttm_tt_create+0x78/0x8c [ttm]
>>>> [   13.192864]  ttm_bo_handle_move_mem+0x7d/0xca [ttm]
>>>> [   13.192873]  ttm_bo_validate+0x92/0xc8 [ttm]
>>>> [   13.192883]  ttm_bo_init_reserved+0x216/0x243 [ttm]
>>>> [   13.192892]  ttm_bo_init+0x45/0x65 [ttm]
>>>> [   13.193018]  ? nouveau_bo_del_io_reserve_lru+0x48/0x48 [nouveau]
>>>> [   13.193150]  nouveau_bo_init+0x8c/0x94 [nouveau]
>>>> [   13.193273]  ? nouveau_bo_del_io_reserve_lru+0x48/0x48 [nouveau]
>>>> [   13.193407]  nouveau_bo_new+0x44/0x57 [nouveau]
>>>> [   13.193537]  nouveau_channel_prep+0xa3/0x269 [nouveau]
>>>> [   13.193665]  nouveau_channel_new+0x3c/0x5f7 [nouveau]
>>>> [   13.193679]  ? slab_free_freelist_hook+0x3b/0xa7
>>>> [   13.193686]  ? kfree+0x9e/0x11a
>>>> [   13.193781]  ? nvif_object_sclass_put+0xd/0x16 [nouveau]
>>>> [   13.193908]  nouveau_drm_device_init+0x2e2/0x646 [nouveau]
>>>> [   13.193924]  ? pci_enable_device_flags+0x1e/0xac
>>>> [   13.194052]  nouveau_drm_probe+0xeb/0x188 [nouveau]
>>>> [   13.194182]  ? nouveau_drm_device_init+0x646/0x646 [nouveau]
>>>> [   13.194195]  pci_device_probe+0x89/0xe9
>>>> [   13.194205]  really_probe+0x127/0x2a7
>>>> [   13.194212]  driver_probe_device+0x5b/0x87
>>>> [   13.194219]  device_driver_attach+0x2e/0x41
>>>> [   13.194226]  __driver_attach+0x7c/0x83
>>>> [   13.194232]  bus_for_each_dev+0x4c/0x66
>>>> [   13.194238]  driver_attach+0x14/0x16
>>>> [   13.194244]  ? device_driver_attach+0x41/0x41
>>>> [   13.194251]  bus_add_driver+0xc5/0x16c
>>>> [   13.194258]  driver_register+0x87/0xb9
>>>> [   13.194265]  __pci_register_driver+0x38/0x3b
>>>> [   13.194271]  ? 0xf0c0d000
>>>> [   13.194362]  nouveau_drm_init+0x14c/0x1000 [nouveau]
>>>>
>>>> How is ttm_dma_tt->dma_address allocated?
>>> Mhm, I need to double check how AGP is supposed to work.
>>>
>>> Since barely anybody is using it these days it is something which
>>> breaks from time to time.
>> I have no idea how that ever worked in the first place since AGP isn't
>> supposed to sync between CPU/GPU. Everything is coherent for that case.
>>
>> Anyway here is a patch which adds a check to those functions if the
>> dma_address array is allocated in the first place. Please test it.
> Thanks, the patch fixes the problem and nouveau now works!
> Should be applied to 5.12-stable too (5.11 is affected too but EOL).

I will just add a CC stable tag before pushing.

>
> It's weird that it worked before.
> Looks like dma_address was used uninitialized - it contained some random
> crap:
> [   12.293304] nouveau_bo_sync_for_device: ttm_dma->dma_address=3e055971 ttm_dma->ttm.num_pages=18
> [   12.293321] ttm_dma->dma_address[0]=0x0
> [   12.293341] ttm_dma->dma_address[1]=0x0
> [   12.293360] ttm_dma->dma_address[2]=0xee728980
> [   12.293379] ttm_dma->dma_address[3]=0xed1cb120
> [   12.293397] ttm_dma->dma_address[4]=0x12
> [   12.293416] ttm_dma->dma_address[5]=0x0
> [   12.293434] ttm_dma->dma_address[6]=0x1
> [   12.293453] ttm_dma->dma_address[7]=0x0
> [   12.293471] ttm_dma->dma_address[8]=0x10000
> [   12.293490] ttm_dma->dma_address[9]=0x0
> [   12.293510] ttm_dma->dma_address[10]=0x101
> [   12.293528] ttm_dma->dma_address[11]=0xee7289ec
> [   12.293546] ttm_dma->dma_address[12]=0xee7289ec
> [   12.293564] ttm_dma->dma_address[13]=0x0
> [   12.293581] ttm_dma->dma_address[14]=0x0
> [   12.293599] ttm_dma->dma_address[15]=0x0
> [   12.293616] ttm_dma->dma_address[16]=0x0
> [   12.293634] ttm_dma->dma_address[17]=0x0
> But it did not matter as dma_sync_single_for_device is a no-op here.
> When dma_address is properly initialized to NULL, it crashes...

Ok that explains things, but essentially means that this only worked by 
coincident.

Just send out the patch to Ben, the list and you once more. Please reply 
with a rb, ak-by and/or tested-by so that I can push it ASAP.

Thanks,
Christian.

>
>> Thanks,
>> Christian.
>>
>>> Thanks for the backtrace,
>>> Christian.
>>>
>>>>    I cannot find any assignment
>>>> executed (in the working code):
>>>>
>>>> $ git grep dma_address\ = drivers/gpu/
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c:
>>>> sg->sgl->dma_address = addr;
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c: dma_address =
>>>> &dma->dma_address[offset >> PAGE_SHIFT];
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c: dma_address =
>>>> (mm_node->start << PAGE_SHIFT) + offset;
>>>> drivers/gpu/drm/i915/gvt/scheduler.c:   sg->dma_address = addr;
>>>> drivers/gpu/drm/i915/i915_gpu_error.c:  sg->dma_address = it;
>>>> drivers/gpu/drm/ttm/ttm_tt.c:   ttm->dma_address = (void *)
>>>> (ttm->ttm.pages + ttm->ttm.num_pages);
>>>> drivers/gpu/drm/ttm/ttm_tt.c:   ttm->dma_address =
>>>> kvmalloc_array(ttm->ttm.num_pages,
>>>> drivers/gpu/drm/ttm/ttm_tt.c:   ttm_dma->dma_address = NULL;
>>>> drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c: viter->dma_address =
>>>> &__vmw_piter_phys_addr;
>>>> drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c: viter->dma_address =
>>>> &__vmw_piter_dma_addr;
>>>> drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c: viter->dma_address =
>>>> &__vmw_piter_sg_addr;
>>>>
>>>> The 2 cases in ttm_tt.c are in ttm_dma_tt_alloc_page_directory() and
>>>> ttm_sg_tt_alloc_page_directory().
>>>> Confirmed by adding printk()s that they're NOT called.
>>>>
>>>>
>>
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ