[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5a314713-c1ee-2d34-bee1-60beae274742@huawei.com>
Date: Mon, 23 Nov 2020 15:45:55 +0000
From: John Garry <john.garry@...wei.com>
To: Marc Zyngier <maz@...nel.org>
CC: Thomas Gleixner <tglx@...utronix.de>, <gregkh@...uxfoundation.org>,
<rafael@...nel.org>, <martin.petersen@...cle.com>,
<jejb@...ux.ibm.com>, <linuxarm@...wei.com>,
<linux-scsi@...r.kernel.org>, <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 1/3] genirq/affinity: Add irq_update_affinity_desc()
Hi Marc,
>> So is there a reason for which irq dispose mapping is not a
>> requirement for drivers when finished with the irq? because of shared
>> interrupts?
>
> For a bunch of reasons: IRQ number used to be created 1:1 with their
> physical counterpart, so there wasn't a need to "get rid" of the
> associated data structures. Also, people expected their drivers
> to be there for the lifetime of the system (believe it or not,
> hotplug devices are pretty new!).
>
> Shared interrupts are just another part of the problem (although we
> should be able to work out whether there is any action attached to
> the descriptor before blowing it away.
>
OK, understood.
>> But it looks like there is more to it than that, which I'm worried is
>> far from non-trivial. For example, just calling irq_dispose_mapping()
>> for removal and then plaform_get_irq()->acpi_get_irq() second time
>> fails as it looks like more tidy-up is needed for removal...
>
> Most probably. I could imagine things failing if there is any trace
> of an existing translation in the ITS or in the platform-MSI layer,
> for example, or if the interrupt is still active...
So this looks to be a problem I have. So if I hack the code to skip the
check in acpi_get_irq() for the irq already being init'ed, I run into a
use-after-free in the gic-v3-its driver. I may be skipping something
with this hack, but I'll ask anyway.
So initially in the msi_prepare method we setup the its dev - this is
from the mbigen probe. Then when all the irqs are unmapped later for end
device driver removal, we release this its device in
its_irq_domain_free(). But I don't see anything to set it up again. Is
it improper to have released the its device in this scenario? Commenting
out the release makes things "good" again.
Thanks,
john
Complete splat:
[ 35.751627]
==================================================================
[ 35.758892] BUG: KASAN: use-after-free in its_irq_domain_alloc+0x44/0x1a8
[ 35.765714] Read of size 8 at addr ffff04101bc93210 by task swapper/0/1
[ 35.772357]
[ 35.773854] CPU: 1 PID: 1 Comm: swapper/0 Not tainted
5.10.0-rc4-18055-g3247a1b07719 #955
[ 35.782072] Hardware name: Huawei Taishan 2280 /D05, BIOS Hisilicon
D05 IT21 Nemo 2.0 RC0 04/18/2018
[ 35.791250] Call trace:
[ 35.793706] dump_backtrace+0x0/0x2d0
[ 35.797384] show_stack+0x18/0x68
[ 35.800713] dump_stack+0x100/0x16c
[ 35.804216] print_address_description.constprop.12+0x6c/0x4e8
[ 35.810078] kasan_report+0x130/0x200
[ 35.813755] __asan_load8+0x9c/0xd8
[ 35.817257] its_irq_domain_alloc+0x44/0x1a8
[ 35.821548] irq_domain_alloc_irqs_parent+0x68/0x88
[ 35.826447] msi_domain_alloc+0x98/0x180
[ 35.830387] irq_domain_alloc_irqs_hierarchy+0x58/0x78
[ 35.835549] msi_domain_populate_irqs+0x16c/0x1d8
[ 35.840275] platform_msi_domain_alloc+0x9c/0xd0
[ 35.844914] mbigen_irq_domain_alloc+0x100/0x180
[ 35.849553] __irq_domain_alloc_irqs+0x188/0x498
[ 35.854191] irq_create_fwspec_mapping+0x21c/0x4e0
[ 35.859005] acpi_irq_get+0x1f0/0x218
[ 35.862683] platform_get_irq_optional+0x1c4/0x4a0
[ 35.867495] platform_get_irq+0x20/0x68
[ 35.871348] hisi_sas_v2_probe+0x2c/0x88
[ 35.875287] platform_drv_probe+0x70/0xd0
[ 35.879313] really_probe+0x414/0x640
[ 35.882990] driver_probe_device+0x78/0xe8
[ 35.887103] device_driver_attach+0x9c/0xa8
[ 35.891304] __driver_attach+0x74/0x120
[ 35.895156] bus_for_each_dev+0xec/0x160
[ 35.899094] driver_attach+0x34/0x48
[ 35.902684] bus_add_driver+0x1b8/0x2c0
[ 35.906535] driver_register+0xc0/0x1e0
[ 35.910386] __platform_driver_register+0x80/0x90
[ 35.915115] hisi_sas_v2_driver_init+0x1c/0x28
[ 35.919578] do_one_initcall+0xd4/0x268
[ 35.923431] kernel_init_freeable+0x270/0x2d8
[ 35.927807] kernel_init+0x14/0x124
[ 35.931309] ret_from_fork+0x10/0x34
[ 35.934895]
[ 35.936386] Allocated by task 1:
[ 35.939628] stack_trace_save+0x94/0xc8
[ 35.943480] kasan_save_stack+0x28/0x58
[ 35.947331] __kasan_kmalloc.isra.6+0xcc/0xf0
[ 35.951706] kasan_kmalloc+0x10/0x20
[ 35.955295] its_create_device+0x1b8/0x448
[ 35.959408] its_msi_prepare+0x120/0x1d8
[ 35.963347] its_pmsi_prepare+0x20c/0x280
[ 35.967373] msi_domain_prepare_irqs+0x80/0x98
[ 35.971836] __platform_msi_create_device_domain+0xa8/0x130
[ 35.977435] mbigen_device_probe+0x298/0x340
[ 35.981723] platform_drv_probe+0x70/0xd0
[ 35.985748] really_probe+0x414/0x640
[ 35.989424] driver_probe_device+0x78/0xe8
[ 35.993537] device_driver_attach+0x9c/0xa8
[ 35.997737] __driver_attach+0x74/0x120
[ 36.001589] bus_for_each_dev+0xec/0x160
[ 36.005527] driver_attach+0x34/0x48
[ 36.009118] bus_add_driver+0x1b8/0x2c0
[ 36.012968] driver_register+0xc0/0x1e0
[ 36.016819] __platform_driver_register+0x80/0x90
[ 36.021545] mbigen_platform_driver_init+0x1c/0x28
[ 36.026356] do_one_initcall+0xd4/0x268
[ 36.030207] kernel_init_freeable+0x270/0x2d8
[ 36.034583] kernel_init+0x14/0x124
[ 36.038084] ret_from_fork+0x10/0x34
[ 36.041671]
[ 36.043162] Freed by task 1:
[ 36.046053] stack_trace_save+0x94/0xc8
[ 36.049904] kasan_save_stack+0x28/0x58
[ 36.053754] kasan_set_track+0x28/0x40
[ 36.057518] kasan_set_free_info+0x24/0x48
[ 36.061631] __kasan_slab_free+0x104/0x188
[ 36.065744] kasan_slab_free+0x14/0x20
[ 36.069510] slab_free_freelist_hook+0x8c/0x190
[ 36.074060] kfree+0x308/0x448
[ 36.077125] its_irq_domain_free+0x31c/0x338
[ 36.081414] irq_domain_free_irqs_common+0xd8/0x128
[ 36.086314] irq_domain_free_irqs_top+0x70/0x88
[ 36.090864] msi_domain_free+0xa8/0xc8
[ 36.094629] irq_domain_free_irqs_common+0xd8/0x128
[ 36.099528] platform_msi_domain_free+0xdc/0x1b0
[ 36.104167] mbigen_irq_domain_free+0x10/0x20
[ 36.108543] irq_domain_free_irqs+0x184/0x208
[ 36.112918] irq_dispose_mapping+0x54/0x98
[ 36.117033] devm_platform_get_irqs_affinity_release+0xbc/0xdc
[ 36.122893] release_nodes+0x350/0x3e8
[ 36.126657] devres_release_all+0x54/0x78
[ 36.130683] really_probe+0x234/0x640
[ 36.134359] driver_probe_device+0x78/0xe8
[ 36.138473] device_driver_attach+0x9c/0xa8
[ 36.142673] __driver_attach+0x74/0x120
[ 36.146524] bus_for_each_dev+0xec/0x160
[ 36.150462] driver_attach+0x34/0x48
[ 36.154052] bus_add_driver+0x1b8/0x2c0
[ 36.157903] driver_register+0xc0/0x1e0
[ 36.161754] __platform_driver_register+0x80/0x90
[ 36.166479] hisi_sas_v2_driver_init+0x1c/0x28
[ 36.170942] do_one_initcall+0xd4/0x268
[ 36.174793] kernel_init_freeable+0x270/0x2d8
[ 36.179168] kernel_init+0x14/0x124
[ 36.182670] ret_from_fork+0x10/0x34
[ 36.186256]
[ 36.187749] The buggy address belongs to the object at ffff04101bc93200
[ 36.187749] which belongs to the cache kmalloc-128 of size 128
[ 36.200335] The buggy address is located 16 bytes inside of
[ 36.200335] 128-byte region [ffff04101bc93200, ffff04101bc93280)
[ 36.212046] The buggy address belongs to the page:
[ 36.216864] page:(____ptrval____) refcount:1 mapcount:0
mapping:0000000000000000 index:0x0 pfn:0x4101bc92
[ 36.226479] head:(____ptrval____) order:1 compound_mapcount:0
[ 36.232253] flags: 0x2bfffc0000010200(slab|head)
[ 36.236895] raw: 2bfffc0000010200 dead000000000100 dead000000000122
ffff00104000fc00
[ 36.244679] raw: 0000000000000000 0000000000200020 00000001ffffffff
0000000000000000
[ 36.252459] page dumped because: kasan: bad access detected
[ 36.258055]
[ 36.259545] Memory state around the buggy address:
[ 36.264358] ffff04101bc93100: fa fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb
[ 36.271616] ffff04101bc93180: fc fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc
[ 36.278874] >ffff04101bc93200: fa fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb
[ 36.286130] ^
[ 36.289893] ffff04101bc93280: fc fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc
[ 36.297151] ffff04101bc93300: fa fb fb fb fb fb fb fb fb fb fb fb fb
fb fb fb
[ 36.304406]
==================================================================
[ 36.311663] Disabling lock debugging due to kernel taint
Powered by blists - more mailing lists