Message-ID: <c24a174d-c8f3-4267-87ae-cf77fa587e82@lucifer.local>
Date: Mon, 28 Oct 2024 18:57:25 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Mark Brown <broonie@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Peter Xu <peterx@...hat.com>, linux-arm-kernel@...ts.infradead.org,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>, Aishwarya TCV <Aishwarya.TCV@....com>
Subject: Re: [PATCH hotfix 6.12 v2 4/8] mm: resolve faulty mmap_region()
error path behaviour
On Mon, Oct 28, 2024 at 06:29:36PM +0000, Mark Brown wrote:
> On Wed, Oct 23, 2024 at 09:38:29PM +0100, Lorenzo Stoakes wrote:
> > The mmap_region() function is somewhat terrifying, with spaghetti-like
> > control flow and numerous means by which issues can arise and incomplete
> > state, memory leaks and other unpleasantness can occur.
>
> Today's pending-fixes is showing a fairly large set of failures in the
> arm64 MTE selftests on all the platforms that have MTE (currently just
> the software ones). Bisection points at this change which is
> 0967bf7fbd0e0 in -next which seems plausible but I didn't investigate in
> any meaningful detail. There's nothing particularly instructive in the
> test logs, just plain reports that the tests failed:
Ugh, yep, OK. Thanks for the report. This is likely because MTE relies on merge
behaviour or on the ->mmap() hook in some unfortunate way that we haven't
accounted for here.
Bad time for my arm64 qemu to be broken :)
Would it be possible for you to assist me with investigating this a little as
you have things pretty well set up?
On these memory allocation failures, could you tell me what the errno is? Could
you also check dmesg for anything strange?
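For reference, a bare "FAIL: mmap allocation" line can be turned into something more informative by capturing errno as soon as mmap() returns MAP_FAILED. The sketch below is a minimal userspace illustration; try_mmap() is a hypothetical helper, not something taken from the arm64 selftests.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the selftests): attempt an anonymous
 * mapping and return 0 on success or the errno value on failure,
 * printing the error so the failure reason shows up in the test log.
 */
static int try_mmap(size_t len, int prot, int flags)
{
	void *p = mmap(NULL, len, prot, flags, -1, 0);

	if (p == MAP_FAILED) {
		int err = errno; /* save before any other libc call */

		fprintf(stderr, "mmap failed: %s (errno=%d)\n",
			strerror(err), err);
		return err;
	}

	munmap(p, len);
	return 0;
}
```

On arm64 the MTE tests presumably also pass PROT_MTE in prot; an EINVAL there, rather than ENOMEM, would point at flag validation rather than a genuine allocation failure.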
>
> # # FAIL: mmap allocation
Interesting that mmap() fails outright with MAP_FAILED, though. This could be
due to arch_validate_flags() being moved around.
Could you do me a further favour, then, and try a kernel at this commit with
the following check in mmap_region() commented out?

	/* Allow architectures to sanity-check the vm_flags. */
	if (!arch_validate_flags(vm_flags))
		return -EINVAL;

That and the errno would be hugely useful information, thank you!
I'm wondering whether the driver's hook somehow changes the flags such that
arch_validate_flags() passes on the modified flags but not on the original ones.
OK, looking at the source, I'm 99% certain it's the move of this check, as
arm64 in its hook for this does:

	/* only allow VM_MTE if VM_MTE_ALLOWED has been set previously */
	return !(vm_flags & VM_MTE) || (vm_flags & VM_MTE_ALLOWED);

So the hook for your mapping likely changes the flags to set
VM_MTE | VM_MTE_ALLOWED and expects this check to happen afterwards (ugh).
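The suspected ordering problem can be modelled in a few lines of userspace C. This is a toy sketch under stated assumptions: the FAKE_VM_* bit values, fake_mmap_hook() and fake_map() are all hypothetical; only the predicate mirrors the arm64 check quoted above. It shows that a request carrying VM_MTE is rejected if the flags are validated before a hook that sets VM_MTE_ALLOWED, but accepted if they are validated after it.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative bit values only: NOT the kernel's real VM_* definitions. */
#define FAKE_VM_MTE         (1UL << 0)
#define FAKE_VM_MTE_ALLOWED (1UL << 1)

/* Mirrors the arm64 predicate quoted above. */
static bool fake_arch_validate_flags(unsigned long vm_flags)
{
	/* only allow VM_MTE if VM_MTE_ALLOWED has been set previously */
	return !(vm_flags & FAKE_VM_MTE) || (vm_flags & FAKE_VM_MTE_ALLOWED);
}

/* Hypothetical ->mmap() hook that marks the mapping as MTE-capable. */
static void fake_mmap_hook(unsigned long *vm_flags)
{
	*vm_flags |= FAKE_VM_MTE_ALLOWED;
}

/*
 * Hypothetical mapping path: validate either before the hook runs
 * (check_after_hook == false) or after it (check_after_hook == true).
 */
static int fake_map(unsigned long vm_flags, bool check_after_hook)
{
	if (!check_after_hook && !fake_arch_validate_flags(vm_flags))
		return -EINVAL;

	fake_mmap_hook(&vm_flags);

	if (check_after_hook && !fake_arch_validate_flags(vm_flags))
		return -EINVAL;

	return 0;
}
```

With this model, fake_map(FAKE_VM_MTE, false) returns -EINVAL while fake_map(FAKE_VM_MTE, true) returns 0, matching the hypothesis that the check has to run after the hook has had a chance to set VM_MTE_ALLOWED.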
Thanks!
> # # FAIL: memory allocation
> # not ok 17 Check initial tags with private mapping, sync error mode and mmap memory
> # ok 18 Check initial tags with private mapping, sync error mode and mmap/mprotect memory
> # # FAIL: mmap allocation
> # # FAIL: memory allocation
> # not ok 19 Check initial tags with shared mapping, sync error mode and mmap memory
> # ok 20 Check initial tags with shared mapping, sync error mode and mmap/mprotect memory
> # # Totals: pass:18 fail:2 xfail:0 xpass:0 skip:0 error:0
> not ok 42 selftests: arm64: check_buffer_fill # exit=1
>
> (and more, mainly on mmap related things). A full log for a sample run
> on the FVP can be seen at:
>
> https://lava.sirena.org.uk/scheduler/job/901638#L3693
>
> and one from qemu here:
>
> https://lava.sirena.org.uk/scheduler/job/901630#L3031
>
> Both of these logs include links to filesystem/firmware images and
> command lines to run the model.
>
> Bisects converge cleanly (there are some random extra good commits logged
> at the start as my tooling feeds test results it already has on hand
> between the good and bad commits into the bisect):
>
> # bad: [6560005f01c3c14aab4c2ce35d97b75796d33d81] Merge branch 'for-linux-next-fixes' of https://gitlab.freedesktop.org/drm/misc/kernel.git
> # good: [ea1fda89f5b23734e10c62762990120d5ae23c43] Merge tag 'x86_urgent_for_v6.12_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> # good: [6668610b4d8ce9a3ee3ed61a9471f62fb5f05bf9] ASoC: Intel: sst: Support LPE0F28 ACPI HID
> # good: [2feb023110843acce790e9089e72e9a9503d9fa5] regulator: rtq2208: Fix uninitialized use of regulator_config
> # good: [0107f28f135231da22a9ad5756bb16bd5cada4d5] ASoC: Intel: bytcr_rt5640: Add DMI quirk for Vexia Edu Atla 10 tablet
> # good: [25f00a13dccf8e45441265768de46c8bf58e08f6] spi: spi-fsl-dspi: Fix crash when not using GPIO chip select
> # good: [032532f91a1d06d0750f16c49a9698ef5374a68f] ASoC: codecs: rt5640: Always disable IRQs from rt5640_cancel_work()
> # good: [d48696b915527b5bcdd207a299aec03fb037eb17] ASoC: Intel: bytcr_rt5640: Add support for non ACPI instantiated codec
> # good: [d0ccf760a405d243a49485be0a43bd5b66ed17e2] spi: geni-qcom: Fix boot warning related to pm_runtime and devres
> # good: [f2b5b8201b1545ef92e050735e9c768010d497aa] spi: mtk-snfi: fix kerneldoc for mtk_snand_is_page_ops()
> # good: [b5a468199b995bd8ee3c26f169a416a181210c9e] spi: stm32: fix missing device mode capability in stm32mp25
> git bisect start '6560005f01c3c14aab4c2ce35d97b75796d33d81' 'ea1fda89f5b23734e10c62762990120d5ae23c43' '6668610b4d8ce9a3ee3ed61a9471f62fb5f05bf9' '2feb023110843acce790e9089e72e9a9503d9fa5' '0107f28f135231da22a9ad5756bb16bd5cada4d5' '25f00a13dccf8e45441265768de46c8bf58e08f6' '032532f91a1d06d0750f16c49a9698ef5374a68f' 'd48696b915527b5bcdd207a299aec03fb037eb17' 'd0ccf760a405d243a49485be0a43bd5b66ed17e2' 'f2b5b8201b1545ef92e050735e9c768010d497aa' 'b5a468199b995bd8ee3c26f169a416a181210c9e'
> # bad: [6560005f01c3c14aab4c2ce35d97b75796d33d81] Merge branch 'for-linux-next-fixes' of https://gitlab.freedesktop.org/drm/misc/kernel.git
> git bisect bad 6560005f01c3c14aab4c2ce35d97b75796d33d81
> # bad: [4a2901b5d394f58cdc60bc25e32c381bb2b83891] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless.git
> git bisect bad 4a2901b5d394f58cdc60bc25e32c381bb2b83891
> # bad: [4093d34d740447b23a1ea916dabcf902aa767812] Merge branch 'fs-current' of linux-next
> git bisect bad 4093d34d740447b23a1ea916dabcf902aa767812
> # bad: [0967bf7fbd0e03cee0525035762150a91ba1bb7c] mm: resolve faulty mmap_region() error path behaviour
> git bisect bad 0967bf7fbd0e03cee0525035762150a91ba1bb7c
> # good: [633e7df6cfdf97f8acf2a59fbfead01e31d0e492] tools: testing: add expand-only mode VMA test
> git bisect good 633e7df6cfdf97f8acf2a59fbfead01e31d0e492
> # good: [315add1ace71306a7d8518fd417466d938041ff1] mseal: update mseal.rst
> git bisect good 315add1ace71306a7d8518fd417466d938041ff1
> # good: [bcbb8b25ab80347994e33c358481e65f95f665fd] mm: fix PSWPIN counter for large folios swap-in
> git bisect good bcbb8b25ab80347994e33c358481e65f95f665fd
> # good: [8438cf67b86bf8c966f32612a7e12b2eb910396b] mm: unconditionally close VMAs on error
> git bisect good 8438cf67b86bf8c966f32612a7e12b2eb910396b
> # good: [a220e219d89c2d574ad9ffda627575e11334fede] mm: refactor map_deny_write_exec()
> git bisect good a220e219d89c2d574ad9ffda627575e11334fede
> # first bad commit: [0967bf7fbd0e03cee0525035762150a91ba1bb7c] mm: resolve faulty mmap_region() error path behaviour