linux-kernel - Re: [PATCH v2 0/5] md: fix uaf for sync

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <606b1388-10e7-a0ae-f314-52274b0942dd@deltatee.com>
Date:   Wed, 15 Mar 2023 16:55:56 -0600
From:   Logan Gunthorpe <logang@...tatee.com>
To:     Paul Menzel <pmenzel@...gen.mpg.de>
Cc:     Yu Kuai <yukuai1@...weicloud.com>, agk@...hat.com,
        snitzer@...nel.org, song@...nel.org, linux-kernel@...r.kernel.org,
        linux-raid@...r.kernel.org, yukuai3@...wei.com,
        yi.zhang@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v2 0/5] md: fix uaf for sync_thread

On 2023-03-15 02:30, Paul Menzel wrote:
> Am 15.03.23 um 07:18 schrieb Yu Kuai:
>> I tested this pathset with mdadm tests, and there are no new regression,
>> by the way, following test will failed with or without this patchset:
>>
>> 01raid6integ
>> 04r1update
>> 05r6tor0
>> 10ddf-create
>> 10ddf-fail-spare
>> 10ddf-fail-stop-readd
>> 10ddf-geometry
> 
> As you improved the tests in the past, can you confirm, these failed on
> your test systems too and are fixed now?

Hmm, well Yu did not claim that those tests were fixed. If you re-read
what was said, the tests listed failed with or without the new changes.
As I read it, Yu asserts no new regressions were created with the patch
set, not that failing tests were fixed.

Unfortunately, the tests listed are largely not ones I saw failing the
last time I ran the tests (though it's been a few months since I last
tried). I know 01raid6integ used to fail some of the time, but the other
6 tests mentioned worked the last time I ran them; and there are many
other tests that failed when I ran them. (My notes on which tests are
broken are included in the most recent mdadm tree in tests/*.broken)

I was going to try and confirm that no new regressions were introduced
by Yu's patches, but seems the tests are getting worse. I tried running
the tests on the current md-next branch and found that one of the early
tests, 00raid5-zero, hangs indefinitely. I quickly ran the same test on
v6.3-rc2 and found that it runs just fine there. So it looks like
there's already a regression in md-next that is not part of this series
and I don't have the time to dig into the root cause right now.

Yu's patches don't apply cleanly to v6.3-rc2 and I can't run the tests
against md-next; so I didn't bother running them, but I did do a quick
review. The locking changes make sense to me so it might be worth
merging for correctness. However, I'm not entirely sure it's the best
solution -- the md thread stuff seems like a bit of a mess and passing
an mddev to thread functions that were not related to the mddev to get a
lock seems to just make the mess a bit worse.

For example, it seems a bit ugly to me for the lock mddev->thread_lock
to protect the access of a pointer in struct r5l_log. Just spit-balling,
but perhaps RCU would be more appropriate here. Then md_wakeup_thread()
would just need to hold the RCU read lock when dereferencing, and
md_unregister_thread() would just need to synchronize_rcu() before
stopping and freeing the thread. This has the benefit of not requiring
the mddev object for every md_thread and would probably require a lot
less churn than the current patches.

Logan