Message-ID: <CABXGCsOM0j7ME4iUDbf5fpLMxZicXHwT9aBGWXCNWUVSPUO0Sw@mail.gmail.com>
Date:   Sat, 2 Sep 2023 14:51:55 +0500
From:   Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
To:     Hugh Dickins <hughd@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Bagas Sanjaya <bagasdotme@...il.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        regressions@...ts.linux.dev
Subject: Re: 6.6/regression/bisected - after commit a349d72fd9efc87c8fd1d16d3164752d84a7275b
 system stopped booting

On Sat, Sep 2, 2023 at 3:48 AM Hugh Dickins <hughd@...gle.com> wrote:
> That was very disappointing: I found it hard to explain, but was thinking
> of sending you a similar patch, doing the same check on all your 32 CPUs -
> maybe the stall being on CPU 0 in your photo was accidental.
>
> But now I think I have the shameful answer (which studying your dmesg,
> and the 82328 jiffies at 86 seconds in your photo, did help me towards).
>
> That mm/pagewalk fix I put into 6.5 has a grievous oversight (and a
> video of your failing 6.6 bootup would likely have shown a WARN_ON_ONCE
> from the underflow in __rcu_read_unlock()).
>
> Please revert the debug patch I sent yesterday (or earlier today), please
> try booting with this one on top of a349d72fd9ef; and if that's successful,
> then please go back to your original Rawhide tree and apply this on top of
> that, to confirm that boots to a working system too - thanks.
>
> With my apologies,
>
> [PATCH] mm/pagewalk: fix bootstopping regression from extra pte_unmap()
>
> [ Commit message yet to be written: it's actually something to go to
> 6.5 stable, to correct i386 CONFIG_HIGHPTE there - though we know of
> no case where it is actually hit. ]
>
> Signed-off-by: Hugh Dickins <hughd@...gle.com>
> ---
>  mm/pagewalk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index 2022333805d3..9e7d0276c38a 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -58,7 +58,7 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>                         pte = pte_offset_map(pmd, addr);
>                 if (pte) {
>                         err = walk_pte_range_inner(pte, addr, end, walk);
> -                       if (walk->mm != &init_mm)
> +                       if (walk->mm != &init_mm && addr < TASK_SIZE)
>                                 pte_unmap(pte);
>                 }
>         } else {
> --
> 2.35.3

Great, this is the right patch.
Both builds, a349d72fd9ef and the latest Rawhide (currently 99d99825fc07),
work fine after applying this patch.
Thank you very much.
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@...il.com>
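
For readers following the quoted diff, here is a minimal user-space sketch
of why the extra pte_unmap() matters. It is a model, not kernel code: the
model_* functions, MODEL_TASK_SIZE and the rcu_nesting counter are
hypothetical stand-ins. It assumes that pte_offset_map() enters an RCU
read-side section which pte_unmap() leaves, and that the lookup side picks
pte_offset_kernel() for init_mm or for addr >= TASK_SIZE (that part is not
visible in the diff context above).

```c
#include <stdbool.h>

/* Hypothetical stand-in for TASK_SIZE; the real value is per-architecture. */
#define MODEL_TASK_SIZE 0xC0000000UL

/* Models the RCU read-side nesting depth of the current task. */
static int rcu_nesting;

/* Model of pte_offset_map(): assumed to enter an RCU read-side section. */
static void model_pte_offset_map(void) { rcu_nesting++; }

/* Model of pte_offset_kernel(): plain lookup, nothing to undo later. */
static void model_pte_offset_kernel(void) { }

/* Model of pte_unmap(): assumed to leave the RCU read-side section. */
static void model_pte_unmap(void) { rcu_nesting--; }

/*
 * Model of the branch of walk_pte_range() touched by the patch.
 * Returns the nesting depth left behind: 0 means balanced,
 * a negative value is the underflow.
 */
static int model_walk(bool mm_is_init, unsigned long addr, bool patched)
{
    /* Assumed lookup-side condition, not shown in the diff. */
    bool kernel_lookup = mm_is_init || addr >= MODEL_TASK_SIZE;
    bool unmap;

    rcu_nesting = 0;
    if (kernel_lookup)
        model_pte_offset_kernel();
    else
        model_pte_offset_map();

    if (patched)
        unmap = !mm_is_init && addr < MODEL_TASK_SIZE; /* new check */
    else
        unmap = !mm_is_init;                           /* old check */

    if (unmap)
        model_pte_unmap();
    return rcu_nesting;
}
```

In this model, a walk of a kernel address through an mm other than init_mm
leaves the counter at -1 with the old check and at 0 with the new one,
while user addresses stay balanced either way.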

-- 
Best Regards,
Mike Gavrilov.
