Message-ID: <CAOUHufY8it25rBbV1QeO3-wF3g32VkDwrsT6mL4fQUNZsMGkKw@mail.gmail.com>
Date:   Mon, 21 Nov 2022 01:18:02 -0700
From:   Yu Zhao <yuzhao@...gle.com>
To:     Juergen Gross <jgross@...e.com>,
        Sander Eikelenboom <linux@...elenboom.it>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        Xen-devel <xen-devel@...ts.xen.org>
Subject: Re: Xen-unstable Linux-6.1.0-rc5 BUG: unable to handle page fault for
 address: ffff8880083374d0

On Mon, Nov 21, 2022 at 12:10 AM Juergen Gross <jgross@...e.com> wrote:
>
> On 19.11.22 09:28, Sander Eikelenboom wrote:
> > Hi Yu / Juergen,

Hi Sander / Juergen,

Thanks for the report and the analysis.

> > This night I got a dom0 kernel crash on my new Ryzen box running Xen-unstable
> > and a Linux-6.1.0-rc5 kernel.
> > I did enable the new and shiny MGLRU, could this be related ?
>
> It might be related, but I think it could happen independently from it.

Yes, I think it's related.

> > Nov 19 06:30:11 serveerstertje kernel: [68959.647371] BUG: unable to handle page
> > fault for address: ffff8880083374d0
> > Nov 19 06:30:11 serveerstertje kernel: [68959.663555] #PF: supervisor write
> > access in kernel mode
> > Nov 19 06:30:11 serveerstertje kernel: [68959.677542] #PF: error_code(0x0003) -
> > permissions violation
> > Nov 19 06:30:11 serveerstertje kernel: [68959.691181] PGD 3026067 P4D 3026067
> > PUD 3027067 PMD 7fee5067 PTE 8010000008337065
> > Nov 19 06:30:11 serveerstertje kernel: [68959.705084] Oops: 0003 [#1] PREEMPT
> > SMP NOPTI
> > Nov 19 06:30:11 serveerstertje kernel: [68959.718710] CPU: 7 PID: 158 Comm:
> > kswapd0 Not tainted 6.1.0-rc5-20221118-doflr-mac80211debug+ #1
> > Nov 19 06:30:11 serveerstertje kernel: [68959.732457] Hardware name: To Be
> > Filled By O.E.M. To Be Filled By O.E.M./B450 Pro4 R2.0, BIOS P5.60 10/20/2022
> > Nov 19 06:30:11 serveerstertje kernel: [68959.746391] RIP:
> > e030:pmdp_test_and_clear_young+0x25/0x40
>
> The kernel tried to reset the "accessed" bit in the pmd entry.

Correct.

> It does so only since commit eed9a328aa1ae. Before that
> pmdp_test_and_clear_young() could be called only for huge pages, which are
> disabled in Xen PV guests.

Correct. After that commit, we can also clear the accessed bit in
non-leaf PMD entries (those pointing to PTE tables).
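
For reference, the fault details in the log above can be decoded with a
short sketch. The flag tables below follow the standard x86-64 page table
entry and page fault error code layouts; the helper name decode() is made
up for illustration. It suggests that the page holding the PMD entry
appears to be mapped read-only (no RW bit), which matches the reported
write permission violation.

```python
# Standard x86-64 PTE flag bits (only the ones relevant here).
PTE_FLAGS = {0: "P", 1: "RW", 2: "US", 3: "PWT", 4: "PCD", 5: "A", 6: "D", 63: "NX"}
# Standard x86 page fault error code bits.
PF_ERROR_BITS = {0: "PROT", 1: "WRITE", 2: "USER", 3: "RSVD", 4: "INSTR"}


def decode(value, names):
    """Return the names of all bits in `names` that are set in `value`."""
    return [name for bit, name in sorted(names.items()) if (value >> bit) & 1]


# Values copied from the oops in the report.
pte = 0x8010000008337065
error_code = 0x0003

print("PTE flags: ", decode(pte, PTE_FLAGS))
print("Error code:", decode(error_code, PF_ERROR_BITS))
```

RW is absent from the PTE flags, and the error code has both the
protection and write bits set, so this was a write to a present but
read-only mapping.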

> pmdp_test_and_clear_young() does a test_and_clear_bit() of the pmd entry, which
> is failing since the hypervisor is emulating pte entry modifications only (pmd
> and pud entries can be set via hypercalls only).
>
> Could you please test whether the attached patch fixes the issue for you?

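There is a runtime kill switch for ARCH_HAS_NONLEAF_PMD_YOUNG, since I
wasn't able to verify this capability on all x86 varieties. Writing 3
clears bit 0x0004 (clearing the accessed bit in non-leaf entries) and
leaves the other two bits set, which should do it: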

  # cat /sys/kernel/mm/lru_gen/enabled
  0x0007
  # echo 3 >/sys/kernel/mm/lru_gen/enabled

Details are in Documentation/admin-guide/mm/multigen_lru.rst.
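
The arithmetic behind that value, as a minimal sketch. The bit values are
the ones described in multigen_lru.rst; the constant and function names
are illustrative only and are not taken from the kernel.

```python
# Bit values as described in Documentation/admin-guide/mm/multigen_lru.rst.
# The names are illustrative, not kernel identifiers.
MAIN_SWITCH = 0x0001       # main switch for the multi-gen LRU
LEAF_BATCH_CLEAR = 0x0002  # batch-clear the accessed bit in leaf entries
NONLEAF_CLEAR = 0x0004     # also clear the accessed bit in non-leaf entries


def disable_nonleaf(mask: int) -> int:
    """Return `mask` with the non-leaf accessed-bit feature turned off."""
    return mask & ~NONLEAF_CLEAR


current = 0x0007  # value read from /sys/kernel/mm/lru_gen/enabled above
print(f"0x{disable_nonleaf(current):04x}")
```

Starting from 0x0007, this gives 0x0003, i.e. the 3 written above.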

Alternatively, we could make ARCH_HAS_NONLEAF_PMD_YOUNG a runtime
check similar to arch_has_hw_pte_young() on arm64.
