lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231125121059.GAZWHkU27odMLns7TZ@fat_crate.local>
Date:   Sat, 25 Nov 2023 13:10:59 +0100
From:   Borislav Petkov <bp@...en8.de>
To:     Shuai Xue <xueshuai@...ux.alibaba.com>
Cc:     rafael@...nel.org, wangkefeng.wang@...wei.com,
        tanxiaofei@...wei.com, mawupeng1@...wei.com, tony.luck@...el.com,
        linmiaohe@...wei.com, naoya.horiguchi@....com, james.morse@....com,
        gregkh@...uxfoundation.org, will@...nel.org, jarkko@...nel.org,
        linux-acpi@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
        linux-edac@...r.kernel.org, acpica-devel@...ts.linuxfoundation.org,
        stable@...r.kernel.org, x86@...nel.org, justin.he@....com,
        ardb@...nel.org, ying.huang@...el.com, ashish.kalra@....com,
        baolin.wang@...ux.alibaba.com, tglx@...utronix.de,
        mingo@...hat.com, dave.hansen@...ux.intel.com, lenb@...nel.org,
        hpa@...or.com, robert.moore@...el.com, lvying6@...wei.com,
        xiexiuqi@...wei.com, zhuo.song@...ux.alibaba.com
Subject: Re: [PATCH v9 0/2] ACPI: APEI: handle synchronous errors in task
 work with proper si_code

On Sat, Nov 25, 2023 at 02:44:52PM +0800, Shuai Xue wrote:
> - an AR error consumed by current process is deferred to handle in a
>   dedicated kernel thread, but memory_failure() assumes that it runs in the
>   current context

On x86? ARM?

Please point to the exact code flow.

> - another page fault is not unnecessary, we can send sigbus to current
>   process in the first Synchronous External Abort SEA on arm64 (analogy
>   Machine Check Exception on x86)

I have no clue what that means. What page fault?

> I just give an example that the user space process *really* relys on the
> si_code of signal to handle hardware errors

No, don't give examples.

Explain what the exact problem is you're seeing, in your use case, point
to the code and then state how you think it should be fixed and why.

Right now your text is "all over the place" and I have no clue what you
even want.

> The SIGBUS si_codes defined in include/uapi/asm-generic/siginfo.h says:
> 
>     /* hardware memory error consumed on a machine check: action required */
>     #define BUS_MCEERR_AR	4
>     /* hardware memory error detected in process but not consumed: action optional*/
>     #define BUS_MCEERR_AO	5
> 
> When a synchronous error is consumed by Guest, the kernel should send a
> signal with BUS_MCEERR_AR instead of BUS_MCEERR_AO.

Can you drop this "synchronous" bla and concentrate on the error
*severity*?

I think you want to say that there are some types of errors for which
error handling needs to happen immediately and for some reason that
doesn't happen.

Which errors are those? Types?

Why do you need them to be handled immediately?

> Exactly.

No, not exactly. Why is it ok to do that? What are the implications of
this?

Is immediate killing the right decision?

Is this ok for *every* possible kernel running out there - not only for
your use case?

And so on and so on...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ