Message-ID: <ZWiQ-II9CvGv8EWK@tiehlicka>
Date:   Thu, 30 Nov 2023 14:41:12 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     Baoquan He <bhe@...hat.com>
Cc:     Donald Dutile <ddutile@...hat.com>, Jiri Bohac <jbohac@...e.cz>,
        Pingfan Liu <piliu@...hat.com>, Tao Liu <ltao@...hat.com>,
        Vivek Goyal <vgoyal@...hat.com>,
        Dave Young <dyoung@...hat.com>, kexec@...ts.infradead.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/4] kdump: crashkernel reservation from CMA

On Thu 30-11-23 20:31:44, Baoquan He wrote:
[...]
> > > which doesn't use the proper pinning API (which would migrate away from
> > > the CMA) then what is the worst case? We will get crash kernel corrupted
> > > potentially and fail to take a proper kernel crash, right? Is this
> > > worrisome? Yes. Is it a real roadblock? I do not think so. The problem
> 
> We may fail to take a proper kernel crash, why isn't it a roadblock?

It would be if the threat were practical. So far I see only very
theoretical what-if concerns, and I do not mean to downplay those at
all. As already explained, proper CMA users should never leak any
writes across a kernel reboot.

> We
> have stable way with a little more memory, why would we take risk to
> take another way, just for saving memory? Usually only high end server
> needs the big memory for crashkernel and the big end server usually have
> huge system ram. The big memory will be a very small percentage relative
> to huge system RAM.

Jiri will likely talk more specifically about that, but our experience
tells us that proper crashkernel memory scaling has turned out to be a
real maintainability problem, because existing setups tend to break with
major kernel version upgrades or non-trivial changes.
 
> > > seems theoretical to me and it is not CMA usage at fault here IMHO. It
> > > is the said theoretical driver that needs fixing anyway.
> 
> Now, what we want to make clear is if it's a theoretical possibility, or
> very likely to happen. We have met several in-flight DMA stomping into
> kexec kernel's initrd in the past two years because device driver didn't
> provide shutdown() method properly. For kdump, once it happens, the pain
> is we don't know how to debug. For kexec reboot, customer allows to
> login their system to reproduce and figure out the stomping. For kdump,
> the system corruption rarely happened, and the stomping could rarely
> happen too.

Yes, this is understood.
 
> The code change looks simple and the benefit is very attractive. I
> surely like it if finally people confirm there's no risk. As I said, we
> can't afford to take the risk if it possibly happen. But I don't object
> if other people would rather take risk, we can let it land in kernel.

I think it is fair to be cautious and I wouldn't impose the new method
as a default. Only time can tell how safe this really is. It is hard to
protect against theoretical issues though. Bugs should be fixed.
I believe this option would make configuring kdump much easier and
less fragile.
 
> My personal opinion, thanks for sharing your thought.

Thanks for sharing.

-- 
Michal Hocko
SUSE Labs
