Message-ID: <24a95dea-9ea6-a904-7c0b-197961afa1d1@bytedance.com>
Date:   Fri, 27 May 2022 14:32:52 +0800
From:   zhenwei pi <pizhenwei@...edance.com>
To:     Peter Xu <peterx@...hat.com>, Jue Wang <juew@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...hat.com>, jasowang@...hat.com,
        LKML <linux-kernel@...r.kernel.org>,
        Linux MM <linux-mm@...ck.org>, mst@...hat.com,
        HORIGUCHI NAOYA(堀口 直也) 
        <naoya.horiguchi@....com>, Paolo Bonzini <pbonzini@...hat.com>,
        qemu-devel@...gnu.org, virtualization@...ts.linux-foundation.org
Subject: Re: Re: [PATCH 0/3] recover hardware corrupted page by virtio balloon

On 5/27/22 02:37, Peter Xu wrote:
> On Wed, May 25, 2022 at 01:16:34PM -0700, Jue Wang wrote:
>> The hypervisor _must_ emulate poisons identified in guest physical
>> address space (could be transported from the source VM), this is to
>> prevent silent data corruption in the guest. With a paravirtual
>> approach like this patch series, the hypervisor can clear some of the
>> poisoned HVAs knowing for certain that the guest OS has isolated the
>> poisoned page. I wonder how much value it provides to the guest if the
>> guest and workload are _not_ in a pressing need for the extra KB/MB
>> worth of memory.
> 
> I'm curious the same on how unpoisoning could help here.  The reasoning
> behind would be great material to be mentioned in the next cover letter.
> 
> Shouldn't we consider migrating serious workloads off the host already
> where there's a sign of more severe hardware issues, instead?
> 
> Thanks,
> 

I'm maintaining 1,000,000+ virtual machines. From my experience, UEs
are quite unusual and occur randomly, and I have not hit a UE storm in
the past years. Memory also shows no obvious performance drop after
hitting a UE.

I have hit several CE storm cases, where memory performance drops a
lot. But I can't find an obvious relationship between UE and CE.

So from my point of view, fixing the corrupted page for the VM seems
good enough. And yes, unpoisoning several pages does not help
significantly, but it is still a chance to make virtualization better.

-- 
zhenwei pi
