lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201020002334.GA61232@L-31X9LVDL-1304.local>
Date:   Tue, 20 Oct 2020 08:23:34 +0800
From:   Wei Yang <richard.weiyang@...ux.alibaba.com>
To:     David Hildenbrand <david@...hat.com>
Cc:     Wei Yang <richard.weiyang@...ux.alibaba.com>,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        virtualization@...ts.linux-foundation.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Michael S . Tsirkin" <mst@...hat.com>,
        Jason Wang <jasowang@...hat.com>,
        Pankaj Gupta <pankaj.gupta.linux@...il.com>,
        Michal Hocko <mhocko@...nel.org>,
        Oscar Salvador <osalvador@...e.de>
Subject: Re: [PATCH v1 29/29] virtio-mem: Big Block Mode (BBM) - safe memory
 hotunplug

On Mon, Oct 19, 2020 at 10:50:39AM +0200, David Hildenbrand wrote:
>On 19.10.20 09:54, Wei Yang wrote:
>> On Mon, Oct 12, 2020 at 02:53:23PM +0200, David Hildenbrand wrote:
>>> Let's add a safe mechanism to unplug memory, avoiding long/endless loops
>>> when trying to offline memory - similar to in SBM.
>>>
>>> Fake-offline all memory (via alloc_contig_range()) before trying to
>>> offline+remove it. Use this mode as default, but allow to enable the other
>>> mode explicitly (which could give better memory hotunplug guarantees in
>> 
>> I don't get the point how unsafe mode would have a better guarantees?
>
>It's primarily only relevant when there is a lot of concurrent action
>going on while unplugging memory. Using alloc_contig_range() on
>ZONE_MOVABLE can fail more easily than memory offlining.
>
>alloc_contig_range() doesn't try as hard as memory offlining code to
>isolate memory. There are known issues with temporary page pinning
>(e.g., when a process dies) and the PCP. (mostly discovered via CMA
>allocations)
>
>See the TODO I add in patch #14.
>
>[...]
>>>
>>> +	if (bbm_safe_unplug) {
>>> +		/*
>>> +		 * Start by fake-offlining all memory. Once we marked the device
>>> +		 * block as fake-offline, all newly onlined memory will
>>> +		 * automatically be kept fake-offline. Protect from concurrent
>>> +		 * onlining/offlining until we have a consistent state.
>>> +		 */
>>> +		mutex_lock(&vm->hotplug_mutex);
>>> +		virtio_mem_bbm_set_bb_state(vm, bb_id,
>>> +					    VIRTIO_MEM_BBM_BB_FAKE_OFFLINE);
>>> +
>> 
>> State is set here.
>> 
>>> +		for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>>> +			page = pfn_to_online_page(pfn);
>>> +			if (!page)
>>> +				continue;
>>> +
>>> +			rc = virtio_mem_fake_offline(pfn, PAGES_PER_SECTION);
>>> +			if (rc) {
>>> +				end_pfn = pfn;
>>> +				goto rollback_safe_unplug;
>>> +			}
>>> +		}
>>> +		mutex_unlock(&vm->hotplug_mutex);
>>> +	}
>>> +
>>> 	rc = virtio_mem_bbm_offline_and_remove_bb(vm, bb_id);
>>> -	if (rc)
>>> +	if (rc) {
>>> +		if (bbm_safe_unplug) {
>>> +			mutex_lock(&vm->hotplug_mutex);
>>> +			goto rollback_safe_unplug;
>>> +		}
>>> 		return rc;
>>> +	}
>>>
>>> 	rc = virtio_mem_bbm_unplug_bb(vm, bb_id);
>>> 	if (rc)
>> 
>> And changed to PLUGGED or UNUSED based on rc.
>
>Right, after offlining+remove succeeded. So no longer added to Linux.
>
>The final state depends on the success of the unplug request towards the
>hypervisor.
>
>> 
>>> @@ -1987,6 +2069,17 @@ static int virtio_mem_bbm_offline_remove_and_unplug_bb(struct virtio_mem *vm,
>>> 		virtio_mem_bbm_set_bb_state(vm, bb_id,
>>> 					    VIRTIO_MEM_BBM_BB_UNUSED);
>>> 	return rc;
>>> +
>>> +rollback_safe_unplug:
>>> +	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
>>> +		page = pfn_to_online_page(pfn);
>>> +		if (!page)
>>> +			continue;
>>> +		virtio_mem_fake_online(pfn, PAGES_PER_SECTION);
>>> +	}
>>> +	virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_ADDED);
>> 
>> And changed to ADDED if failed.
>
>Right, back to the initial state when entering this function.
>
>> 
>>> +	mutex_unlock(&vm->hotplug_mutex);
>>> +	return rc;
>>> }
>> 
>> So in which case, the bbm state is FAKE_OFFLINE during
>> virtio_mem_bbm_notify_going_offline() and
>> virtio_mem_bbm_notify_cancel_offline() ?
>
>Exactly, so we can do our magic with fake-offline pages and our
>virtio_mem_bbm_offline_and_remove_bb() can actually succeed.

Ah, my fault. The exact code flow is this:

    virtio_mem_bbm_offline_remove_and_unplug_bb()
        virtio_mem_bbm_set_bb_state(vm, bb_id, VIRTIO_MEM_BBM_BB_FAKE_OFFLINE)
	virtio_mem_fake_offline(pfn, PAGES_PER_SECTION)
	virtio_mem_bbm_offline_and_remove_bb(vm, bb_id)
	    offline and trigger memory notification  --- 1)

The notification is necessary at 1) to release the refcount, which is grabbed
during fake offline.

>
>
>-- 
>Thanks,
>
>David / dhildenb

-- 
Wei Yang
Help you, Help me

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ