linux-kernel - Re: [PATCH v1 1/4] mm: handle poisoning of pfn without struct pages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <878264ae-f6f6-04d9-2d52-fb7ae29dca85@huawei.com>
Date:   Sat, 23 Sep 2023 11:20:19 +0800
From:   Miaohe Lin <linmiaohe@...wei.com>
To:     <ankita@...dia.com>
CC:     <aniketa@...dia.com>, <cjia@...dia.com>, <kwankhede@...dia.com>,
        <targupta@...dia.com>, <vsethi@...dia.com>, <acurrid@...dia.com>,
        <anuaggarwal@...dia.com>, <linux-kernel@...r.kernel.org>,
        <linux-mm@...ck.org>, <linux-edac@...r.kernel.org>,
        <kvm@...r.kernel.org>, <jgg@...dia.com>,
        <alex.williamson@...hat.com>, <akpm@...ux-foundation.org>,
        <tony.luck@...el.com>, <bp@...en8.de>, <naoya.horiguchi@....com>
Subject: Re: [PATCH v1 1/4] mm: handle poisoning of pfn without struct pages

On 2023/9/20 22:02, ankita@...dia.com wrote:
> From: Ankit Agrawal <ankita@...dia.com>
> 
> The kernel MM currently does not handle ECC errors / poison on a memory
> region that is not backed by struct pages. If a memory region is mapped
> using remap_pfn_range(), but not added to the kernel, MM will not have
> associated struct pages. Add a new mechanism to handle memory failure
> on such memory.
> 
> Make kernel MM expose a function to allow modules managing the device
> memory to register a failure function and the physical address space
> associated with the device memory. MM maintains this information as
> interval tree. The registered memory failure function is used by MM to
> notify the kernel module managing the PFN, so that the module may take
> any required action. The module for example may use the information
> to track the poisoned pages.
> 
> In this implementation, kernel MM follows the following sequence similar
> (mostly) to the memory_failure() handler for struct page backed memory:
> 1. memory_failure() is triggered on reception of a poison error. An
> absence of struct page is detected and consequently memory_failure_pfn()
> is executed.
> 2. memory_failure_pfn() call the newly introduced failure handler exposed
> by the module managing the poisoned memory to notify it of the problematic
> PFN.
> 3. memory_failure_pfn() unmaps the stage-2 mapping to the PFN.
> 4. memory_failure_pfn() collects the processes mapped to the PFN.
> 5. memory_failure_pfn() sends SIGBUS (BUS_MCEERR_AO) to all the processes
> mapping the faulty PFN using kill_procs().
> 6. An access to the faulty PFN by an operation in VM at a later point
> is trapped and user_mem_abort() is called.
> 7. The vma ops fault function gets called due to the absence of Stage-2
> mapping. It is expected to return VM_FAULT_HWPOISON on the PFN.
> 8. __gfn_to_pfn_memslot() then returns KVM_PFN_ERR_HWPOISON, which cause
> the poison with SIGBUS (BUS_MCEERR_AR) to be sent to the QEMU process
> through kvm_send_hwpoison_signal().
> 
> Signed-off-by: Ankit Agrawal <ankita@...dia.com>

Thanks for your patch.

<snip>

>  /*
>   * Return values:
>   *   1:   the page is dissolved (if needed) and taken off from buddy,
> @@ -422,15 +428,15 @@ static unsigned long dev_pagemap_mapping_shift(struct vm_area_struct *vma,
>   * Schedule a process for later kill.
>   * Uses GFP_ATOMIC allocations to avoid potential recursions in the VM.
>   *
> - * Note: @fsdax_pgoff is used only when @p is a fsdax page and a
> - * filesystem with a memory failure handler has claimed the
> - * memory_failure event. In all other cases, page->index and
> - * page->mapping are sufficient for mapping the page back to its
> + * Notice: @pgoff is used either when @p is a fsdax page or a PFN is not
> + * backed by struct page and a filesystem with a memory failure handler
> + * has claimed the memory_failure event. In all other cases, page->index
> + * and page->mapping are sufficient for mapping the page back to its
>   * corresponding user virtual address.
>   */
>  static void __add_to_kill(struct task_struct *tsk, struct page *p,
>  			  struct vm_area_struct *vma, struct list_head *to_kill,
> -			  unsigned long ksm_addr, pgoff_t fsdax_pgoff)
> +			  unsigned long ksm_addr, pgoff_t pgoff)
>  {
>  	struct to_kill *tk;
>  
> @@ -440,13 +446,18 @@ static void __add_to_kill(struct task_struct *tsk, struct page *p,
>  		return;
>  	}
>  
> -	tk->addr = ksm_addr ? ksm_addr : page_address_in_vma(p, vma);
> -	if (is_zone_device_page(p)) {
> -		if (fsdax_pgoff != FSDAX_INVALID_PGOFF)
> -			tk->addr = vma_pgoff_address(fsdax_pgoff, 1, vma);
> -		tk->size_shift = dev_pagemap_mapping_shift(vma, tk->addr);
> -	} else
> -		tk->size_shift = page_shift(compound_head(p));
> +	if (vma->vm_flags | PFN_MAP) {

if (vma->vm_flags | PFN_MAP)? So this branch is always selected?

> +		tk->addr = vma_pgoff_address(pgoff, 1, vma);
> +		tk->size_shift = PAGE_SHIFT;
> +	} else {
> +		tk->addr = ksm_addr ? ksm_addr : page_address_in_vma(p, vma);
> +		if (is_zone_device_page(p)) {
> +			if (pgoff != FSDAX_INVALID_PGOFF)
> +				tk->addr = vma_pgoff_address(pgoff, 1, vma);
> +			tk->size_shift = dev_pagemap_mapping_shift(vma, tk->addr);
> +		} else
> +			tk->size_shift = page_shift(compound_head(p));
> +	}
>  

IIUC, the page passed to __add_to_kill is NULL in this case. So when tk->addr == -EFAULT, we will have problem
to do the page_to_pfn(p) in the following pr_info:

	if (tk->addr == -EFAULT) {
		pr_info("Unable to find user space address %lx in %s\n",
			page_to_pfn(p), tsk->comm);

>  	/*
>  	 * Send SIGKILL if "tk->addr == -EFAULT". Also, as
> @@ -666,8 +677,7 @@ static void collect_procs_file(struct page *page, struct list_head *to_kill,
>  	i_mmap_unlock_read(mapping);
>  }
>  

<snip>

>  /**
>   * memory_failure - Handle memory failure of a page.
>   * @pfn: Page Number of the corrupted page
> @@ -2183,6 +2271,11 @@ int memory_failure(unsigned long pfn, int flags)
>  	if (!(flags & MF_SW_SIMULATED))
>  		hw_memory_failure = true;
>  
> +	if (!pfn_valid(pfn) && !arch_is_platform_page(PFN_PHYS(pfn))) {

Could it be better to add a helper here to detect the pfns without struct page?

> +		res = memory_failure_pfn(pfn, flags);
> +		goto unlock_mutex;
> +	}
> +
>  	p = pfn_to_online_page(pfn);
>  	if (!p) {
>  		res = arch_memory_failure(pfn, flags);
> 

Thanks.