lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 16 May 2024 11:46:07 +0200
From: Oscar Salvador <osalvador@...e.de>
To: Jane Chu <jane.chu@...cle.com>
Cc: linmiaohe@...wei.com, nao.horiguchi@...il.com,
	akpm@...ux-foundation.org, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 3/5] mm/memory-failure: improve memory failure
 action_result messages

On Fri, May 10, 2024 at 12:26:00AM -0600, Jane Chu wrote:
> Added two explicit MF_MSG messages describing failure in get_hwpoison_page.
> Attemped to document the definition of various action names, and made a few
> adjustment to the action_result() calls.
> 
> Signed-off-by: Jane Chu <jane.chu@...cle.com>
..  
> +/*
> + * MF_IGNORED - Either the m-f() handler did nothing to recover, or it did
"or if it "
> + *   something, then caught in a race condition which renders the effort sort
"it was caught"

I would also add to MF_IGNORED that we mark the page hwpoisoned anyway.
  
> @@ -1018,7 +1034,7 @@ static int me_unknown(struct page_state *ps, struct page *p)
>  {
>  	pr_err("%#lx: Unknown page state\n", page_to_pfn(p));
>  	unlock_page(p);
> -	return MF_FAILED;
> +	return MF_IGNORED;
>  }

I was confused because I saw you replaced all MF_MSG_UNKNOWN, so I
wondered how we can end up here until I saw this is a catch-all in case
we fail to make sense of the page->flags.
While you are improving this, I would suggest to add a little comment
above the function explaining how we can reach it.

>  /*
> @@ -2055,6 +2071,7 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
>  		if (flags & MF_ACTION_REQUIRED) {
>  			folio = page_folio(p);
>  			res = kill_accessing_process(current, folio_pfn(folio), flags);
> +			return action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);

I do not understand why are you doing this.

First of all, why is this considered a failure? We did not fail this
time, did we? We went right ahead and kill the process which was
re-accessing the hwpoisoned page. Is that considered a failure?

Second, you are know supressing -EHWPOISON with whatever action_result()
will gives us, which judging from the actual code would be -EBUSY?
I do not think that that is right, and we should be returning
-EHWPOISON. Or whatever error code kill_accessing_process() gives us.


> @@ -2231,6 +2248,7 @@ int memory_failure(unsigned long pfn, int flags)
>  			res = kill_accessing_process(current, pfn, flags);
>  		if (flags & MF_COUNT_INCREASED)
>  			put_page(p);
> +		action_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);

This is not coherent with what you did in try_memory_failure_hugetlb
for MF_MSG_ALREADY_POISONED, I __think__ that in there we should be
doing the same as we do here.

 

-- 
Oscar Salvador
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ