linux-kernel - Re: [PATCH 1/2] x86/sgx: Do not fail on incomplete sanitization on premature stop of ksgxd

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <a5fa56bdc57d6472a306bd8d795afc674b724538.camel@intel.com>
Date:   Mon, 5 Sep 2022 07:50:33 +0000
From:   "Huang, Kai" <kai.huang@...el.com>
To:     "linux-sgx@...r.kernel.org" <linux-sgx@...r.kernel.org>,
        "jarkko@...nel.org" <jarkko@...nel.org>
CC:     "pmenzel@...gen.mpg.de" <pmenzel@...gen.mpg.de>,
        "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "Dhanraj, Vijay" <vijay.dhanraj@...el.com>,
        "Chatre, Reinette" <reinette.chatre@...el.com>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "x86@...nel.org" <x86@...nel.org>,
        "haitao.huang@...ux.intel.com" <haitao.huang@...ux.intel.com>,
        "stable@...r.kernel.org" <stable@...r.kernel.org>,
        "hpa@...or.com" <hpa@...or.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] x86/sgx: Do not fail on incomplete sanitization on
 premature stop of ksgxd

On Sat, 2022-09-03 at 13:26 +0300, Jarkko Sakkinen wrote:
> >   static int ksgxd(void *p)
> >   {
> > +	unsigned long left_dirty;
> > +
> >   	set_freezable();
> >   
> >   	/*
> >   	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
> >   	 * required for SECS pages, whose child pages blocked EREMOVE.
> >   	 */
> > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	pr_debug("%ld unsanitized pages\n", left_dirty);
>                   %lu
> 

I assume the intention is to print out the unsanitized SECS pages, but what is
the value of printing it? To me it doesn't provide any useful information, even
for debug.

Besides, the first call of __sgx_sanitize_pages() can return 0, due to either
kthread_should_stop() being true, or all EPC pages are EREMOVED successfully. 
So in this case kernel will print out "0 unsanitized pages\n", which doesn't
make a lot sense?

> >   
> > -	/* sanity check: */
> > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	/*
> > +	 * Never expected to happen in a working driver. If it happens the
> > bug
> > +	 * is expected to be in the sanitization process, but successfully
> > +	 * sanitized pages are still valid and driver can be used and most
> > +	 * importantly debugged without issues. To put short, the global
> > state
> > +	 * of kernel is not corrupted so no reason to do any more
> > complicated
> > +	 * rollback.
> > +	 */
> > +	if (left_dirty)
> > +		pr_err("%ld unsanitized pages\n", left_dirty);
>                         %lu

No strong opinion, but IMHO we can still just WARN() when it is driver bug:

1) There's no guarantee the driver can continue to work if it has bug;

2) WARN() can panic() the kernel if /proc/sys/kernel/panic_on_warn is set is
fine.  It's expected behaviour.  If I understand correctly, there are many
places in the kernel that uses WARN() to catch bugs.

In fact, we can even view WARN() as an advantage. For instance, if we only print
out "xx unsanitized pages" in the existing code, people may even wouldn't have
noticed this bug.

From this perspective, if you want to print out, I think you may want to make
the message more visible, that people can know it's driver bug.  Perhaps
something like "The driver has bug, please report to kernel community..", etc.

3) Changing WARN() to pr_err() conceptually isn't mandatory to fix this
particular bug.  So, it's kinda mixing things together.

But again, no strong opinion here.

-- 
Thanks,
-Kai