lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190724142235.GE8463@hao-dev>
Date:   Wed, 24 Jul 2019 22:22:35 +0800
From:   Wu Hao <hao.wu@...el.com>
To:     Greg KH <gregkh@...uxfoundation.org>
Cc:     mdf@...nel.org, linux-fpga@...r.kernel.org,
        linux-kernel@...r.kernel.org, linux-api@...r.kernel.org,
        linux-doc@...r.kernel.org, atull@...nel.org,
        Ananda Ravuri <ananda.ravuri@...el.com>,
        Xu Yilun <yilun.xu@...el.com>
Subject: Re: [PATCH v3 01/12] fpga: dfl: fme: support 512bit data width PR

On Wed, Jul 24, 2019 at 11:35:32AM +0200, Greg KH wrote:
> On Tue, Jul 23, 2019 at 12:51:24PM +0800, Wu Hao wrote:
> > In early partial reconfiguration private feature, it only
> > supports 32bit data width when writing data to hardware for
> > PR. 512bit data width PR support is an important optimization
> > for some specific solutions (e.g. XEON with FPGA integrated),
> > it allows driver to use AVX512 instruction to improve the
> > performance of partial reconfiguration. e.g. programming one
> > 100MB bitstream image via this 512bit data width PR hardware
> > only takes ~300ms, but 32bit revision requires ~3s per test
> > result.
> > 
> > Please note now this optimization is only done on revision 2
> > of this PR private feature which is only used in integrated
> > solution that AVX512 is always supported. This revision 2
> > hardware doesn't support 32bit PR.
> > 
> > Signed-off-by: Ananda Ravuri <ananda.ravuri@...el.com>
> > Signed-off-by: Xu Yilun <yilun.xu@...el.com>
> > Signed-off-by: Wu Hao <hao.wu@...el.com>
> > Acked-by: Alan Tull <atull@...nel.org>
> > Signed-off-by: Moritz Fischer <mdf@...nel.org>
> > ---
> > v2: remove DRV/MODULE_VERSION modifications
> > ---
> >  drivers/fpga/dfl-fme-mgr.c | 110 ++++++++++++++++++++++++++++++++++++++-------
> >  drivers/fpga/dfl-fme-pr.c  |  43 +++++++++++-------
> >  drivers/fpga/dfl-fme.h     |   2 +
> >  drivers/fpga/dfl.h         |   5 +++
> >  4 files changed, 129 insertions(+), 31 deletions(-)
> > 
> > diff --git a/drivers/fpga/dfl-fme-mgr.c b/drivers/fpga/dfl-fme-mgr.c
> > index b3f7eee..46e17f0 100644
> > --- a/drivers/fpga/dfl-fme-mgr.c
> > +++ b/drivers/fpga/dfl-fme-mgr.c
> > @@ -22,6 +22,7 @@
> >  #include <linux/io-64-nonatomic-lo-hi.h>
> >  #include <linux/fpga/fpga-mgr.h>
> >  
> > +#include "dfl.h"
> >  #include "dfl-fme-pr.h"
> >  
> >  /* FME Partial Reconfiguration Sub Feature Register Set */
> > @@ -30,6 +31,7 @@
> >  #define FME_PR_STS		0x10
> >  #define FME_PR_DATA		0x18
> >  #define FME_PR_ERR		0x20
> > +#define FME_PR_512_DATA		0x40 /* Data Register for 512bit datawidth PR */
> >  #define FME_PR_INTFC_ID_L	0xA8
> >  #define FME_PR_INTFC_ID_H	0xB0
> >  
> > @@ -67,8 +69,43 @@
> >  #define PR_WAIT_TIMEOUT   8000000
> >  #define PR_HOST_STATUS_IDLE	0
> >  
> > +#if defined(CONFIG_X86) && defined(CONFIG_AS_AVX512)
> > +
> > +#include <linux/cpufeature.h>
> > +#include <asm/fpu/api.h>
> > +
> > +static inline int is_cpu_avx512_enabled(void)
> > +{
> > +	return cpu_feature_enabled(X86_FEATURE_AVX512F);
> > +}
> 
> That's a very arch specific function, why would a driver ever care about
> this?

Yes, this is only applied to a specific FPGA solution, which FPGA
has been integrated with XEON. Hardware indicates this using register
to software. As it's cpu integrated solution, so CPU always has this
AVX512 capability. The only check we do, is make sure this is not
manually disabled by kernel.

With this hardware, software could use AVX512 to accelerate the FPGA
partial reconfiguration as mentioned in the patch commit message.
It brings performance benifits to people who uses it. This is only one
optimization (512 vs 32bit data write to hw) for a specific hardware.

For other discrete solutions, e.g. FPGA PCIe Card, this is not used
at all as driver does check hardware register to avoid any AVX512 code.

> 
> > +
> > +static inline void copy512(const void *src, void __iomem *dst)
> > +{
> > +	kernel_fpu_begin();
> > +
> > +	asm volatile("vmovdqu64 (%0), %%zmm0;"
> > +		     "vmovntdq %%zmm0, (%1);"
> > +		     :
> > +		     : "r"(src), "r"(dst)
> > +		     : "memory");
> > +
> > +	kernel_fpu_end();
> > +}
> 
> Shouldn't this be an arch-specific function somewhere?  Burying this in
> a random driver is not ok.  Please make this generic for all systems.

If more people need the same avx operation like this in kernel, then maybe
this can be moved to some arch-specific lib code somewhere as some common
functions to everybody, but i am not very sure if this is the case. Let me
think about this more.

> 
> > +#else
> > +static inline int is_cpu_avx512_enabled(void)
> > +{
> > +	return 0;
> > +}
> > +
> > +static inline void copy512(const void *src, void __iomem *dst)
> > +{
> > +	WARN_ON_ONCE(1);
> 
> Are you trying to get reports from syzbot?  :)

Oh.. no.. I will remove it. :)

Thank you very much!

Hao

> 
> Please fix this all up.
> 
> greg k-h

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ