Date:   Mon, 25 Jan 2021 22:21:14 +0000
From:   "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>
To:     Jason Gunthorpe <jgg@...pe.ca>,
        "Wangzhou (B)" <wangzhou1@...ilicon.com>
CC:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Arnd Bergmann <arnd@...db.de>,
        Zhangfei Gao <zhangfei.gao@...aro.org>,
        "linux-accelerators@...ts.ozlabs.org" 
        <linux-accelerators@...ts.ozlabs.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "iommu@...ts.linux-foundation.org" <iommu@...ts.linux-foundation.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "Liguozhu (Kenneth)" <liguozhu@...ilicon.com>,
        "chensihang (A)" <chensihang1@...ilicon.com>
Subject: RE: [RFC PATCH v2] uacce: Add uacce_ctrl misc device



> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgg@...pe.ca]
> Sent: Tuesday, January 26, 2021 4:47 AM
> To: Wangzhou (B) <wangzhou1@...ilicon.com>
> Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>; Arnd Bergmann
> <arnd@...db.de>; Zhangfei Gao <zhangfei.gao@...aro.org>;
> linux-accelerators@...ts.ozlabs.org; linux-kernel@...r.kernel.org;
> iommu@...ts.linux-foundation.org; linux-mm@...ck.org; Song Bao Hua (Barry Song)
> <song.bao.hua@...ilicon.com>; Liguozhu (Kenneth) <liguozhu@...ilicon.com>;
> chensihang (A) <chensihang1@...ilicon.com>
> Subject: Re: [RFC PATCH v2] uacce: Add uacce_ctrl misc device
> 
> On Mon, Jan 25, 2021 at 04:34:56PM +0800, Zhou Wang wrote:
> 
> > +static int uacce_pin_page(struct uacce_pin_container *priv,
> > +			  struct uacce_pin_address *addr)
> > +{
> > +	unsigned int flags = FOLL_FORCE | FOLL_WRITE;
> > +	unsigned long first, last, nr_pages;
> > +	struct page **pages;
> > +	struct pin_pages *p;
> > +	int ret;
> > +
> > +	first = (addr->addr & PAGE_MASK) >> PAGE_SHIFT;
> > +	last = ((addr->addr + addr->size - 1) & PAGE_MASK) >> PAGE_SHIFT;
> > +	nr_pages = last - first + 1;
> > +
> > +	pages = vmalloc(nr_pages * sizeof(struct page *));
> > +	if (!pages)
> > +		return -ENOMEM;
> > +
> > +	p = kzalloc(sizeof(*p), GFP_KERNEL);
> > +	if (!p) {
> > +		ret = -ENOMEM;
> > +		goto free;
> > +	}
> > +
> > +	ret = pin_user_pages_fast(addr->addr & PAGE_MASK, nr_pages,
> > +				  flags | FOLL_LONGTERM, pages);
> 
> This needs to copy the RLIMIT_MEMLOCK and can_do_mlock() stuff from
> other places, like ib_umem_get
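
(For reference, the accounting ib_umem_get() does is roughly the
following, with error unwinding trimmed and 'mm' being the current->mm
reference taken before pinning:)

	unsigned long lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
	s64 new_pinned;

	if (!can_do_mlock())
		return -EPERM;

	new_pinned = atomic64_add_return(nr_pages, &mm->pinned_vm);
	if (new_pinned > lock_limit && !capable(CAP_IPC_LOCK)) {
		atomic64_sub(nr_pages, &mm->pinned_vm);
		return -ENOMEM;
	}
	/* ... then pin_user_pages_fast(); on failure, atomic64_sub()
	 * the count back out again ... */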
> 
> > +	ret = xa_err(xa_store(&priv->array, p->first, p, GFP_KERNEL));
> 
> And this is really weird, I don't think it makes sense to make handles
> for DMA based on the starting VA.
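
(One alternative, sketched purely as an illustration and not a
concrete proposal: have the kernel hand back an opaque id from
xa_alloc() and key the unpin ioctl on that id instead of on the VA;
priv->array would need to be initialized with XA_FLAGS_ALLOC:)

	u32 id;

	ret = xa_alloc(&priv->array, &id, p, xa_limit_32b, GFP_KERNEL);
	if (ret)
		goto err_unpin;
	/* return 'id' to userspace; unpin looks the entry up by id */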
> 
> > +static int uacce_unpin_page(struct uacce_pin_container *priv,
> > +			    struct uacce_pin_address *addr)
> > +{
> > +	unsigned long first, last, nr_pages;
> > +	struct pin_pages *p;
> > +
> > +	first = (addr->addr & PAGE_MASK) >> PAGE_SHIFT;
> > +	last = ((addr->addr + addr->size - 1) & PAGE_MASK) >> PAGE_SHIFT;
> > +	nr_pages = last - first + 1;
> > +
> > +	/* find pin_pages */
> > +	p = xa_load(&priv->array, first);
> > +	if (!p)
> > +		return -ENODEV;
> > +
> > +	if (p->nr_pages != nr_pages)
> > +		return -EINVAL;
> > +
> > +	/* unpin */
> > +	unpin_user_pages(p->pages, p->nr_pages);
> 
> And unpinning without guaranteeing there is no ongoing DMA is really
> weird

In the SVA case, the kernel has no idea whether accelerators are
accessing the memory, so I would assume SVA has some way to keep pages
from being migrated or released out from under an ongoing device
access. Otherwise, SVA would crash easily on a system under high
memory pressure.
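
(My understanding, illustrated below with made-up names, is that SVA
stays coherent through mmu_notifier callbacks rather than by pinning:
when the CPU page tables change, the notifier invalidates the device's
cached translations, and the next device access is resolved through an
IO page fault against the new page.)

/* Rough illustration only, not part of this patch; sva_atc_invalidate()
 * and mn_to_sva() are invented names for the driver's own helpers. */
#include <linux/mmu_notifier.h>

static void sva_invalidate_range(struct mmu_notifier *mn,
				 struct mm_struct *mm,
				 unsigned long start, unsigned long end)
{
	/* drop the device's cached translations for [start, end); the
	 * next device access takes an IO page fault and finds the new
	 * (possibly migrated) page */
	sva_atc_invalidate(mn_to_sva(mn), start, end);
}

static const struct mmu_notifier_ops sva_mn_ops = {
	.invalidate_range = sva_invalidate_range,
};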

Anyway, this is a problem worth investigating further.

> 
> Are you abusing this in conjunction with a SVA scheme just to prevent
> page motion? Why wasn't mlock good enough?

Page migration won't cause any malfunction in the SVA case, as an IO
page fault will pick up the valid page again. It is only a performance
issue: an IO page fault has a much larger latency than a normal page
fault, reportedly 3-80x slower [1].

mlock, while certainly able to prevent pages from being swapped out,
can't stop pages from being moved by:
* memory compaction in alloc_pages()
* collapsing into huge pages (khugepaged)
* NUMA balancing
* memory compaction in CMA
etc. (the userspace sketch below shows an mlock()ed page being moved).
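
(A minimal userspace demonstration of that point, assuming a machine
with at least two NUMA nodes and libnuma installed; build with -lnuma.
The explicit move_pages() call stands in for the kernel-initiated
migrations listed above:)

#include <numaif.h>	/* move_pages(), MPOL_MF_MOVE */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	void *buf;

	if (posix_memalign(&buf, page_size, page_size))
		return 1;
	memset(buf, 0, page_size);	/* fault the page in */

	if (mlock(buf, page_size))	/* locked: can't be swapped out */
		perror("mlock");

	void *pages[1] = { buf };
	int nodes[1] = { 1 };		/* request migration to node 1 */
	int status[1];

	/* the page migrates even though it is mlock()ed */
	if (move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE))
		perror("move_pages");
	else
		printf("page is now on node %d\n", status[0]);
	return 0;
}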

[1] https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7482091&tag=1
> 
> Jason

Thanks
Barry
