[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <16416f38-2d00-be53-23b1-894064beeaaf@gmail.com>
Date: Tue, 16 Nov 2021 13:12:20 +0200
From: Ivaylo Dimitrov <ivo.g.dimitrov.75@...il.com>
To: Tomi Valkeinen <tomi.valkeinen@...asonboard.com>
Cc: matthijsvanduin@...il.com, airlied@...ux.ie, daniel@...ll.ch,
linux-kernel@...r.kernel.org, linux-omap@...r.kernel.org,
Laurent Pinchart <laurent.pinchart@...asonboard.com>,
Merlijn Wajer <merlijn@...zup.org>,
Carl Philipp Klemm <philipp@...s.xyz>
Subject: Re: [PATCH v2] drm: omapdrm: Export correct scatterlist for TILER
backed BOs
On 16.11.21 г. 12:20 ч., Tomi Valkeinen wrote:
> On 16/11/2021 10:27, Ivaylo Dimitrov wrote:
>> Hi,
>>
>> On 16.11.21 г. 8:42 ч., Tomi Valkeinen wrote:
>>> On 15/11/2021 19:15, Ivaylo Dimitrov wrote:
>>>> Hi,
>>>>
>>>> On 15.11.21 г. 17:37 ч., Tomi Valkeinen wrote:
>>>>> On 15/11/2021 15:55, Ivaylo Dimitrov wrote:
>>>>>> Hi Tomi,
>>>>>>
>>>>>> On 15.11.21 г. 10:42 ч., Tomi Valkeinen wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 13/11/2021 11:53, Ivaylo Dimitrov wrote:
>>>>>>>> Memory of BOs backed by TILER is not contiguous, but
>>>>>>>> omap_gem_map_dma_buf()
>>>>>>>> exports it like it is. This leads to (possibly) invalid memory
>>>>>>>> accesses if
>>>>>>>> another device imports such a BO.
>>>>>>>
>>>>>>> This is one reason why TILER hasn't been officially supported.
>>>>>>> But the above is not exactly right, or at least not the whole truth.
>>>>>>>
>>>>>>
>>>>>> Definitely, not only these BOs lie about their memory layout, they
>>>>>> lie about size and alignment as well. I have 2 more patches here
>>>>>> (one is to align TILER memory on page, as proposed by Matthijs in
>>>>>> the other mail, the other to set the correct size when exporting
>>>>>> TILER BO), but I wanted to hear from you first, like, what is the
>>>>>> general trend :) .
>>>>>
>>>>> My thoughts here are that the current code doesn't work in
>>>>> practice, so if you get it fixed, it's great =).
>>>>>
>>>>>> Also, I have another patch in mind, that will enable exporting of
>>>>>> buffers that are not TILER backed, but are not CMA backed either.
>>>>>> SGX for example does not need CMA memory to render to.
>>>>>
>>>>> What do you mean with this? DSS needs contiguous memory, so the
>>>>> memory has to be 1) physically contiguous, 2) mapped with DMM or 3)
>>>>> mapped with TILER. There's no reason for the driver to export
>>>>> non-contiguous memory.
>>>>>
>>>>
>>>> DSS yes, but, omapdrm is used to allocate non-scanout buffers as
>>>> well, which do not need to be (and in practice are not) contiguous.
>>>> GPU (or anyone with MMU) can render on them (DRI buffers for
>>>> example) and later on those buffers can be copied (blit) to the
>>>> framebuffer. Yes, not zero-copy, but if you're doing compositing,
>>>> there is no option anyway.
>>>>
>>>> Exactly this is done by omap-video driver for example. GBM BOs are
>>>> allocated through omapdrm as well.
>>>
>>> That is not correct and shouldn't be done. omapdrm is not a generic
>>> memory allocator. We have real generic allocators, so those should be
>>> used. Or, if the buffer is only used for a single device, the buffer
>>> should be allocated from that device's driver.
>>>
>>
>> Yes, I saw the comment in kernel headers that dumb buffers should not
>> be used for rendering
>> (https://elixir.bootlin.com/linux/latest/source/include/drm/drm_drv.h#L361).
>> This makes no sense to me at all, but maybe I am missing the point.
>
> I believe that comments refers to another issue: a dumb buffer from may
> not be usable for rendering. It's only guaranteed to be
> readable/writable by the CPU.
>
> What I'm talking about is that a driver must support memory allocations
> for buffers that the device handled by the driver can use. In many cases
> that allocated buffer also works with other devices, and thus dmabuf
> export/import can be used. But a driver supporting memory allocations
> for buffers that the device itself cannot use is just wrong.
>
>> Also, it could be that the implementation of omap-video and/or PVR
>> userspace blobs is against the specs, but I see omap-video calling
>> DRM_IOCTL_OMAP_GEM_NEW for DRI buffers without OMAP_BO_SCANOUT and
>> libdbm.so calling DRM_IOCTL_MODE_CREATE_DUMB to create buffers then
>> used for rendering.
>
> I think neither of those are exactly material to be used as examples on
> how to do things. And there's lots of history there. We didn't have
> generic allocators back then.
>
>> This is not an issue on omap4 an later, because when export of that
>> buffer is requested, omapdrm uses DMM and exports a single scatterlist
>> entry, IIUC.
>>
>> But, on omap3, given there is no DMM, export is simply refused. I
>> don't see that as a consistent behaviour - we shall either a) export
>> non-scanout buffers (scattered ones) using whatever is supported (DMM
>> and single scatterlist entry on omap4 (and later), multiple-entry
>> scatterlist on omap3) or b) always require OMAP_BO_SCANOUT for BOs to
>> be exported and refuse to export if no such flag is set. I would say
>> b) is not a good option which leaves a) only.
>
> I think we should always require OMAP_BO_SCANOUT, or rather, drop the
> flag totally and always expect the buffer to be a scanout buffer. The
> only use for DSS is scanout, and those are the only buffers that omapdrm
> needs to support. But that would be breaking the uAPI, so I think we
> just have to support what we do now.
>
>> BTW, I think DMM is not really needed unless userspace requests
>> mmap(), in theory we can provide userspace with view through DMM but
>> give device drivers multiple entry scatterlist, potentially saving DMM
>> space.
>
> The userspace (CPU) doesn't need the DMM, the CPU has an MMU. I thought
> we already skip the DMM when mapping to the userspace. But in TILER case
> we always need TILER, even with the CPU.
>
>> I hope I made it clearer now why I think this feature shall be
>> implemented.
>
> I think it's just adding more wrong on top of the old wrong =).
>
> Also, if we need DMM/TILER allocations for other devices than DSS (but
> so far this hasn't been mentioned), then I think the DMM/TILER
> functionality should be separated from omapdrm and moved to (I think)
> dma-heap.
>
I see. Ok, it is not that much of an issue for me to carry one
out-of-tree patch if needed.
>>>>>> 2. Set exp_info.size = omap_gem_mmap_size(obj); when exporting a
>>>>>> BO. That way importer knows the real BO memory size (including
>>>>>> alignment etc) so he will be able to calculate the number of pages
>>>>>> he needs to map the scatterlist.
>>>>>
>>>>> Can you elaborate what this means?
>>>>>
>>>>
>>>> When we align to page, we shall report the size including the
>>>> alignment, no? Or, it is the importer that shall take care to
>>>> calculate BO size( including the alignment) based on scatterlist if
>>>> he needs to?
>>>
>>> I'm not sure... But I guess the export size should include the
>>> alignment.
>>>
>>
>> My understanding as well. Will sent that change as a part of page
>> alignment patch.
>>
>>> Hmm... I haven't had enough coffee yet, but how does this go... Let's
>>> say we have a tiled fb, and the width gets expanded to a page. What
>>> happens to reads/writes that happen outside the fb, but still within
>>> the page? Those should cause an error or do nothing, but is it
>>> possible that they go through TILER and get mapped to some real
>>> memory location?
>>>
>>
>> I lack the details here, but reading through TRM leaves me with the
>> impression that TILER smallest unit is a tile, and every tile is
>> backed by a real memory page (4KiB), so outside read-writes will end
>> up in memory that's there but unused and will do nothing.
>>
>> omap_gem_new() calls tiler_align(), which in turn seems to return
>> page-aligned size, so I think there is no issue here.
>
> Maybe, but, consider this example, with numbers totally out of thin air:
> We have a fb with the width of 32 pixels, so 128 bytes. If we have tiles
> which cover 32 x 32 pixels (so 4096 bytes with 4 bpp), we need one tile
> to cover the width. But we have all the rest of the page mapped, so 3968
> bytes that are not covered with a tile (or rather, we haven't configured
> that tile, or maybe the tile contains old configuration).
>
> I could be totally wrong here, as I don't remember the details. But I do
> think that it's very easy to get this wrong, creating memory corruptions
> and/or security violations.
>
I see what you mean and I think you are right. What about configuring
(if we use your made-up values) those 31 'unused' TILER entries to point
to the same page tile 1 points to? If we need more than one tile (say N)
to cover the width, apply the same logic, like, tile N+1 points to page
1, N+2 to page 2 and so on. Or, allocate one extra page and setup all
'unused' tiles to point to it? Both will guarantee no other memory can
be accessed, IIUC, though I prefer the 'overlap' workaround.
Also, we can have a single page used to back all the 'unused' tiler
entries, for all the TILER BOs, but with such an attributes that it is
not actually accessible (not sure how such a page shall be allocated
though), so if someone tries to access memory outside of the allowed
region and hits that page, we'll either have SEGFAULT or OOPS.
Ivo
Powered by blists - more mailing lists