[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <3ef2ed2d-f9bc-b75e-b28b-a926bbdc3f78@gmail.com>
Date: Tue, 16 Nov 2021 18:10:39 +0200
From: Ivaylo Dimitrov <ivo.g.dimitrov.75@...il.com>
To: Tomi Valkeinen <tomi.valkeinen@...asonboard.com>
Cc: matthijsvanduin@...il.com, airlied@...ux.ie, daniel@...ll.ch,
linux-kernel@...r.kernel.org, linux-omap@...r.kernel.org,
Laurent Pinchart <laurent.pinchart@...asonboard.com>,
Merlijn Wajer <merlijn@...zup.org>,
Carl Philipp Klemm <philipp@...s.xyz>
Subject: Re: [PATCH v2] drm: omapdrm: Export correct scatterlist for TILER
backed BOs
On 16.11.21 г. 13:12 ч., Ivaylo Dimitrov wrote:
> On 16.11.21 г. 12:20 ч., Tomi Valkeinen wrote:
>> On 16/11/2021 10:27, Ivaylo Dimitrov wrote:
>>> Hi,
>>>
>>> On 16.11.21 г. 8:42 ч., Tomi Valkeinen wrote:
>>>> On 15/11/2021 19:15, Ivaylo Dimitrov wrote:
>>>>> Hi,
>>>>>
>>>>> On 15.11.21 г. 17:37 ч., Tomi Valkeinen wrote:
>>>>>> On 15/11/2021 15:55, Ivaylo Dimitrov wrote:
>>>>>>> Hi Tomi,
>>>>>>>
>>>>>>> On 15.11.21 г. 10:42 ч., Tomi Valkeinen wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 13/11/2021 11:53, Ivaylo Dimitrov wrote:
>>>>>>>>> Memory of BOs backed by TILER is not contiguous, but
>>>>>>>>> omap_gem_map_dma_buf()
>>>>>>>>> exports it like it is. This leads to (possibly) invalid memory
>>>>>>>>> accesses if
>>>>>>>>> another device imports such a BO.
>>>>>>>>
>>>>>>>> This is one reason why TILER hasn't been officially supported.
>>>>>>>> But the above is not exactly right, or at least not the whole
>>>>>>>> truth.
>>>>>>>>
>>>>>>>
>>>>>>> Definitely, not only these BOs lie about their memory layout,
>>>>>>> they lie about size and alignment as well. I have 2 more patches
>>>>>>> here (one is to align TILER memory on page, as proposed by
>>>>>>> Matthijs in the other mail, the other to set the correct size
>>>>>>> when exporting TILER BO), but I wanted to hear from you first,
>>>>>>> like, what is the general trend :) .
>>>>>>
>>>>>> My thoughts here are that the current code doesn't work in
>>>>>> practice, so if you get it fixed, it's great =).
>>>>>>
>>>>>>> Also, I have another patch in mind, that will enable exporting of
>>>>>>> buffers that are not TILER backed, but are not CMA backed either.
>>>>>>> SGX for example does not need CMA memory to render to.
>>>>>>
>>>>>> What do you mean with this? DSS needs contiguous memory, so the
>>>>>> memory has to be 1) physically contiguous, 2) mapped with DMM or
>>>>>> 3) mapped with TILER. There's no reason for the driver to export
>>>>>> non-contiguous memory.
>>>>>>
>>>>>
>>>>> DSS yes, but, omapdrm is used to allocate non-scanout buffers as
>>>>> well, which do not need to be (and in practice are not) contiguous.
>>>>> GPU (or anyone with MMU) can render on them (DRI buffers for
>>>>> example) and later on those buffers can be copied (blit) to the
>>>>> framebuffer. Yes, not zero-copy, but if you're doing compositing,
>>>>> there is no option anyway.
>>>>>
>>>>> Exactly this is done by omap-video driver for example. GBM BOs are
>>>>> allocated through omapdrm as well.
>>>>
>>>> That is not correct and shouldn't be done. omapdrm is not a generic
>>>> memory allocator. We have real generic allocators, so those should
>>>> be used. Or, if the buffer is only used for a single device, the
>>>> buffer should be allocated from that device's driver.
>>>>
>>>
>>> Yes, I saw the comment in kernel headers that dumb buffers should not
>>> be used for rendering
>>> (https://elixir.bootlin.com/linux/latest/source/include/drm/drm_drv.h#L361).
>>> This makes no sense to me at all, but maybe I am missing the point.
>>
>> I believe that comments refers to another issue: a dumb buffer from
>> may not be usable for rendering. It's only guaranteed to be
>> readable/writable by the CPU.
>>
>> What I'm talking about is that a driver must support memory
>> allocations for buffers that the device handled by the driver can use.
>> In many cases that allocated buffer also works with other devices, and
>> thus dmabuf export/import can be used. But a driver supporting memory
>> allocations for buffers that the device itself cannot use is just wrong.
>>
>>> Also, it could be that the implementation of omap-video and/or PVR
>>> userspace blobs is against the specs, but I see omap-video calling
>>> DRM_IOCTL_OMAP_GEM_NEW for DRI buffers without OMAP_BO_SCANOUT and
>>> libdbm.so calling DRM_IOCTL_MODE_CREATE_DUMB to create buffers then
>>> used for rendering.
>>
>> I think neither of those are exactly material to be used as examples
>> on how to do things. And there's lots of history there. We didn't have
>> generic allocators back then.
>>
>>> This is not an issue on omap4 an later, because when export of that
>>> buffer is requested, omapdrm uses DMM and exports a single
>>> scatterlist entry, IIUC.
>>>
>>> But, on omap3, given there is no DMM, export is simply refused. I
>>> don't see that as a consistent behaviour - we shall either a) export
>>> non-scanout buffers (scattered ones) using whatever is supported (DMM
>>> and single scatterlist entry on omap4 (and later), multiple-entry
>>> scatterlist on omap3) or b) always require OMAP_BO_SCANOUT for BOs to
>>> be exported and refuse to export if no such flag is set. I would say
>>> b) is not a good option which leaves a) only.
>>
>> I think we should always require OMAP_BO_SCANOUT, or rather, drop the
>> flag totally and always expect the buffer to be a scanout buffer. The
>> only use for DSS is scanout, and those are the only buffers that
>> omapdrm needs to support. But that would be breaking the uAPI, so I
>> think we just have to support what we do now.
>>
>>> BTW, I think DMM is not really needed unless userspace requests
>>> mmap(), in theory we can provide userspace with view through DMM but
>>> give device drivers multiple entry scatterlist, potentially saving
>>> DMM space.
>>
>> The userspace (CPU) doesn't need the DMM, the CPU has an MMU. I
>> thought we already skip the DMM when mapping to the userspace. But in
>> TILER case we always need TILER, even with the CPU.
>>
>>> I hope I made it clearer now why I think this feature shall be
>>> implemented.
>>
>> I think it's just adding more wrong on top of the old wrong =).
>>
>> Also, if we need DMM/TILER allocations for other devices than DSS (but
>> so far this hasn't been mentioned), then I think the DMM/TILER
>> functionality should be separated from omapdrm and moved to (I think)
>> dma-heap.
>>
>
> I see. Ok, it is not that much of an issue for me to carry one
> out-of-tree patch if needed.
>
>>>>>>> 2. Set exp_info.size = omap_gem_mmap_size(obj); when exporting a
>>>>>>> BO. That way importer knows the real BO memory size (including
>>>>>>> alignment etc) so he will be able to calculate the number of
>>>>>>> pages he needs to map the scatterlist.
>>>>>>
>>>>>> Can you elaborate what this means?
>>>>>>
>>>>>
>>>>> When we align to page, we shall report the size including the
>>>>> alignment, no? Or, it is the importer that shall take care to
>>>>> calculate BO size( including the alignment) based on scatterlist if
>>>>> he needs to?
>>>>
>>>> I'm not sure... But I guess the export size should include the
>>>> alignment.
>>>>
>>>
>>> My understanding as well. Will sent that change as a part of page
>>> alignment patch.
>>>
>>>> Hmm... I haven't had enough coffee yet, but how does this go...
>>>> Let's say we have a tiled fb, and the width gets expanded to a page.
>>>> What happens to reads/writes that happen outside the fb, but still
>>>> within the page? Those should cause an error or do nothing, but is
>>>> it possible that they go through TILER and get mapped to some real
>>>> memory location?
>>>>
>>>
>>> I lack the details here, but reading through TRM leaves me with the
>>> impression that TILER smallest unit is a tile, and every tile is
>>> backed by a real memory page (4KiB), so outside read-writes will end
>>> up in memory that's there but unused and will do nothing.
>>>
>>> omap_gem_new() calls tiler_align(), which in turn seems to return
>>> page-aligned size, so I think there is no issue here.
>>
>> Maybe, but, consider this example, with numbers totally out of thin
>> air: We have a fb with the width of 32 pixels, so 128 bytes. If we
>> have tiles which cover 32 x 32 pixels (so 4096 bytes with 4 bpp), we
>> need one tile to cover the width. But we have all the rest of the page
>> mapped, so 3968 bytes that are not covered with a tile (or rather, we
>> haven't configured that tile, or maybe the tile contains old
>> configuration).
>>
>> I could be totally wrong here, as I don't remember the details. But I
>> do think that it's very easy to get this wrong, creating memory
>> corruptions and/or security violations.
>>
>
> I see what you mean and I think you are right. What about configuring
> (if we use your made-up values) those 31 'unused' TILER entries to point
> to the same page tile 1 points to? If we need more than one tile (say N)
> to cover the width, apply the same logic, like, tile N+1 points to page
> 1, N+2 to page 2 and so on. Or, allocate one extra page and setup all
> 'unused' tiles to point to it? Both will guarantee no other memory can
> be accessed, IIUC, though I prefer the 'overlap' workaround.
>
I see I am not the first one to come up with that 'extra page' idea:
https://elixir.bootlin.com/linux/v5.16-rc1/source/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c#L349
. So, we already have that dummy_page, but it seems the loop in
dmm_txn_append() needs some massaging to back 'unused' tiles with that
dummy page.
I will send an updated patch.
> Also, we can have a single page used to back all the 'unused' tiler
> entries, for all the TILER BOs, but with such an attributes that it is
> not actually accessible (not sure how such a page shall be allocated
> though), so if someone tries to access memory outside of the allowed
> region and hits that page, we'll either have SEGFAULT or OOPS.
>
> Ivo
Powered by blists - more mailing lists