-
Notifications
You must be signed in to change notification settings - Fork 15
Image upload benchamark #238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| { | ||
| IGPUImage::SCreationParams imgParams{}; | ||
| imgParams.type = IImage::E_TYPE::ET_2D; | ||
| imgParams.extent.width = TILE_SIZE * 32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should depend on your TILES_PER_FRAME.
|
|
||
| uint32_t bufferOffset = cmdBufIndex * partitionSize; | ||
| void* targetPtr = static_cast<uint8_t*>(mappedPtr) + bufferOffset; | ||
| memcpy(targetPtr, cpuSourceData, partitionSize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to check if the mapped pointer's buffer was created on HOST_COHERENT memory. if not, you need to flush the range to be available for the gpu
| } | ||
|
|
||
| private: | ||
| void transitionImageLayout( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's a PITA to have to write barriers, but trust me, if we could have a a better simple transition/barrier function, we would've made it.
what you can do is create a default one barrier. where your subresourceRange is fixed or image is fixed and start from modifying that each time. you're going to have different dst/src+access/stage mask combinations for different barriers each time
- Pro Tip: Enable synchronization validation by overriding
getAPIFeaturesToEnable()and look for possible missing barriers and sync from vulkan validation layer bugs.
virtual video::IAPIConnection::SFeatures getAPIFeaturesToEnable() override
{
auto retval = base_t::getAPIFeaturesToEnable();
retval.validations = true;
retval.synchronizationValidation = true;
return retval;
}| data[i] = g(); | ||
| } | ||
|
|
||
| auto regionsPerFrame = new IImage::SBufferCopy*[framesInFlight]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're not going to blame you if you want to use std::vector
|
|
||
| uint64_t timelineValue = 0; | ||
|
|
||
| commandBuffers[0]->begin(IGPUCommandBuffer::USAGE::ONE_TIME_SUBMIT_BIT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's ok to do this if the "end-purpose" of this image was to be a destination image and nothing else :D
but it's going to be used in the shaders as well. so what's the first thing you'll think of doing? transition to TRANSFER__DST for the copy and SHADER_READ_OPTIMAL after the copy. it makes sense for a single-upload image which you're going to use in GPU in many many future frames. but we're potentially uploading to this image every frame and it would be redudndant to do this image layout transitions each time.
that's why we asked you to keep the gpu image in GENERAL layout.
another note:
- pipeline barriers are not for image layout transitions only. you still need them to ensure correct synchronization and access to the resource.
so: keep the image as GENERAL layout + add 2 pipeline barriers (1 before the transfer, 1 after) the one after is a "Fake" or "mock" one for now. it's as if it's going to be read by a shader soon.
hint: the second barrier after the transfer should haveSHADER_READ_BITas access inFRAGMENTstage (with no layout transition, i.e GENERAL->GENERAL) - you'll most likely get a sync validation error (assuming you turned it on) if you don't have any pipeline barrier for this destination image
|
|
||
| commandBuffers[cmdBufIndex]->begin(IGPUCommandBuffer::USAGE::ONE_TIME_SUBMIT_BIT); | ||
| commandBuffers[cmdBufIndex]->copyBufferToImage( | ||
| stagingBuffer, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok good, single copy call for all tiles. remember to change this to GENERAL after https://github.com/Devsh-Graphics-Programming/Nabla-Examples-and-Tests/pull/238/files#r2646807113
| { | ||
| smart_refctd_ptr<ISemaphore> timelineSemaphore = m_device->createSemaphore(0); | ||
|
|
||
| auto commandPools = new smart_refctd_ptr<IGPUCommandPool>[framesInFlight]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ummm, just use the stack with MAX_FRAMES_IN_FLIGHT?
or again, nobody is going to blame you if you want to use a vector :D
| delete[] cpuSourceData; | ||
| for (uint32_t i = 0; i < framesInFlight; i++) | ||
| delete[] regionsPerFrame[i]; | ||
| delete[] regionsPerFrame; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I usually cringe when I see raw new/delete in the code. (PTSD from seeing legacy code)
depending on the situation just use:
- stack allocation type data[MAX_SIZE]
- vector for more variation and dyanmically sized stuff
- smart_refctd_ptr for nabla refcounted stuff (inherited from IRefCounted)
- unique_ptr for custom non nabla stuff
and take it easy. it's just an example/benchmark
Added simple Image uploading benchmark