From 227db66e4f478586416e7cdf54d411112c116461 Mon Sep 17 00:00:00 2001
From: Markus Wick
Date: Sun, 25 Feb 2018 16:51:01 +0100
Subject: [PATCH] OGL: Use glBufferData on Mali.

tl;dr: This PR speeds up Dolphin on mobile devices with a Mali GPU and ES 3.2 drivers by a factor of 10, by using the method with the biggest overhead. Please take care not to buy this shit!

The ARM driver team seems to care a great deal about their customers. But bad luck: users and open source developers are *not* their customers. So even device-independent feature requests are just ignored for *years*:
https://community.arm.com/graphics/f/discussions/4645/gl_ext_buffer_storage-support

The bad part: they also don't implement any of the other common ways to stream dynamic content in unextended GL:
- They just ignore the GL_MAP_UNSYNCHRONIZED_BIT flag.
- They don't support on-device buffer updates and just stall with glBufferSubData.

It seems like no benchmark uses any dynamic content, and no customer cares about anything but benchmarks, certainly not users...

We already have a flag to disable the glBufferSubData path; this PR also sets the flag that disables the unsynchronized mapping path on Mali. The latter has been available since their ES 3.2 update, but it is slow as hell.

So how to continue? The last remaining technical way to stream dynamic content at all is to allocate a new buffer per draw call with glBufferData. This is very gross, but still a factor of 10 faster than stalling the GPU.

Small tests show that you can expect another 3-5x speedup with EXT_buffer_storage, so Mali would be on par with Adreno here. So if you were unfortunate enough to buy such a device, please try to make noise on your vendor's forums/support and ask for this extension.

If you are going to buy a new mobile device, I'd recommend avoiding *any* device with a Mali GPU in it.
---
 Source/Core/VideoBackends/OGL/StreamBuffer.cpp | 7 ++++++-
 Source/Core/VideoCommon/DriverDetails.cpp      | 2 ++
 Source/Core/VideoCommon/DriverDetails.h        | 6 +++++-
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/Source/Core/VideoBackends/OGL/StreamBuffer.cpp b/Source/Core/VideoBackends/OGL/StreamBuffer.cpp
index 366aa42965..eb7fc0a619 100644
--- a/Source/Core/VideoBackends/OGL/StreamBuffer.cpp
+++ b/Source/Core/VideoBackends/OGL/StreamBuffer.cpp
@@ -376,7 +376,12 @@ std::unique_ptr<StreamBuffer> StreamBuffer::Create(u32 type, u32 size)

     // don't fall back to MapAnd* for Nvidia drivers
     if (DriverDetails::HasBug(DriverDetails::BUG_BROKEN_UNSYNC_MAPPING))
-      return std::make_unique<BufferSubData>(type, size);
+    {
+      if (DriverDetails::HasBug(DriverDetails::BUG_BROKEN_BUFFER_STREAM))
+        return std::make_unique<BufferData>(type, size);
+      else
+        return std::make_unique<BufferSubData>(type, size);
+    }

     // mapping fallback
     if (g_ogl_config.bSupportsGLSync)
diff --git a/Source/Core/VideoCommon/DriverDetails.cpp b/Source/Core/VideoCommon/DriverDetails.cpp
index 2885aea7e9..d280a6a9b4 100644
--- a/Source/Core/VideoCommon/DriverDetails.cpp
+++ b/Source/Core/VideoCommon/DriverDetails.cpp
@@ -82,6 +82,8 @@ static BugInfo m_known_bugs[] = {
      BUG_BROKEN_UNSYNC_MAPPING, -1.0, -1.0, true},
     {API_OPENGL, OS_LINUX, VENDOR_NVIDIA, DRIVER_NVIDIA, Family::UNKNOWN,
      BUG_BROKEN_UNSYNC_MAPPING, -1.0, -1.0, true},
+    {API_OPENGL, OS_ALL, VENDOR_ARM, DRIVER_ARM, Family::UNKNOWN, BUG_BROKEN_UNSYNC_MAPPING, -1.0,
+     -1.0, true},
     {API_OPENGL, OS_WINDOWS, VENDOR_INTEL, DRIVER_INTEL, Family::UNKNOWN,
      BUG_INTEL_BROKEN_BUFFER_STORAGE, 101810.3907, 101810.3960, true},
     {API_OPENGL, OS_ALL, VENDOR_ATI, DRIVER_ATI, Family::UNKNOWN, BUG_SLOW_GETBUFFERSUBDATA, -1.0,
diff --git a/Source/Core/VideoCommon/DriverDetails.h b/Source/Core/VideoCommon/DriverDetails.h
index 53f0c68739..df06a77d92 100644
--- a/Source/Core/VideoCommon/DriverDetails.h
+++ b/Source/Core/VideoCommon/DriverDetails.h
@@ -125,13 +125,17 @@ enum Bug
   // Intel HD 4000 series isn't affected by the bug
   BUG_PRIMITIVE_RESTART,
   // Bug: unsync mapping doesn't work fine
-  // Affected devices: Nvidia driver
+  // Affected devices: Nvidia driver, ARM Mali
   // Started Version: -1
   // Ended Version: -1
   // The Nvidia driver (both Windows + Linux) doesn't like unsync mapping performance wise.
   // Because of their threaded behavior, they seem not to handle unsync mapping complete unsync,
   // in fact, they serialize the driver which adds a much bigger overhead.
   // Workaround: Use BufferSubData
+  // The Mali behavior is even worse: They just ignore the unsynchronized flag and stall the GPU.
+  // Workaround: As they were even too lazy to implement asynchronous buffer updates,
+  // BufferSubData stalls as well, so we have to use the slowest possible path:
+  // Alloc one buffer per draw call with BufferData.
   // TODO: some Windows AMD driver/GPU combination seems also affected
   // but as they all support pinned memory, it doesn't matter
   BUG_BROKEN_UNSYNC_MAPPING,
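
Note on the fallback order: the StreamBuffer.cpp hunk boils down to a small decision over two driver quirk flags. The following is a minimal, standalone sketch of that decision only, not the Dolphin implementation. `DriverQuirks`, `StreamMethod`, and `ChooseFallback` are hypothetical names made up for illustration, the two booleans stand in for `BUG_BROKEN_BUFFER_STREAM` and `BUG_BROKEN_UNSYNC_MAPPING`, and the `Mapping` value lumps together whatever mapping-based paths follow the patched code.

```cpp
// Hypothetical stand-in for the two DriverDetails bug flags used in the patch.
struct DriverQuirks
{
  bool broken_buffer_stream;   // glBufferSubData stalls (BUG_BROKEN_BUFFER_STREAM)
  bool broken_unsync_mapping;  // unsynchronized mapping stalls or serializes
                               // (BUG_BROKEN_UNSYNC_MAPPING)
};

// Hypothetical labels for the streaming paths; Mapping covers all
// mapping-based fallbacks that come after the patched block.
enum class StreamMethod
{
  BufferData,     // allocate a new buffer per draw call
  BufferSubData,  // update the existing buffer in place
  Mapping,        // any map-based path
};

// Sketch of the fallback choice after the patch: if unsynchronized mapping
// is unusable, prefer BufferSubData unless that is flagged as broken too,
// in which case only BufferData remains.
constexpr StreamMethod ChooseFallback(const DriverQuirks& quirks)
{
  if (quirks.broken_unsync_mapping)
  {
    if (quirks.broken_buffer_stream)
      return StreamMethod::BufferData;
    return StreamMethod::BufferSubData;
  }
  return StreamMethod::Mapping;
}
```

With both flags set, as the commit message describes for Mali, the sketch picks `StreamMethod::BufferData`; with only `broken_unsync_mapping` set, as the header comment describes for Nvidia, it keeps the previous `BufferSubData` behavior.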