Zuma/src at 0d8ef2d3b9cea849d83b4d96529576f28bae5c7c - Zuma

Lioncash 0d8ef2d3b9 common/swap: Improve codegen of the default swap fallbacks	2019-04-12 00:07:39 -04:00

Uses arithmetic that can be identified more trivially by compilers for
optimizations. e.g. Rather than shifting the halves of the value and then
swapping and combining them, we can swap them in place.

e.g. for the original swap32 code on x86-64, clang 8.0 would generate:

    mov     ecx, edi
    rol     cx, 8
    shl     ecx, 16
    shr     edi, 16
    rol     di, 8
    movzx   eax, di
    or      eax, ecx
    ret

while GCC 8.3 would generate the ideal:

    mov     eax, edi
    bswap   eax
    ret

now both generate the same optimal output.

MSVC used to generate the following with the old code:

    mov     eax, ecx
    rol     cx, 8
    shr     eax, 16
    rol     ax, 8
    movzx   ecx, cx
    movzx   eax, ax
    shl     ecx, 16
    or      eax, ecx
    ret     0

Now MSVC also generates a similar, but equally optimal result as clang/GCC:

    bswap   ecx
    mov     eax, ecx
    ret     0

====

In the swap64 case, for the original code, clang 8.0 would generate:

    mov     eax, edi
    bswap   eax
    shl     rax, 32
    shr     rdi, 32
    bswap   edi
    or      rax, rdi
    ret

(almost there, but still missing the mark)

while, again, GCC 8.3 would generate the more ideal:

    mov     rax, rdi
    bswap   rax
    ret

now clang also generates the optimal sequence for this fallback as well.

This is a case where MSVC unfortunately falls short, despite the new code,
this one still generates a doozy of an output.

    mov     r8, rcx
    mov     r9, rcx
    mov     rax, 71776119061217280
    mov     rdx, r8
    and     r9, rax
    and     edx, 65280
    mov     rax, rcx
    shr     rax, 16
    or      r9, rax
    mov     rax, rcx
    shr     r9, 16
    mov     rcx, 280375465082880
    and     rax, rcx
    mov     rcx, 1095216660480
    or      r9, rax
    mov     rax, r8
    and     rax, rcx
    shr     r9, 16
    or      r9, rax
    mov     rcx, r8
    mov     rax, r8
    shr     r9, 8
    shl     rax, 16
    and     ecx, 16711680
    or      rdx, rax
    mov     eax, -16777216
    and     rax, r8
    shl     rdx, 16
    or      rdx, rcx
    shl     rdx, 16
    or      rax, rdx
    shl     rax, 8
    or      rax, r9
    ret     0

which is pretty unfortunate.
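The contrast described in the commit message, combining byte-swapped halves versus moving each byte directly to its final position, can be sketched roughly as follows. This is an illustrative sketch only, not the code from the commit: the function names (`swap16_halves`, `swap32_halves`, `swap32_masked`, `swap64_masked`) are hypothetical, and the mask-and-shift form is an assumption about what "swap them in place" looks like, inferred from the constants visible in the MSVC listing (for example, 65280 is 0xFF00 and 16711680 is 0xFF0000).

```cpp
#include <cstdint>

// Hypothetical names throughout; a sketch of the two styles, not the
// actual contents of common/swap.

// "Halves" style: byte-swap each half, then shift and combine the results.
constexpr std::uint16_t swap16_halves(std::uint16_t v) {
    return static_cast<std::uint16_t>((v >> 8) | (v << 8));
}

constexpr std::uint32_t swap32_halves(std::uint32_t v) {
    return (static_cast<std::uint32_t>(swap16_halves(static_cast<std::uint16_t>(v))) << 16) |
           swap16_halves(static_cast<std::uint16_t>(v >> 16));
}

// "In place" style: mask out each byte and shift it straight to its
// final position, then OR everything together.
constexpr std::uint32_t swap32_masked(std::uint32_t v) {
    return ((v & 0xFF000000U) >> 24) | ((v & 0x00FF0000U) >> 8) |
           ((v & 0x0000FF00U) << 8) | ((v & 0x000000FFU) << 24);
}

constexpr std::uint64_t swap64_masked(std::uint64_t v) {
    return ((v & 0xFF00000000000000ULL) >> 56) | ((v & 0x00FF000000000000ULL) >> 40) |
           ((v & 0x0000FF0000000000ULL) >> 24) | ((v & 0x000000FF00000000ULL) >> 8) |
           ((v & 0x00000000FF000000ULL) << 8) | ((v & 0x0000000000FF0000ULL) << 24) |
           ((v & 0x000000000000FF00ULL) << 40) | ((v & 0x00000000000000FFULL) << 56);
}

// Both styles compute the same value; only the shape of the arithmetic differs.
static_assert(swap32_halves(0x11223344U) == 0x44332211U, "halves style");
static_assert(swap32_masked(0x11223344U) == 0x44332211U, "masked style");
static_assert(swap64_masked(0x0102030405060708ULL) == 0x0807060504030201ULL, "64-bit masked");
```

Whether a given compiler collapses either form into a single `bswap` depends on the compiler and its version, as the listings above show, so the generated assembly is worth checking for each toolchain of interest.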
..
audio_core	core/core_timing: Make callback parameters consistent	2019-03-24 18:12:17 -04:00
common	common/swap: Improve codegen of the default swap fallbacks	2019-04-12 00:07:39 -04:00
core	Merge pull request #1957 from DarkLordZach/title-provider	2019-04-09 19:16:37 -04:00
input_common	general: Use deducation guides for std::lock_guard and std::unique_lock	2019-04-01 12:53:47 -04:00
tests	kernel: Handle page table switching within MakeCurrentProcess()	2019-04-07 01:12:54 -04:00
video_core	Merge pull request #2354 from lioncash/header	2019-04-09 19:19:41 -04:00
web_service	general: Use deducation guides for std::lock_guard and std::unique_lock	2019-04-01 12:53:47 -04:00
yuzu	Merge pull request #1957 from DarkLordZach/title-provider	2019-04-09 19:16:37 -04:00
yuzu_cmd	Merge pull request #1957 from DarkLordZach/title-provider	2019-04-09 19:16:37 -04:00
.clang-format	Remove special rules for Windows.h and library includes	2016-09-21 00:16:33 -07:00
CMakeLists.txt	CMakeLists: Move off of modifying CMAKE_*-related flags	2019-03-17 06:55:24 -04:00