The following text is a brief overview of those key
principles which are useful to know when generating code
with SLJIT. Further details can be found in sljitLir.h.

----------------------------------------------------------------
What is SLJIT?
----------------------------------------------------------------

SLJIT is a platform independent assembler which
  - provides access to common CPU features
  - can be easily ported to widespread CPU
    architectures (e.g. x86, ARM, POWER, MIPS, SPARC, s390x)

The key challenge of this project is finding a common
subset of CPU features which
  - covers traditional assembly level programming
  - can be translated to machine code efficiently

This aim is achieved by selecting those instructions / CPU
features which are either available on all platforms or
can be simulated with a low performance overhead.

For example, some SLJIT instructions support base register
pre-update when the [base+offs] memory accessing mode is used.
Although this feature is only available on ARM and POWER
CPUs, the simulation overhead is low on other CPUs.

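As a rough illustration of why this simulation is cheap, base
register pre-update can be pictured in plain C as an address
update followed by an ordinary load. This is only a sketch: the
function name load_pre_update is hypothetical and is not part of
the SLJIT API, and the offset is counted in elements rather than
bytes to keep the C code simple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an SLJIT function: emulates a load that
   uses the [base+offs] mode with base register pre-update. */
static intptr_t load_pre_update(const intptr_t **base, ptrdiff_t offs)
{
    assert(base != NULL && *base != NULL);
    *base += offs;   /* step 1: base = base + offs (in elements) */
    return **base;   /* step 2: load from the updated base */
}
```

On a CPU without native pre-update, the same effect presumably
needs only these two simple steps, an add followed by a plain load.
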
----------------------------------------------------------------
The generic CPU model of SLJIT
----------------------------------------------------------------

The CPU has
  - integer registers, which can store either an
    int32_t (4 byte) or intptr_t (4 or 8 byte) value
  - floating point registers, which can store either a
    single (4 byte) or double (8 byte) precision value
  - boolean status flags

*** Integer registers:

The most important rule is: when a source operand of
an instruction is a register, the data type of the
register must match the data type expected by the
instruction.

For example, the following code snippet
is a valid instruction sequence:

    sljit_emit_op1(compiler, SLJIT_MOV32,
        SLJIT_R0, 0, SLJIT_MEM1(SLJIT_R1), 0);
    // An int32_t value is loaded into SLJIT_R0
    sljit_emit_op1(compiler, SLJIT_NOT32,
        SLJIT_R0, 0, SLJIT_R0, 0);
    // The int32_t value in SLJIT_R0 is bit inverted
    // and the type of the result is still int32_t

The next code snippet is not allowed:

    sljit_emit_op1(compiler, SLJIT_MOV,
        SLJIT_R0, 0, SLJIT_MEM1(SLJIT_R1), 0);
    // An intptr_t value is loaded into SLJIT_R0
    sljit_emit_op1(compiler, SLJIT_NOT32,
        SLJIT_R0, 0, SLJIT_R0, 0);
    // The result of the SLJIT_NOT32 instruction
    // is undefined. Even a crash is possible
    // (e.g. on MIPS-64).

However, it is always allowed to overwrite a
register regardless of its previous value:

    sljit_emit_op1(compiler, SLJIT_MOV,
        SLJIT_R0, 0, SLJIT_MEM1(SLJIT_R1), 0);
    // An intptr_t value is loaded into SLJIT_R0
    sljit_emit_op1(compiler, SLJIT_MOV32,
        SLJIT_R0, 0, SLJIT_MEM1(SLJIT_R2), 0);
    // From now on SLJIT_R0 contains an int32_t
    // value. The previous value is discarded.

Type conversion instructions are provided to convert an
int32_t value to an intptr_t value and vice versa. On
certain architectures these conversions are nops (no
instructions are emitted).

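The effect of such conversions can be pictured in plain C. The
sketch below is only an analogy for the int32_t / intptr_t
distinction: the helper names are hypothetical and the code does
not use the SLJIT API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not part of SLJIT. */

/* int32_t -> intptr_t: the value is preserved (sign extended when
   intptr_t is 8 bytes wide, nothing to do when it is 4 bytes wide). */
static intptr_t widen_int32(int32_t value)
{
    return (intptr_t)value;
}

/* intptr_t -> int32_t: only meaningful for values that fit into
   32 bits, which the assertion checks in debug builds. */
static int32_t narrow_to_int32(intptr_t value)
{
    assert(value >= INT32_MIN && value <= INT32_MAX);
    return (int32_t)value;
}
```

When intptr_t is 4 bytes wide, both helpers reduce to a plain copy,
which mirrors the remark that these conversions can be nops.
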
Memory accessing:

Register arguments of SLJIT_MEM1 / SLJIT_MEM2 addressing
modes must contain intptr_t data.

Signed / unsigned values:

Most operations are executed in the same way regardless of
whether the value is signed or unsigned. These operations have
only one instruction form (e.g. SLJIT_ADD / SLJIT_MUL).
Instructions where the result depends on the sign have
two forms (e.g. integer division, long multiply).

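A plain C sketch of this distinction (not SLJIT code, and the
function names are hypothetical): addition yields the same bit
pattern for signed and unsigned operands, while division does
not, which is why division needs two forms.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not part of SLJIT. */

/* Addition on the raw bit pattern: the result bits are the same
   whether the operands are viewed as signed or unsigned. */
static uint32_t add_bits(uint32_t a, uint32_t b)
{
    return a + b; /* wraps modulo 2^32 */
}

/* Division depends on how the bits are interpreted. */
static uint32_t div_unsigned(uint32_t a, uint32_t b)
{
    assert(b != 0);
    return a / b;
}

static int32_t div_signed(int32_t a, int32_t b)
{
    assert(b != 0 && !(a == INT32_MIN && b == -1));
    return a / b;
}
```

The bit pattern 0xFFFFFFF8 is -8 when read as signed and 4294967288
when read as unsigned, so dividing it by 2 gives -4 in one case and
2147483644 in the other, while adding 3 gives the same bits either way.
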
*** Floating point registers:

Floating point registers can contain either a single
or a double precision value. Similar to integer registers,
the data type of the value stored in a source register
must match the data type expected by the instruction.
Otherwise the result is undefined (even a crash is possible).

Rounding:

Similar to standard C, floating point computation
results are rounded toward zero.

*** Boolean status flags:

Conditional branches usually depend on the value
of CPU status flags. These status flags are boolean
values and can be set by certain instructions.

To achieve maximum efficiency and portability, the
following rules were introduced:
  - Most instructions can freely modify these status
    flags unless SLJIT_KEEP_FLAGS is passed.
  - The SLJIT_KEEP_FLAGS option may have a performance
    overhead, so it should only be used when necessary.
  - The SLJIT_SET_E, SLJIT_SET_U, etc. options can
    force an instruction to correctly set the
    specified status flags. However, all other
    status flags are undefined. This rule must
    always be kept in mind!
  - Status flags cannot be controlled directly
    (there are no set/clear/invert operations).

The last two rules allow efficient mapping of status flags.
For example, the arithmetic and multiply overflow flags are
mapped to the same overflow flag bit on x86. This is allowed,
since no instruction can set both of these flags. When
either of them is set by an instruction, the other can
have any value (this satisfies the "all other flags are
undefined" rule). Therefore mapping two SLJIT flags to the
same CPU flag is possible. Even though SLJIT supports
a dozen status flags, they can be efficiently mapped
to CPUs with only 4 status flags (e.g. ARM or SPARC).

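The mapping argument can be sketched as a small table. All names
below (abstract_flag, hw_flag, flag_map, map_flag) are hypothetical
illustrations and do not come from the SLJIT sources. The only point
is that two abstract flags may share one hardware flag when no
instruction sets both.

```c
#include <assert.h>

/* Hypothetical abstract flags (illustration only). */
enum abstract_flag {
    FLAG_EQUAL,
    FLAG_CARRY,
    FLAG_OVERFLOW,      /* arithmetic overflow */
    FLAG_MUL_OVERFLOW,  /* multiply overflow */
    ABSTRACT_FLAG_COUNT
};

/* Hypothetical hardware flags of a CPU with few status bits. */
enum hw_flag { HW_ZERO, HW_CARRY, HW_OVERFLOW };

/* Two abstract flags map to the same hardware bit. This is safe
   because an instruction requests at most one of them, and the
   other one is then undefined by rule. */
static const enum hw_flag flag_map[ABSTRACT_FLAG_COUNT] = {
    [FLAG_EQUAL]        = HW_ZERO,
    [FLAG_CARRY]        = HW_CARRY,
    [FLAG_OVERFLOW]     = HW_OVERFLOW,
    [FLAG_MUL_OVERFLOW] = HW_OVERFLOW
};

static enum hw_flag map_flag(enum abstract_flag flag)
{
    assert(flag < ABSTRACT_FLAG_COUNT);
    return flag_map[flag];
}
```
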
----------------------------------------------------------------
Complex instructions
----------------------------------------------------------------

We noticed that introducing complex instructions for common
tasks can improve performance. For example, compare and
branch instruction sequences can be optimized if certain
conditions apply, but these conditions depend on the target
CPU. SLJIT can do these optimizations, but it needs to
understand the "purpose" of the generated code. Static
instruction analysis has a large performance overhead,
however, so we chose another approach: we introduced
complex instruction forms for certain non-atomic tasks.
SLJIT can optimize these "instructions" more efficiently
since the "purpose" is known to the compiler. These complex
instruction forms can often be assembled from other SLJIT
instructions, but we recommend using them since the
compiler can optimize them on certain CPUs.

----------------------------------------------------------------
Generating functions
----------------------------------------------------------------

SLJIT is often used for generating function bodies which are
called from C. SLJIT provides two complex instructions for
generating function entry and return: sljit_emit_enter and
sljit_emit_return. The sljit_emit_enter function also initializes
the "compiling context", which specifies the current register
mapping, local space size and other configuration. The
sljit_set_context function can also set this context without
emitting any machine instructions.

This context is important since it affects the compiler, so
the first instruction after a compiler is created must be
either sljit_emit_enter or sljit_set_context. The context can
be changed by calling sljit_emit_enter or sljit_set_context
again.

----------------------------------------------------------------
All-in-one building
----------------------------------------------------------------

Instead of using a separate library, the whole SLJIT
compiler infrastructure can be directly included:

    #define SLJIT_CONFIG_STATIC 1
    #include "sljitLir.c"

This approach is useful for single file compilers.

Advantages:
  - Everything provided by SLJIT is available
    (no need to include anything else).
  - Configuring SLJIT is easy
    (e.g. redefining SLJIT_MALLOC / SLJIT_FREE).
  - The SLJIT compiler API is hidden from the
    world, which improves security.
  - The C compiler can optimize the SLJIT code
    generator (e.g. removing unused functions).

----------------------------------------------------------------
Types and macros
----------------------------------------------------------------

The sljitConfig.h file contains the defines which control
the compiler. The beginning of sljitConfigInternal.h
lists architecture specific types and macros provided
by SLJIT. Some of these macros:

SLJIT_DEBUG : enabled by default
  Enables assertions. Should be disabled in release mode.

SLJIT_VERBOSE : enabled by default
  When this macro is enabled, the sljit_compiler_verbose
  function can be used to dump SLJIT instructions.
  Otherwise this function is not available. Should be
  disabled in release mode.

SLJIT_SINGLE_THREADED : disabled by default
  Single threaded programs can define this flag, which
  eliminates the pthread dependency.

sljit_sw, sljit_uw, etc. :
  It is recommended to use these types instead of long,
  intptr_t, etc. This improves the readability and
  portability of the code.