xemu/target
Daniel Henrique Barboza bc0ec52eb2 target/riscv/vector_helper.c: skip set tail when vta is zero
The tail-setting function is a no-op if 'vta' is zero, but we still do
a lot of work in it regardless. Every call it makes to
vext_set_elems_1s() returns without writing anything (since vta is
zero), so that work is wasted time.

Skip it altogether in this case. Aside from the code simplification,
there's a noticeable emulation performance gain from doing so. For a
regular C binary that does a vector operation like this:

=======
 #include <stdlib.h>

 #define SZ 10000000

int main ()
{
  int *a = malloc (SZ * sizeof (int));
  int *b = malloc (SZ * sizeof (int));
  int *c = malloc (SZ * sizeof (int));

  for (int i = 0; i < SZ; i++)
    c[i] = a[i] + b[i];
  return c[SZ - 1];
}
=======

Emulating it with qemu-riscv64 and RVV takes ~0.3 sec:

$ time ~/work/qemu/build/qemu-riscv64 \
    -cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out

real    0m0.303s
user    0m0.281s
sys     0m0.023s

With this skip we take ~0.275 sec:

$ time ~/work/qemu/build/qemu-riscv64 \
    -cpu rv64,debug=false,vext_spec=v1.0,v=true,vlen=128 ./foo.out

real    0m0.274s
user    0m0.252s
sys     0m0.019s

This performance gain (roughly 10% of wall-clock time in the run above)
adds up fast when executing heavy benchmarks like SPEC.
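
The early exit described above can be sketched in isolation. This is a
minimal illustration, not the code from vector_helper.c: the names
set_elems_1s(), set_tail_elems_1s() and count_tail_bytes(), their
parameter lists, and the byte-array register model are all assumptions
made for the example. It only shows the idea of returning before the
per-field loop when the tail-agnostic flag is zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the fill helper: set bytes [cnt, tot) to
 * all 1s, but only when the agnostic policy is enabled. */
static void set_elems_1s(uint8_t *base, uint32_t is_agnostic,
                         uint32_t cnt, uint32_t tot)
{
    if (is_agnostic == 0 || tot == cnt) {
        return;
    }
    memset(base + cnt, 0xff, tot - cnt);
}

/* Hypothetical tail helper. vl is the number of active elements, nf the
 * number of fields, esz the element size in bytes, and max_elems the
 * number of elements per field. */
static void set_tail_elems_1s(uint32_t vl, uint8_t *vd, uint32_t vta,
                              uint32_t nf, uint32_t esz,
                              uint32_t max_elems)
{
    assert(vl <= max_elems);

    /* Early exit: with vta == 0 every call below would return without
     * writing, so skip the loop and its index arithmetic entirely. */
    if (vta == 0) {
        return;
    }

    for (uint32_t k = 0; k < nf; k++) {
        set_elems_1s(vd, vta,
                     (k * max_elems + vl) * esz,
                     (k * max_elems + max_elems) * esz);
    }
}

/* Run the helper on a zeroed 16-byte register (4 elements of 4 bytes,
 * 3 of them active) and count how many bytes were set to 0xff. */
static int count_tail_bytes(uint32_t vta)
{
    uint8_t vd[16] = {0};
    int ones = 0;

    set_tail_elems_1s(3, vd, vta, 1, 4, 4);
    for (size_t i = 0; i < sizeof(vd); i++) {
        ones += (vd[i] == 0xff);
    }
    return ones;
}
```

In this sketch count_tail_bytes(0) leaves the register untouched, while
count_tail_bytes(1) fills only the single 4-byte tail element. The
result is the same with or without the early return; the return just
avoids the loop when nothing would be written.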

Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Weiwei Li <liweiwei@iscas.ac.cn>
Message-Id: <20230427205708.246679-2-dbarboza@ventanamicro.com>
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2023-06-13 16:35:02 +10:00
alpha accel/tcg: Introduce translator_io_start 2023-06-05 12:04:29 -07:00
arm target/arm: Only include tcg/oversized-guest.h if CONFIG_TCG 2023-06-07 08:35:13 -07:00
avr accel/tcg: Introduce translator_io_start 2023-06-05 12:04:29 -07:00
cris accel/tcg: Introduce translator_io_start 2023-06-05 12:04:29 -07:00
hexagon target/*: Add missing includes of exec/translation-block.h 2023-06-05 12:04:29 -07:00
hppa accel/tcg: Introduce translator_io_start 2023-06-05 12:04:29 -07:00
i386 hvf: add guest debugging handlers for Apple Silicon hosts 2023-06-06 10:19:30 +01:00
loongarch target/*: Add missing includes of exec/translation-block.h 2023-06-05 12:04:29 -07:00
m68k target/m68k/fpu_helper: Use FloatRelation enum to hold comparison result 2023-06-09 23:38:16 +03:00
microblaze accel/tcg: Introduce translator_io_start 2023-06-05 12:04:29 -07:00
mips target/*: Add missing includes of exec/translation-block.h 2023-06-05 12:04:29 -07:00
nios2 accel/tcg: Introduce translator_io_start 2023-06-05 12:04:29 -07:00
openrisc accel/tcg: Introduce translator_io_start 2023-06-05 12:04:29 -07:00
ppc target/ppc: Implement gathering irq statistics 2023-06-10 10:19:24 -03:00
riscv target/riscv/vector_helper.c: skip set tail when vta is zero 2023-06-13 16:35:02 +10:00
rx accel/tcg: Introduce translator_io_start 2023-06-05 12:04:29 -07:00
s390x * Fix emulated LCCB, LOCFHR, MXDB and MXDBR s390x instructions 2023-06-06 07:07:37 -07:00
sh4 accel/tcg: Introduce translator_io_start 2023-06-05 12:04:29 -07:00
sparc accel/tcg: Introduce translator_io_start 2023-06-05 12:04:29 -07:00
tricore target/tricore: Fix wrong PSW for call insns 2023-06-07 18:20:48 +02:00
xtensa accel/tcg: Introduce translator_io_start 2023-06-05 12:04:29 -07:00
Kconfig hw/loongarch: Add support loongson3 virt machine type. 2022-06-06 18:09:03 +00:00
meson.build target/loongarch: Add target build suport 2022-06-06 18:09:03 +00:00