There are separate CPUID feature bits for fxsave/fxrstor and cmovCC
instructions. This patch adds CpuCMOV and CpuFXSR to replace Cpu686
on corresponding instructions.
gas/
* config/tc-i386.c (cpu_arch): Add .cmov and .fxsr.
(cpu_noarch): Add nocmov and nofxsr.
* doc/c-i386.texi: Document cmov and fxsr.
opcodes/
* i386-gen.c (cpu_flag_init): Add CpuCMOV and CpuFXSR to
CPU_I686_FLAGS. Add CPU_CMOV_FLAGS, CPU_FXSR_FLAGS,
CPU_ANY_CMOV_FLAGS and CPU_ANY_FXSR_FLAGS.
(cpu_flags): Add CpuCMOV and CpuFXSR.
* i386-opc.tbl: Replace Cpu686 with CpuFXSR on fxsave, fxsave64,
fxrstor and fxrstor64. Replace Cpu686 with CpuCMOV on cmovCC.
* i386-init.h: Regenerated.
* i386-tbl.h: Likewise.
There's no insn allowing ZEROING_MASKING alone. Re-purpose its value for
handling the not uncommon case of insns allowing either form of masking
with register operands, but only merging masking with a memory operand.
Expand Broadcast to 3 bits so that the number of bytes to broadcast
can be computed as 1 << (Broadcast - 1). Use it to simplify x86
assembler.
gas/
* config/tc-i386.c (Broadcast_Operation): Add bytes.
(build_evex_prefix): Use i.broadcast->bytes.
(match_broadcast_size): New function.
(check_VecOperands): Use the broadcast field to compute the
number of bytes to broadcast directly. Set i.broadcast->bytes.
Use match_broadcast_size.
opcodes/
* i386-gen.c (adjust_broadcast_modifier): New function.
(process_i386_opcode_modifier): Add an argument for operands.
Adjust the Broadcast value based on operands.
(output_i386_opcode): Pass operand_types to
process_i386_opcode_modifier.
(process_i386_opcodes): Pass NULL as operands to
process_i386_opcode_modifier.
* i386-opc.h (BYTE_BROADCAST): New.
(WORD_BROADCAST): Likewise.
(DWORD_BROADCAST): Likewise.
(QWORD_BROADCAST): Likewise.
(i386_opcode_modifier): Expand broadcast to 3 bits.
* i386-tbl.h: Regenerated.
Just like for their AVX counterparts and CVTSI2S{D,S}, a memory source
here is ambiguous and hence
- in source files should be qualified with a suitable suffix or operand
size specifier (not doing so is an error in Intel mode, and will gain
a diagnostic in AT&T mode in the future),
- in disassembly should be properly suffixed (the Intel operand size
specifiers were emitted correctly already).
When multiple (here: two) forms of an insn take different width inputs
but produce identical size outputs (here: RegXMM), the templates can be
combined.
Also drop IgnoreSize (and the now redundant size specifiers) wherever
applicable.
These are special because they may not have a register operand to derive
the vector length from, which requires to also deal with the braodcast
case when determining vector length in build_evex_prefix().
Also drop IgnoreSize (and the now redundant size specifiers) from their
suffixed counterparts.
After
commit 1b54b8d7e4
Author: Jan Beulich <jbeulich@novell.com>
Date: Mon Dec 18 09:36:14 2017 +0100
x86: fold RegXMM/RegYMM/RegZMM into RegSIMD
... qualified by their respective sizes, allowing to drop FirstXmm0 at
the same time.
folded RegXMM, RegYMM and RegZMM into RegSIMD, it's no longer impossible
to distinguish if Xmmword can represent a memory reference when operand
specification contains SIMD register. For example, template operands
specification like these
RegXMM|...|Xmmword|...
and
RegXMM|...
The Xmmword bitfield is always set by RegXMM which is represented by
"RegSIMD|Xmmword". This patch splits each of vcvtps2qq, vcvtps2uqq,
vcvttps2qq and vcvttps2uqq into 2 templates: one template only has
RegXMM source operand and the other only has mempry source operand.
gas/
PR gas/23418
* testsuite/gas/i386/xmmword.s: Add tests for vcvtps2qq,
vcvtps2uqq, vcvttps2qq and vcvttps2uqq.
* testsuite/gas/i386/xmmword.l: Updated.
opcodes/
PR gas/23418
* i386-opc.h (Byte): Update comments.
(Word): Likewise.
(Dword): Likewise.
(Fword): Likewise.
(Qword): Likewise.
(Tbyte): Likewise.
(Xmmword): Likewise.
(Ymmword): Likewise.
(Zmmword): Likewise.
* i386-opc.tbl: Split vcvtps2qq, vcvtps2uqq, vcvttps2qq and
vcvttps2uqq.
* i386-tbl.h: Regenerated.
Architecturally, MONITOR's and MONITORX'es memory operand is a 16- or
32-bit register outside of 64-bit mode, and a 64- or 32-bit register
inside 64-bit mode. The other register operands, including all of them
for MWAIT and MWAITX, are uniformly 32-bit, irrespective of mode. Retain
the original 64-bit MONITOR{,X} templates for compatibility only, and
fold the MWAIT{,X} ones.
First of all there's no point in having separate Cpu386 templates - the
respective SReg3 registers can't be specified for pre-386 anyway; see
parse_real_register().
And then we can also make use of D here for the memory forms of the
insn. This cannot be done for the non-64bit GPR forms because of the
IgnoreSize that cannot be dropped from the to-SREG variant.
There's little point carrying up to three templates per insn flavor
when the sole difference is operand size and the dependency on AVX512VL
being enabled. Instead the need for AVX512VL can be derived from an
operand allowing for ZMMword as well as one or both or XMMword and
YMMword (irrespective of whether this is a register or memory operand).
Without further abstraction to deal with the different Disp8MemShift
values between the templates, only a limited set (mostly ones only
allowing for non-memory operands) can be folded, which is being done
here.
Also drop IgnoreSize wherever possible from anything that's being
touched anyway.
It's not clear to me why they had been introduced - the respective
comments in opcodes/i386-gen.c are certainly wrong: ymm<N> registers
are very well supported (and necessary) with just AVX512F.
Since only the first 32 bits of input operand are used for tpause and
umwait, the REX.W bit is skipped. Both 32-bit registers and 64-bit
registers are allowed.
gas/
* testsuite/gas/i386/x86-64-waitpkg.s: Add 32-bit registers
tests for tpause and umwait.
* testsuite/gas/i386/x86-64-waitpkg-intel.d: Updated.
* testsuite/gas/i386/x86-64-waitpkg.d: Likewise.
opcodes/
* i386-dis.c (prefix_table): Replace Em with Edq on tpause and
umwait.
* i386-opc.tbl: Allow 32-bit registers for tpause and umwait in
64-bit mode.
* i386-tbl.h: Regenerated.
It again can be inferred from other information.
The vpopcntd templates all need to have Dword added to their memory
operands; the lack thereof was actually a bug preventing certain Intel
syntax code to assemble, so test cases get extended.
The wrong placement of the Load attribute in the templates prevented
this from working. The disassembler also didn't handle this consistently
with other similar dual-encoding insns.
While many templates allowing multiple suitably matching XMM/YMM/ZMM
operand sizes can be folded, a few need to be split in order to not
wrongly accept "xmmword ptr" operands when only XMM registers are
permitted (and memory operands are more narrow). Add a test case
validating this.
"clr reg" is an alias of "xor reg, reg". We can encode "clr reg64" as
"xor reg32, reg32".
gas/
* config/tc-i386.c (optimize_encoding): Also encode "clr reg64"
as "xor reg32, reg32".
* testsuite/gas/i386/x86-64-optimize-1.s: Add "clr reg64" tests.
* testsuite/gas/i386/x86-64-optimize-1.d: Updated.
opcodes/
* i386-opc.tbl: Add Optimize to clr.
* i386-tbl.h: Regenerated.
The differences between some of the register and memory forms of the
same insn often don't really require the templates to be separate. For
example, Disp8MemShift is simply irrelevant to register forms. Fold
these as far as possible, and also fold register-only forms. Further
folding is possible, but needs other prereq work done first.
A note regarding EVEXDYN: This is intended to be used only when no other
properties of the template would make is_evex_encoding() return true. In
all "normal" cases I think it is preferable to omit this indicator, to
keep the table half way readable.
Their memory forms were bogusly using VexLWP instead of VexNDD. Adjust
VexNDD handling to cope with these, allowing their register and memory
forms to be folded.
The differences between some of the register and memory forms of the
same insn often don't really require the templates to be separate. For
example, Disp8MemShift is simply irrelevant to register forms. Fold them
as far as possible. Further folding is possible, but needs other prereq
work done first.
They aren't really useful (anymore?): The conflicting operand size check
isn't applicable to any insn validly using respective memory operand
sizes (and if they're used wrongly, another error would result), and the
logic in process_suffix() can be easily changed to work without them.
While re-structuring conditionals in process_suffix() also drop the
CMPXCHG8B special case in favor of a NoRex64 attribute in the opcode
table.