st(1) ... st(7) will never be looked up in the hash table, so there's no
point inserting the entries. It's also not really necessary to do a 2nd
hash lookup after parsing the register number, nor is there a real
reason for having both st and st(0) entries. Plus we can easily do away
with the need for st to be first in the table.
For a long time there hasn't been a need anymore to keep together all
templates with identical mnemonics. Move the MOVQ and MOVABS ones next
to their MOV counterparts. Move the string forms of CMPSD and MOVSD next
to their CMPS / MOVS counterparts. Re-arrange what so fgar was the SSE3
section.
This makes reasonably obvious that MONITOR/MWAIT aren't suitable to
cover by CpuSSE3, but adjusting this is left for another time.
In commit 79dec6b7ba ("x86-64: optimize certain commutative
VEX-encoded insns") I missed the fact that there being subtraction
involved here doesn't matter, as absolute differences get summed up.
This way not only the overall (source) table size shrinks by quite a
bit and the risk of related templates going out of sync with one another
gets lowered, but also (dis)similarities between neighboring templates
become easier to spot.
Note that for certain SSE2AVX templates this results in benign attribute
changes:
- LDMXCSR and STMXCSR: NoAVX gets set,
- MOVMSKPS, PMOVMSKB, PEXTR{B,W} (register destination), and PINSR{B,W}
(register source): IgnoreSize and NoRex64 get set,
- CVT{DQ,PS}2PD, CVTSD2SS, MOVMSKPD, MOVDDUP, PMOV{S,Z}X{BW,WD,DQ}, and
ROUNDSD: NoRex64 gets set,
- CVTSS2SD, INSERTPS, PEXTRW (memory destination), PINSRW (memory
source), and PMOV{S,Z}X{BD,WQ,BQ}: IgnoreSize gets set.
Similarly the "normal" (non-SSE2AVX)
- non-64-bit CVTS{I,S}2SD forms get NoRex64 set,
- CMP{EQ,ORD,NEQ,UNORD}{P,S}{S,D} forms get C set,
all again in a benign way.
The remaining differences in the generated table are due to re-ordering
of entries in the course of being folded into templates.
Just like is already done for VEX/XOP/EVEX encoded insns, record the
encoding space information in the respective opcode modifier field. Do
this again without changing the source table, but rather by deriving the
values from their existing source representation.
While all of MMX, SSE, and SSE2 are included in "generic64", they can be
individually disabled. There are two MOVQ forms lacking respective
attributes. While the MMX one would get refused anyway (due to MMX
registers not recognized with .nommx), the assembler did happily accept
the SSE2 form. Add respective CPU settings to both, paralleling what the
MOVD counterparts have.
For INVLPGB the operand count was wrong (besides %edx there's also %ecx
which is an input to the insn). In this case I see little sense in
retaining the bogus 2-operand template. Plus swapping of the operands
wasn't properly suppressed for Intel syntax.
For PVALIDATE, RMPADJUST, and RMPUPDATE bogus single operand templates
were specified. These get retained, as the address operand is the only
one really needed to expressed non-default address size, but only for
compatibility reasons. Proper multi-operand insn get introduced and the
testcases get adjusted / extended accordingly.
While at it also drop the redundant definition of __amd64__ - we already
have x86_64 defined (or not) to distinguish 64-bit and non-64-bit cases.
In the majority of cases we can easily determine the length from the
encoding, irrespective of whether a prefix is specified there as well.
We further don't even need to record the value in the table entries, as
it's easy enough to determine it (without any guesswork, unless an insn
with major opcode 00 appeared that requires a 2nd opcode byte to be
specified explicitly) when installing the chosen template for further
processing.
Should an encoding appear which
- has a major opcode byte of 66, F3, or F2,
- requires a 2nd opcode byte to be specified explicitly,
- doesn't have a mandatory prefix
we'd need to convert all templates presently encoding a mandatory prefix
this way to the Prefix_0X<nn> model to eliminate the respective guessing
i386-gen does.
Just like is already done for legacy encoded insns, record the mandatory
prefix information in the respective opcode modifier field. Do this
without changing the source table, but rather by deriving the values from
their existing source representation.
This is in preparation of opcode_length going away as a field in the
templates. Identify pseudo prefixes by a base opcode of zero instead:
No real prefix has an opcode of zero. This at the same time allows
dropping a curious special case from i386-gen.
Since most attributes are identical for all pseudo prefixes, take the
opportunity and also template them.
In preparation to use PREFIX_0X<nn> attributes also in VEX/XOP/EVEX
encoding templates, renumber the pseudo-enumerators such that their
values can then also be used directly in the respective prefix bit
fields.
Commit 8b65b8953a ("x86: Remove the prefix byte from non-VEX/EVEX
base_opcode") used the opcodeprefix field for two distinct purposes. In
preparation of having VEX/XOP/EVEX and non-VEX templates become similar
in the representatioon of both encoding space and opcode prefixes, split
the field to have a separate one holding an insn's opcode space.
RepPrefixOk, HLEPrefixOk, and NoTrackPrefixOk can't be specified
together, so can share an enum-like field. IsLockable can be inferred
from HLE setting and hence only needs specifying when neither of them
is present.
Despite SYSEXIT being an Intel-only insn in long mode, its behavior
there is similar to SYSRET's: Depending on REX.W execution continues in
either 64-bit or compatibility mode. Hence distinguishing by suffix is
as necessary here as it is there.
CVTPI2PD with a memory operand, unlike CVTPI2PS, doesn't engage MMX
logic. Therefore it
- has a proper AVX equivalent (CVTDQ2PD) and hence can be subject to
SSE2AVX translation and SSE checking,
- should not record MMX use in the respective ELF note.
Rename VexOpcode to OpcodePrefix so that OpcodePrefix can be used for
regular encoding prefix.
gas/
* config/tc-i386.c (build_vex_prefix): Replace vexopcode with
opcodeprefix.
(build_evex_prefix): Likewise.
(is_any_vex_encoding): Don't check vexopcode.
(output_insn): Handle opcodeprefix.
opcodes/
* i386-gen.c (opcode_modifiers): Replace VexOpcode with
OpcodePrefix.
* i386-opc.h (VexOpcode): Renamed to ...
(OpcodePrefix): This.
(PREFIX_NONE): New.
(PREFIX_0X66): Likewise.
(PREFIX_0XF2): Likewise.
(PREFIX_0XF3): Likewise.
* i386-opc.tbl (Prefix_0X66): New.
(Prefix_0XF2): Likewise.
(Prefix_0XF3): Likewise.
Replace VexOpcode= with OpcodePrefix=. Use Prefix_0X66 on xorpd.
Use Prefix_0XF3 on cvtdq2pd. Use Prefix_0XF2 on cvtpd2dq.
* i386-tbl.h: Regenerated.
Just like other VEX-encoded scalar insns do.
Besides a testcase for this behavior also introduce one to verify that
XOP scalar insns don't honor -mavxscalar=256, as they don't ignore
XOP.L.
We check register-only source operand to decide if two source operands of
VEX encoded instructions should be swapped. But source operands in AMX
instructions with two source operands swapped are all register-only
operand. Add SwapSources to indicate two source operands should be
swapped.
gas/
* config/tc-i386.c (build_modrm_byte): Check vexswapsources to
swap two source operands.
opcodes/
* i386-gen.c (opcode_modifiers): Add VexSwapSources.
* i386-opc.h (VexSwapSources): New.
(i386_opcode_modifier): Add vexswapsources.
* i386-opc.tbl: Add VexSwapSources to BMI2 and BMI instructions
with two source operands swapped.
* i386-tbl.h: Regenerated.
These all follow an almost common pattern, again with the exception of
being commutative, which can be easily taken care of.
Note that, as an intended side effect (and in fact one of the reason to
introduce templates), AVX long-form pseudo-ops get introduced alongside
the already existing AVX512 ones.
In order to reduce redundancy as well as the chance of things going out
of sync (see a later patch for an example), make the opcode table
generator capable of recognizing and expanding templated templates. Use
the new capability for compacting the general purpose conditional insns.
Presumably as a result of various changes over the last several months,
and - for some of them - with a generalization of logic in
match_mem_size() plus mirroring of this generalization into the
broadcast handling logic of check_VecOperands(), various register-only
templates can be foled into their respective memory forms. This in
particular then also allows dropping a few more instances of IgnoreSize.
Even after commit dc2be329b9 ("i386: Only check suffix in instruction
mnemonic"), by which many of its uses have become unnecessary (some were
unnecessary even before), IgnoreSize is still used for various slightly
different purposes:
- to suppress emission of an operand size prefix,
- in Intel syntax mode to zap "derived" suffixes in certain cases and to
skip certain checks of remaining "derived" suffixes,
- to suppress ambiguous operand size / missing suffix diagnostics,
- for prefixes to suppress the "stand-alone ... prefix" warning.
Drop entirely unnecessary ones and where possible also replace instances
by the more focused (because of having just a single purpose) NoRex64.
To further restrict when IgnoreSize is needed, also generalize the logic
when to skip a template because of a present or derived L or Q suffix,
by skipping immediate operands. Additionally consider mask registers and
VecSIB there.
Note that for the time being the attribute needs to be kept in place on
MMX/SSE/etc insns (but not on VEX/EVEX encoded ones unless an operand
template of them allows for only non-SIMD-register actuals) allowing for
Dword operands - the logic when to emit a data size prefix would need
further adjustment first.
Note also that the memory forms of {,v}pinsrw get their permission for
an L or Q suffix dropped. I can only assume that it being this way was a
cut-and-paste mistake from the register forms, as the latter
specifically have NoRex64 set, and the {,v}pextrw counterparts don't
allow these suffixes either.
Convert VexW= again to their respective VexW* on lines touched anyway.
As of commit dc2be329b9 ("i386: Only check suffix in instruction
mnemonic") these have been accepted even with "qword ptr" operand size
specifier, but in 64-bit mode they're now wrongly having a REX prefix
(with REX.W set) emitted in this case. These aren't Intel syntax
mnemonics, so rather than fixing code generation, let's simply reject
them. As a result, the Qword attribute can then be dropped, too.
When the template specifies any of the possible VexW settings, we can
use this instead of a separate NoRex64 to suppress the setting of REX_W.
Note that this ends up addressing an inconsistency between VEX- and
EVEX-encoded VEXTRACTPS, VPEXTR{B,W}, and VPINSR{B,W} - while the former
avoided setting VEX.W, the latter pointlessly set EVEX.W when there is a
64-bit GPR operand. Adjust the testcase to cover both cases.
Convert VexW= to their respective VexW* on lines touched anyway.
It is almost entirely redundant with Size64, and the sole case (CRC32)
where direct replacement isn't possible can easily be taken care of in
another way.
For proper code generation in 16-bit mode (or to avoid the "same type of
prefix used twice" diagnostic there), IgnoreSize is needed on certain
templates allowing for just 32-(and maybe 64-)bit operands.
Beyond adding tests for the previously broken cases, also add ones for
the previously working cases where IgnoreSize is needed for the same
reason (leaving out MPX for now, as that'll require an assembler change
first). Some minor adjustments to tests get done such that re-use of the
same code for 16-bit code generation testing becomes easier.
Allowing 64-bit registers is misleading here: Elsewhere these get allowed
when there's no difference between either variant, because of 32-bit
destination registers having their upper halves zeroed in 64-bit mode.
Here, however, they're source registers, and hence specifying 64-bit
registers would lead to the ambiguity of whether the upper 32 bits
actually matter.
Additionally, for proper code generation in 16-bit mode, IgnoreSize is
needed on both.
And finally, just like for e.g. MONITOR/MWAIT, add variants with all
input registers explicitly specified.
According to gas manual, suffix in instruction mnemonics isn't always
required:
When there is no sizing suffix and no (suitable) register operands to
deduce the size of memory operands, with a few exceptions and where long
operand size is possible in the first place, operand size will default
to long in 32- and 64-bit modes.
This includes cvtsi2sd, cvtsi2ss, vcvtsi2sd, vcvtsi2ss, vcvtusi2sd and
vcvtusi2ss. Since they are used in GCC 8 and older GCC releases, they
must be allowed without suffix in AT&T syntax.
gas/
PR gas/25622
* testsuite/gas/i386/i386.exp: Run x86-64-default-suffix and
x86-64-default-suffix-avx.
* testsuite/gas/i386/noreg64.s: Remove cvtsi2sd, cvtsi2ss,
vcvtsi2sd, vcvtsi2ss, vcvtusi2sd and vcvtusi2ss entries.
* testsuite/gas/i386/noreg64.d: Updated.
* testsuite/gas/i386/noreg64.l: Likewise.
* testsuite/gas/i386/x86-64-default-suffix-avx.d: New file.
* testsuite/gas/i386/x86-64-default-suffix.d: Likewise.
* testsuite/gas/i386/x86-64-default-suffix.s: Likewise.
opcodes/
PR gas/25622
* i386-opc.tbl: Add IgnoreSize to cvtsi2sd, cvtsi2ss, vcvtsi2sd,
vcvtsi2ss, vcvtusi2sd and vcvtusi2ss for AT&T syntax.
* i386-tbl.h: Regenerated.
AMD ABM has 2 instructions: popcnt and lzcnt. ABM CPUID feature bit has
been reused for lzcnt and a POPCNT CPUID feature bit is added for popcnt
which used to be the part of SSE4.2. This patch removes CpuABM and adds
CpuPOPCNT. It changes ABM to enable both lzcnt and popcnt, changes SSE4.2
to also enable popcnt.
gas/
* config/tc-i386.c (cpu_arch): Add .popcnt.
* doc/c-i386.texi: Remove abm and .abm. Add popcnt and .popcnt.
Add a tab before @samp{.sse4a}.
opcodes/
* i386-gen.c (cpu_flag_init): Replace CpuABM with
CpuLZCNT|CpuPOPCNT. Add CpuPOPCNT to CPU_SSE4_2_FLAGS. Add
CPU_POPCNT_FLAGS.
(cpu_flags): Remove CpuABM. Add CpuPOPCNT.
* i386-opc.h (CpuABM): Removed.
(CpuPOPCNT): New.
(i386_cpu_flags): Remove cpuabm. Add cpupopcnt.
* i386-opc.tbl: Replace CpuABM|CpuSSE4_2 with CpuPOPCNT on
popcnt. Remove CpuABM from lzcnt.
* i386-init.h: Regenerated.
* i386-tbl.h: Likewise.
There don't really need to be separate Cpu64 and CpuNo64 templates for
these. One small issue with this is that slightly strange code
.intel_syntax noprefix
.code16
.arch i286
.arch .avx
vcvtsi2sd xmm0, xmm0, dword ptr [bx]
vcvtsi2sd xmm0, xmm0, qword ptr [bx]
vcvtsi2sd xmm0, xmm0, ebx
vcvtsi2sd xmm0, xmm0, rbx
now will match in behavior with the AVX512 counterparts in that not
only the 2nd vcvtsi2sd won't assemble, but also the first. The last
two, otoh, will continue to assemble fine (due to the lack of any
memory operand size specifier). As a result, another way to make
things behave more consistently would be to avoid the folding and
add IgnoreSize to the CpuNo64 AVX512 variants. A 3rd way to do so
would be to add Cpu386 to any such insn template.
While doing this also make the usual cosmetic adjustments for the
insns touched anyway. Additionally drop the redundant Cpu64 from
the SAE forms of VCVT{,U}SI2SD - they won't assemble outside of
64-bit mode due to there not being anything to match the Reg64
operand.