aboutsummaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
* x86: Turn PLT32 to PC32 only for PC-relative relocationsH.J. Lu2024-11-091-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 292676c15a615b5a95bede9ee91004d3f7ee7dfd Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Feb 13 13:44:17 2020 -0800 x86: Resolve PLT32 reloc aganst local symbol to section resolved PLT32 relocation against local symbol to section and commit 2585b7a5ce5830e60a089aa2316a329558902f0c Author: H.J. Lu <hjl.tools@gmail.com> Date: Sun Jul 19 06:51:19 2020 -0700 x86: Change PLT32 reloc against section to PC32 turned PLT32 relocation against section into PC32 relocation. But these transformations are valid only for PC-relative relocations. Add fx_pcrel check for PC-relative relocations when performing these transformations to keep PLT32 relocation in `movq $foo@PLT, %rax`. gas/ PR gas/32196 * config/tc-i386.c (tc_i386_fix_adjustable): Return fixP->fx_pcrel for PLT32 relocations. (i386_validate_fix): Turn PLT32 relocation into PC32 relocation only if fixp->fx_pcrel is set. * testsuite/gas/i386/reloc32.d: Updated. * testsuite/gas/i386/reloc64.d: Likewise. * testsuite/gas/i386/reloc32.s: Add PR gas/32196 test. * testsuite/gas/i386/reloc64.s: Likewise. ld/ PR gas/32196 * testsuite/ld-x86-64/plt3.s: New file. * testsuite/ld-x86-64/x86-64.exp: Run plt3. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 1b714c14e40f37ea8ea02a4998c4d95f25aff7f3) (cherry picked from commit ad2ce1e6457c6026de0bdce1391705445769581d)
* x86-64: Never make R_X86_64_GOT64 section relativeH.J. Lu2024-11-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | R_X86_64_GOT64 relocation should never be made section relative. Change tc_i386_fix_adjustable to return 0 for BFD_RELOC_X86_64_GOT64. gas/ PR gas/32189 * config/tc-i386.c (tc_i386_fix_adjustable): Return 0 for BFD_RELOC_X86_64_GOT64. * testsuite/gas/i386/reloc64.d: Updated. * testsuite/gas/i386/reloc64.s: Add more tests for R_X86_64_GOT64 and R_X86_64_GOTOFF64. ld/ PR gas/32189 * testsuite/ld-x86-64/x86-64.exp: Run PR gas/32189 test. * testsuite/ld-x86-64/pr32189.s: New file. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit 8015b1b0c1a1d3a581099c4855f95e4adfa3c0ad) (cherry picked from commit 68d5dbd315d5e4a1675e430fd9a9553a523f79d3)
* x86: correct .insn with opcode extension and VEX/XOP/EVEX encodingJan Beulich2024-08-201-1/+2
| | | | | | | | | | | | | | When VexVVVV handling was re-worked, .insn broke: When an opcode extension is in use, VexVVVV_DST needs using now, as ModR/M.reg is already occupied, matching what c8866e3ec5e2 ("x86: Drop using extension_opcode to encode vvvv register") did. While adding (bad) POP2 forms, also slightly adjust existing ones: No need to use XMM registers, and no need to specify %r8 when really %rax is meant twice (EVEX.vvvv not really being the culprit there, or else EVEX.V' would also have needed mentioning). (cherry picked from commit d13452d18a919ff56de6861b2e0ebeaaf3aa057a)
* x86: accept whitespace inside curly bracesJan Beulich2024-07-191-5/+18
| | | | | | | | | | | | | | | | | | Other than documented /**/ comments currently aren't really converted to a single space, at least not for x86 in its most common configurations. That'll be fixed subsequently, at which point blanks may appear where so far none were expected. Furthermore not permitting blanks immediately inside curly braces wasn't quite logical anyway - such constructs are composite ones, and hence components ought to have been permitted to be separated by whitespace from the very beginning. With this we also don't care anymore whether the scrubber would remove whitespace around curly braces, so move them from extra_symbol_chars[] to operand_special_chars[]. Note: The new testcase doesn't actually exercise much (if any) of the added code. It is being put in place to ensure that subsequently, when that code actually comes into play, behavior remains the same.
* x86: undo '{' being a symbol-start characterJan Beulich2024-07-191-14/+98
| | | | | | | | | | | | | | | | | | | | Having it that way has undue side effects, in permitting not only pseudo-prefixes to be parsed correctly, but also permitting odd symbol names which ought to be possible only when quoted. Borrow what other architectures do: Put in place an "unrecognized line" hook to parse off any pseudo prefixes, while using the "start of line" hook to reject ones not actually followed by an insn. For that parsing re-use parse_insn() in yet a slightly different mode (dealing with only pseudo-prefixes). With that, pp may no longer be cleared from init_globals(), but instead needs clearing after a line was fully processed. Since md_assemble() has pretty many return paths, convert that into a local helper, with a trivial wrapper around it. Similarly pp may no longer be updated (by check_register()) when processing anything other than insn operands. To be able to (easily) recognize the case, clear current_templates.start when done with an insn (or with .insn).
* x86: split pseudo-prefix state from i386_insnJan Beulich2024-07-191-196/+197
| | | | | | Subsequently we will want to update that ahead of md_assemble(), with that function needing to take into account such earlier updating. Therefore it'll want resetting separately from i.
* x86/APX: add CMPcc/CTESTcc cases to noreg64 testsJan Beulich2024-07-191-11/+29
| | | | | | | | | | | | | This was missed when support for the insns was added. Just like for DATA16, in rex64 neg (%rax) rex64 neg (%r16) rex64 {nf} neg (%rax) it is not logical why the last one shouldn't be permitted. Bypassing that check requires other adjustments, though, to actually properly consume (and then squash) the prefix.
* x86/APX: remove two inconsistenciesJan Beulich2024-07-121-18/+23
| | | | | | | | | | | | | | | | | | | As indicated in earlier discussion, permitting GOTTPOFF uniformly for all legacy non-SIMD insns while at the same time restricting to just certain ADD forms when EVEX-encoded is inconsistent. Make promoted insns "equal" to their legacy original ones. Doing that adjustment prevents another inconsistency, too: In data16 neg (%rax) data16 neg (%r16) data16 {nf} neg (%rax) it is not logical why the last one shouldn't be permitted. Bypassing that check requires other adjustments, though, to actually properly consume (and then squash) the data size prefix. While there also add the missing CMP and TEST cases to the test case being modified.
* x86/APX: correct TEST/CTESTcc with 1st operand being a memory oneJan Beulich2024-07-121-4/+8
| | | | | | While they properly inherited D and C, code processing the reversal of operands wasn't updated accordingly (and "reversed" operands also weren't tested anywhere).
* x86-64: Fix support for APX NF TLS IE with 2 operandsLingling Kong2024-07-051-3/+2
| | | | | | | | | | Added the restriction in assemble for APX TLS IE that the destination can only be a register. gas/ * config/tc-i386.c (md_assemble): Added stricter restrictions for APX TLS IE.
* gas: Enhance arch-specific SFrame configuration descriptionsJens Remus2024-07-041-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Explicitly mention "SFrame" in the descriptions for the architecture- specific SFrame configuration macros, variables, and functions. Use the term "frame pointer" (FP) instead of "base pointer". This aligns with the terminology used in the SFrame specification. Additionally it helps not to confuse "base-pointer register" with the term "BASE_REG" used in the specification to denote either the SP or FP register. Specify what the SFRAME_CFA_*_REG register numbers are used for: - SP (stack pointer): CFA tracking - FP (frame pointer): CFA and FP tracking - RA (return address): RA tracking Align the descriptions for definitions in the source files to the declarations in the header files. gas/ * config/tc-aarch64.h: Enhance architecture-specific SFrame configuration descriptions. * config/tc-aarch64.c: Likewise. * config/tc-i386.h: Likewise. * config/tc-i386.c: Likewise. Signed-off-by: Jens Remus <jremus@linux.ibm.com>
* x86: Remove unused SFrame CFI RA register variableJens Remus2024-07-041-1/+0
| | | | | | | gas/ * config/tc-i386.c (x86_sframe_cfa_ra_reg): Remove. Signed-off-by: Jens Remus <jremus@linux.ibm.com>
* Support APX CFCMOVCui, Lili2024-07-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The CMOVcc instruction proposed by EVEX has four different forms, corresponding to the four possible combinations of EVEX.ND and EVEX.NF values. In the encoder part, when the CFCMOV template supports EVEX_NF, it means that it requires EVEX.NF to be 1. In the decoder part, CFCMOV_Fixup is used to reverse source and destination operands in the 2-operand case. gas/ChangeLog: * config/tc-i386.c (build_apx_evex_prefix): Set NF bit for cfcmov when the insn template supports EVEX_NF. * testsuite/gas/i386/x86-64-apx-inval.l: Add invalid tests for cfcmov. * testsuite/gas/i386/x86-64-apx-inval.s: Ditto. * testsuite/gas/i386/x86-64.exp: Add tests for cfcmov and cmov. * testsuite/gas/i386/x86-64-apx-cfcmov-intel.d: Ditto. * testsuite/gas/i386/x86-64-apx-cfcmov.d: Ditto. * testsuite/gas/i386/x86-64-apx-cfcmov.s: Ditto. opcodes/ChangeLog: * i386-dis-evex-prefix.h: Add cfcmov instructions. * i386-dis.c (CFCMOV_Fixup): Special handling of cfcmov. (putop): Print 'cf' for cfcmov instructions. * i386-opc.h (EVEX_NF): New. * i386-opc.tbl: Add cfcmov instructions. * i386-mnem.h: Regerated. * i386-tbl.h: Regerated.
* x86-64: Support APX NF TLS IE with 2 operandsLingling Kong2024-07-031-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | Support APX NF TLS IE with 2 operands.Verify it with ld and gold. gas/ * config/tc-i386.c (md_assemble): Allow APX NF TLS IE with 2 operands. * testsuite/gas/i386/x86-64-gottpoff.d: Updated. * testsuite/gas/i386/x86-64-gottpoff.s: Add APX NF TLS IE tests with 2 operands. gold/ * testsuite/x86_64_ie_to_le.s: Add APX NF TLS IE tests with 2 operands. * testsuite/x86_64_ie_to_le.sh: Updated. ld/ * testsuite/ld-x86-64/tlsbindesc.s: Add APX NF TLS IE tests with 2 operands. * testsuite/ld-x86-64/tlsbindesc.d: Updated. * testsuite/ld-x86-64/tlsbindesc.rd: Likewise.
* x86/APX: apply NDD-to-legacy transformation to further CMOVcc formsJan Beulich2024-06-281-1/+16
| | | | | With both sources being registers, these insns are almost commutative; the only extra adjustment needed is inversion of the encoded condition.
* x86/APX: extend TEST-by-imm7 optimization to CTESTccJan Beulich2024-06-281-2/+8
| | | | The same properties apply there.
* x86/APX: optimize {nf}-form IMUL-by-power-of-2 to SHLJan Beulich2024-06-281-0/+70
| | | | | | | | | | | | | | | ..., for differing only in the resulting EFLAGS, which are left untouched anyway. That's a shorter encoding, available as long as certain constraints on operands are met; see code comments. (SHL-by-1 forms may then be subject to further optimization that was introduced earlier.) Note that kind of as a side effect this also converts multiplication by 1 to shift by 0, which is a plain move or even no-op anyway. That could be further shrunk (as could be presence of shifts/rotates by 0 in the original code as well as a fair set of other {nf}-form insns), yet the expectation (for now) is that people won't write such code in the first place.
* x86-64: restrict by-imm31 optimizationJan Beulich2024-06-281-12/+15
| | | | | | | | | | | | Avoid changing the encoding when there's no size gain: If there's a REX or REX2 prefix anyway and the base opcode wouldn't be changed, dropping just REX.W / REX2.W has no (size) effect. (Same for the AND-by-imm7 case in the same big conditional.) While there also pull out the .qword check: For the 2-register-operands case whether that's done on the 1st or 2nd operand doesn't matter. Due to reduction in necessary parentheses this improves readability a tiny bit.
* x86/APX: optimize certain {nf}-form insns to LEAJan Beulich2024-06-281-8/+236
| | | | | | | | | | | | | | | ..., as that leaves EFLAGS untouched anyway. That's a shorter encoding, available as long as certain constraints on operand size and registers are met; see code comments. Note that this requires deferring to derive encoding_evex from {nf} presence, as in optimize_encoding() we want to avoid touching the insns when {evex} was also used. Note further that this requires want_disp32() to now also consider the opcode: We don't want to replace i.tm.mnem_off, for diagnostics to still report the original mnemonic (or else things can get confusing). While there, correct adjacent mis-indentation.
* x86/APX: optimize {nf}-form rotate-by-width-less-1Jan Beulich2024-06-281-1/+21
| | | | | | Unlike for the legacy forms, where there's a difference in the resulting EFLAGS.CF, for the NF variants the immediate can be got rid of in that case by switching to a 1-bit rotate in the opposite direction.
* x86/APX: optimize {nf} forms of ADD/SUB with specific immediatesJan Beulich2024-06-281-1/+83
| | | | | | | | Unlike for the legacy forms, where there's a difference in the resulting EFLAGS, for the NF variants we can safely replace ones using 0x80 by the respectively other insn while negating the immediate, saving 3 immediate bytes (just 1 though for 16-bit operand size). Similarly we can replace ones using 1 / -1 by INC/DEC (eliminating the immediate).
* x86: optimize {,V}PEXTR{D,Q} with immediate of 0Jan Beulich2024-06-211-0/+38
| | | | | Such are equivalent to simple moves, which are up to 3 bytes shorter to encode (and perhaps also cheaper to execute).
* x86: optimize left-shift-by-1Jan Beulich2024-06-211-0/+79
| | | | | | | | | | | | | | These can be replaced by adds when acting on a register operand. While for the scalar forms there's no gain in encoding size, ADD generally has higher throughput than SHL. EFLAGS set by ADD are a superset of those set by SHL (AF in particular is undefined there). For the SIMD cases the transformation also reduced code size, by eliminating the 1-byte immediate from the resulting encoding. Note that this transformation is not applied by gcc13 (according to my observations), so would - as of now - even improve compiler generated code.
* x86: %riz, %rip, and %eip don't require REXJan Beulich2024-06-211-2/+2
| | | | | | | | While these can't be used as register operands, they can be used for memory operand addressing. Such uses do not prevent conversion: The RegRex64 checks in check_Rex_required() for base and index registers were simply wrong. They specifically also aren't needed for byte registers, as those won't pass i386_index_check() anyway.
* x86: don't suppress errors when optimizingJan Beulich2024-06-211-1/+16
| | | | | | Blindly ignoring any mnemonic suffix can't be quite right: Bad suffix / operand combinations still want flagging. Simply avoid optimizing in such situations.
* Support APX CCMP and CTESTCui, Lili2024-06-181-1/+145
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CCMP and CTEST are two new sets of instructions for conditional CMP and TEST, SCC and OSZC flags are given as suffixes of CCMP or CTEST in the instruction mnemonic, e.g.: ccmp<cc> { dfv=sf , cf , of } %eax, %ecx also add {evex} cmp/test %eax, %ecx as an alias for ccmpt. For the encoder part, add function check_Scc_OszcOperation to parse '{ dfv=of , sf, sf, cf}', store scc in the lower 4 bits of base_opcode, and adjust base_opcode to its normal meaning in install_template. For the decoder part, add 'SC' and 'DF' macros to add scc and oszc flags suffixes. gas/ChangeLog: * config/tc-i386.c (OSZC_CF): New. (OSZC_ZF): Ditto. (OSZC_SF): Ditto. (OSZC_OF): Ditto. (set_oszc_flags): Set oszc flags and report error for using the same oszc flags twice. (check_Scc_OszcOperations): Handle SCC OSZC flags. (install_template): Add scc and oszc_flags. (build_apx_evex_prefix): Encode SCC and oszc flags bits. (parse_insn): Handle check_Scc_OszcOperations. * testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: Add ivalid test case. * testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: Ditto. * testsuite/gas/i386/x86-64.exp: Add test for ccmp and ctest. * testsuite/gas/i386/x86-64-apx-ccmp-ctest-intel.d: New test. * testsuite/gas/i386/x86-64-apx-ccmp-ctest-inval.l: Ditto. * testsuite/gas/i386/x86-64-apx-ccmp-ctest-inval.s: Ditto. * testsuite/gas/i386/x86-64-apx-ccmp-ctest.d: Ditto. * testsuite/gas/i386/x86-64-apx-ccmp-ctest.s: Ditto. opcodes/ChangeLog: * i386-dis-evex-reg.h: Add ccmp and ctest. * i386-dis-evex.h: Ditto. * i386-dis.c (struct instr_info): add scc. (struct dis386): Add new micro 'NE','SC' and'DF'. (get_valid_dis386): Get scc value and move MAP4 invalid check to print_insn. (putop): Handle %NE, %SC and %DF. * i386-opc.h (SCC): New. * i386-opc.tbl: Add ccmp/ctest and evex format for cmp/test. * i386-mnem.h: Regenerated. * i386-tbl.h: Ditto.
* x86/APX: convert ZU to operand constraintJan Beulich2024-06-101-1/+5
| | | | | | | Extremely rarely used attributes are inefficient when represented by a separate attribute. Convert it to an operand constraint, as already suggested during review. The collision with RegKludge is pretty simple to resolve.
* x86: reduce check_{byte,word,long,qword}_reg() overheadJan Beulich2024-05-311-4/+15
| | | | | | | | These run after template matching. Therefore it is quite pointless for them to check all operands, when operand sizes matching across operands is already known. Exit the loops early in such cases. In check_byte_reg() also drop a long-stale part of a comment.
* x86/Intel: warn about undue mnemonic suffixesJan Beulich2024-05-291-0/+13
| | | | | | | | | | | Except for very few insns mnemonic suffixes aren't permitted in Intel syntax. Warn about such for now, indicating that they will be outright refused down the road. While fiddling with testcases to address fallout, drop a few things which should never have been tested as valid Intel syntax. Also add a previously missing line to simd-suffix.d.
* x86/Intel: SHLD/SHRD have dual meaningJan Beulich2024-05-291-2/+5
| | | | | | | | | | | | | | Since we uniformly permit D suffixes in Intel mode whenever in AT&T mode an L suffix may be used, we need to be consistent with this. Take the easy route, despite that still leading to an anomaly which is also visible from the new testcase: shld eax, ecx, 1 shld eax, ecx, cl can mean two things with APX: SHL with a D suffix in NDD EVEX encoding, or the traditional SHLD in legacy encoding.
* x86: simplify VexVVVV_SRC2 handling for the XOP caseJan Beulich2024-05-241-9/+5
| | | | | | | | As already suggested during review, rather than having an extra conditional in build_modrm_byte() (a code path used for quite a few more insns, including even certain GPR ones), adjust the attribute in the installed template to properly describe things with operands swapped.
* x86: simplify / consolidate check_{word,long,qword}_reg()Jan Beulich2024-05-241-16/+4
| | | | | | | | | | | | These run after template matching. Therefore operands are already known to match the template in use. With the loop bodies skipping anything not a GPR in the actual operands, there's therefore no need to check the template's operand type for permitting Reg or Accum. At the same time bring the three functions in sync for the "byte" part of the logic, as far as checking the template for other sizes (qword specifically) goes. Plus drop a stale comment from check_qword_reg(), when all three are now behaving the same in this regard.
* x86: correct VCVT{,U}SI2SDJan Beulich2024-05-241-5/+47
| | | | | | | | | | | | | | | Properly reject inappropriate suffixes (No_lSuf / No_qSuf mistakenly omitted by cf665fee1d6c ["x86: re-work AVX512 embedded rounding / SAE"]), to avoid emitting bad or arbitrarily guessed instructions. Interestingly check_{long,qword}_suffix() don't help here, which perhaps is another indication that the way they work right now isn't quite appropriate. Sadly correcting just the templates breaks operand ambiguity detection, since so far that worked from a single template permitting more than one suffix. Here we have ambiguity though which can now be noticed only when taking all (matching) templates together. Therefore we need to determine further matching templates (see code comments for constraints), to then accumulate permitted suffixes across all of them.
* Support APX zero-upperCui, Lili2024-05-221-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is to enable ZU for IMUL (opcodes 0x69 and 0x6B) and SETcc. Since the spec only recommends one form of setzu, I won't be adding set<cc>reg32/reg64 support in this patch. gas/ChangeLog: * config/tc-i386.c (build_apx_evex_prefix): Handle ZU. * testsuite/gas/i386/x86-64.exp: Added new tests for ZU. * testsuite/gas/i386/x86-64.exp: Added new tests for ZU. * testsuite/gas/i386/x86-64-apx-zu-intel.d: New test. * testsuite/gas/i386/x86-64-apx-zu-inval.l: Ditto. * testsuite/gas/i386/x86-64-apx-zu-inval.s: Ditto. * testsuite/gas/i386/x86-64-apx-zu.d: Ditto. * testsuite/gas/i386/x86-64-apx-zu.s: Ditto. opcodes/ChangeLog: * i386-dis-evex-prefix.h: Handle PREFIX_EVEX_MAP4_40 ~ PREFIX_EVEX_MAP4_4F. * i386-dis-evex.h: Ditto. * i386-dis.c (struct dis386): Add new micro 'ZU'. (putop): Handle %ZU. * i386-gen.c: Added ZU. * i386-opc.h: Ditto. * i386-opc.tbl: Added new templates to support ZU.
* X86: Remove "i.rex" to eliminate extra conditional branchCui, Lili2024-05-221-1/+1
| | | | | | | | | Resulting code will do better without the extra conditional branch. Remove "i.rex" to eliminate extra conditional branch. gas/ChangeLog: * config/tc-i386.c (establish_rex): Remove i.rex.
* Add check for 8-bit old registers in EVEX formatCui, Lili2024-05-221-3/+4
| | | | | | | | | | | | Since APX supports EVEX from legacy instructions, we need to check the 8-bit old registers in EVEX format. And add test cases for it. gas/ChangeLog: * config/tc-i386.c (md_assemble): Add invalid check for old byte registers in EVEX format. * testsuite/gas/i386/x86-64-apx-inval.l: Add new test. * testsuite/gas/i386/x86-64-apx-inval.s: Ditto.
* x86: Split REX/REX2 old registers judgment.Cui, Lili2024-05-221-16/+14
| | | | | | | | | Split "REX/REX2 old register checking" and "add empty rex prefix" into two separate branches. gas/ChangeLog: * config/tc-i386.c (establish_rex): Split the judgments.
* x86: Drop using extension_opcode to encode vvvv registerCui, Lili2024-05-061-6/+3
| | | | | | | | | | | | | | gas/ChangeLog: * config/tc-i386.c (build_modrm_byte): Dropped the use of extension_opcode to encode the vvvv register. * testsuite/gas/i386/x86-64-sse2avx.d: Added new testcases. * testsuite/gas/i386/x86-64-sse2avx.s: Diito. opcodes/ChangeLog: * i386-opc.tbl: Added DstVVVV to some extension_opcode instructions. * i386-tbl.h: Regenerated.
* x86: Drop SwapSourcesCui, Lili2024-05-061-8/+11
| | | | | | | | | | | | | | | | | | | gas/ChangeLog: * config/tc-i386.c (build_modrm_byte): Dropped the use of SWAP_SOURCES to encode the vvvv register. opcodes/ChangeLog: * i386-opc.h (SWAP_SOURCES): Dropped. (NO_DEFAULT_MASK): Adjusted the value. (ADDR_PREFIX_OP_REG): Ditto. (DISTINCT_DEST): Ditto. (IMPLICIT_STACK_OP): Ditto. (VexVVVV_SRC2): New. * i386-opc.tbl: Dropped SwapSources and replaced its VexVVVV with Src1VVVV. * i386-tbl.h: Regenerated.
* x86: Use vexvvvv as the switch state to encode the vvvv registerCui, Lili2024-05-061-15/+17
| | | | | | | | | | | | | | | | | | | | Use vexvvvv as the switch state, and replace VexVVVV with Src1VVVV. Src1VVVV means using VEX.vvvv encodes the first source register operand. The old logic did not check vexvvvv first, which made the logic here very complicated. gas/ChangeLog: * config/tc-i386.c (optimize_encoding): Replaced 1 with Src1VVVV. (build_modrm_byte): Used vexvvvv to encode the vvvv register. (s_insn): Replaced 1 with Src1VVVV. opcodes/ChangeLog: * i386-opc.h (VexVVVV_DST): Adjusted the value. (Src1VVVV): New. * i386-opc.tbl: Replaced part VexVVVV with Src1VVVV. * i386-tbl.h: Regenerated.
* x86/APX: extend SSE2AVX coverageJan Beulich2024-05-031-2/+7
| | | | | | | | | | | | | Legacy encoded SIMD insns are converted to AVX ones in that mode. When eGPR-s are in use, i.e. with APX, convert to AVX10 insns (where available; there are quite a few which can't be converted). Note that LDDQU is represented as VMOVDQU32 (and the prior use of the sse3 template there needs dropping, to get the order right). Note further that in a few cases, due to the use of templates, AVX512VL is used when AVX512F would suffice. Since AVX10 is the main reference, this shouldn't be too much of a problem.
* x86/APX: Add invalid check for APX EVEX.X4.Cui, Lili2024-04-221-1/+4
| | | | | | | | | | | | | | gas/ChangeLog: * config/tc-i386.c (build_apx_evex_prefix): Added invalid check for APX X4. * testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: Added invalid testcase. * testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: Ditto. opcodes/ChangeLog: * i386-dis.c (get_valid_dis386): Added invalid check for APX X4.
* x86: Fix a memory leak in md_assembleH.J. Lu2024-04-161-5/+8
| | | | | | | | | | | | | | Fix a memory leak in md_assemble where copy may be cleared and may be the same as copy: if (copy && !mnem_suffix) { line = copy; copy = NULL; no_match: * config/tc-i386.c (md_assemble): Properly free the xstrdup memory.
* x86-64: Use long NOPs for Intel Core processorsH.J. Lu2024-04-101-5/+35
| | | | | | | | | | | | | | | | | | | | | | | Use long NOPs for Intel Core processors since they are faster than multiple NOPs. Don't use them for 64-bit processors by default since Intel Atom processors can only decode 4 prefixes in 1 cycle. * config/tc-i386.c (alt64_9): New. (alt64_10): Likewise. (alt64_11): Likewise. (alt64_12): Likewise. (alt64_13): Likewise. (alt64_14): Likewise. (alt64_15): Likewise. (alt64_patt): Likewise. (i386_generate_nops): Use alt64_patt for Intel Core processors in 64-bit mode. * testsuite/gas/i386/x86-64-nops-1-core2.d: Expect long NOPs. * testsuite/gas/i386/x86-64-nops-4-core2.d: Likewise. * testsuite/gas/i386/ilp32/x86-64-nops-1-core2.d: Replace ../x86-64-nops-1.d with ../x86-64-nops-1-core2.d. * testsuite/gas/i386/ilp32/x86-64-nops-4-core2.d: Replace ../x86-64-nops-4.d with ../x86-64-nops-4-core2.d.
* Support APX NFCui, Lili2024-04-071-4/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the case when NDD and NF are both 0 in evex-promoted format, we will fully support and test it in another patch. gas/ChangeLog: * NEWS: Support Intel APX NF. * config/tc-i386.c (enum i386_error): Add unsupported_nf. (struct _i386_insn): Add has_nf. (is_apx_evex_encoding): Ditto. (build_apx_evex_prefix): Encode the NF bit. (md_assemble): Handle unsupported_nf. (parse_insn): Handle Prefix_NF and report bad for illegal combination. (can_convert_NDD_to_legacy): Replace i.tm.opcode_modifier.nf with i.has_nf. (match_template): Support D for APX_F insns and check NF support. * testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: Add bad test for NF bit. * testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: Ditto. * testsuite/gas/i386/x86-64-apx-inval.l: Ditto. * testsuite/gas/i386/x86-64-apx-inval.s: Ditto. * testsuite/gas/i386/x86-64.exp: Add apx nf tests. * testsuite/gas/i386/x86-64-apx-nf-intel.d: New test. * testsuite/gas/i386/x86-64-apx-nf.d: Ditto. * testsuite/gas/i386/x86-64-apx-nf.s: Ditto. opcodes/ChangeLog: * i386-dis-evex.h: Add %NF to the instructions that support APX NF and add new instruction imul, popcnt, tzcnt and lzcnt to EVEX table. * i386-dis-evex-reg.h: Ditto. * i386-dis.c (struct instr_info): Add nf. (struct dis386): Add "NF" for EVEX.NF. (get_valid_dis386): Set ins->vex.nf and report bad-nf for illegal case. (print_insn): Handle ins.vex.nf. (putop): Handle "%NF". * i386-opc.h (Prefix_NF): New. * i386-opc.tbl: Added new entries to support full APX NF instructions. * i386-mnem.h: Regenerated. * i386-tbl.h: Regenerated.
* x86/APX: Remove KEYLOCKER and SHA promotions from EVEX MAP4Cui, Lili2024-04-031-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | APX spec removed KEYLOCKER and SHA promotions from EVEX MAP4. https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html gas/ChangeLog: * NEWS: Mention that remove KEYLOCKER and SHA promotions from EVEX * MAP4. * config/tc-i386.c (process_operands): Removed special handling of * KEYLOCKER and SHA. * testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l: Removed KEYLOCKER * and SHA instructions. * testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s: Ditto. * testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: Ditto. * testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: Ditto. * testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d: Ditto. * testsuite/gas/i386/x86-64-apx-evex-promoted-wig.d: Ditto. * testsuite/gas/i386/x86-64-apx-evex-promoted.d: Ditto. * testsuite/gas/i386/x86-64-apx-evex-promoted.s: Ditto. opcodes/ChangeLog: * i386-dis-evex-prefix.h: Removed KEYLOCKER and SHA instructions. * i386-dis-evex.h: Ditto. * i386-opc.tbl: Ditto. * i386-dis.c (print_vector_reg): Removed special handling of KEYLOCKER * and SHA.
* x86/SSE2AVX: move checkingJan Beulich2024-03-281-11/+10
| | | | | | | It has always been looking a little odd to me that this was done deep in cpu_flags_match(). Move it to match_template() itself - there's no need to do anything complex when encountering such a template while it cannot possibly be used.
* x86/SSE2AVX: respect prefixesJan Beulich2024-03-281-2/+3
| | | | | | | | 1) Without -msse2avx we unconditionally honor REX.W. Hence we ought to also do so with -msse2avx, converting to VEX.W. 2) {rex} doesn't prevent conversion to VEX encodings. Thus {rex2} shouldn't either.
* x86: fix Solaris testsuite failuresJan Beulich2024-03-221-6/+3
| | | | | | | | | | | | | For one 0afc614c9938 ("x86: Warn .insn instruction with length > 15 bytes") introduced a .insn use involving a slash; such tests need to have --divide passed to gas. And then 5bc71c2a6b8e ("x86-64: Add R_X86_64_CODE_6_GOTTPOFF") broke BFD_RELOC_X86_64_GOTTPOFF conversion to R_X86_64_CODE_4_GOTTPOFF, by adding respective code in a section guarded by generate_relax_relocations (the case of that not being required there was limited to 32-bit object files). Re-arrange that block of code to check generate_relax_relocations later.
* x86/APX: legacy promoted insns can't access %xmm16-%xmm31Jan Beulich2024-03-151-0/+7
| | | | | | | | | | | | | | Irrespective of the encoding being EVEX, the usable SIMD register range continues to be limited to %xmm0-%xmm15. Enforce this in gas (but continue to generate code, as in principle we know how to encode things) and recognize/flag the case in the disassembler. Oddly enough wrong forms were actually used in the testsuite (register- only forms are then really meaningless to test here, and are hence dropped instead of adjusted). Convert the POP2 test that needs touching anyway (due to a bad ModR/M byte having been chosen) to .insn.