* doc/internals.texi: Add loud disclaimer. Refill to 79 columns, specify
fill-column in local-variables section. Change subheadings to subsections so they can be cross-referenced. Describe broken words, frags, frag chains, generic relaxation, relax table, m68k relaxation, m68k addressing modes, test suite code. Add a few words about various file formats.
parent
7f390875ca
commit
ae6cd60f9e
1 changed files with 592 additions and 96 deletions
|
@ -1,7 +1,19 @@
|
|||
\input texinfo
|
||||
@setfilename internals.info
|
||||
@node Assembler Internals
|
||||
@chapter Assembler Internals
|
||||
@cindex internals
|
||||
|
||||
This documentation is not ready for prime time yet. Not even close. It's not
|
||||
so much documentation as random blathering of mine intended to be notes to
|
||||
myself that may eventually be turned into real documentation.
|
||||
|
||||
I take no responsibility for any negative effect it may have on your
|
||||
professional, personal, or spiritual life. Read it at your own risk. Caveat
|
||||
emptor. Delete before reading. Abandon all hope, ye who enter here.
|
||||
|
||||
However, enhancements will be gratefully accepted.
|
||||
|
||||
@menu
|
||||
* Data types:: Data types
|
||||
@end menu
|
||||
|
@ -17,162 +29,602 @@ BFD, MANY_SECTIONS, BFD_HEADERS
|
|||
@section Data types
|
||||
@cindex internals, data types
|
||||
|
||||
@subheading Symbols
|
||||
@subsection Symbols
|
||||
@cindex internals, symbols
|
||||
@cindex symbols, internal
|
||||
|
||||
... `local' symbols ... flags ...
|
||||
|
||||
The definition for @code{struct symbol}, also known as @code{symbolS},
|
||||
is located in @file{struc-symbol.h}. Symbol structures can contain the
|
||||
following fields:
|
||||
The definition for @code{struct symbol}, also known as @code{symbolS}, is
|
||||
located in @file{struc-symbol.h}. Symbol structures can contain the following
|
||||
fields:
|
||||
|
||||
@table @code
|
||||
@item sy_value
|
||||
This is an @code{expressionS} that describes the value of the symbol.
|
||||
It might refer to another symbol; if so, its true value may not be known
|
||||
until @code{foo} is run.
|
||||
This is an @code{expressionS} that describes the value of the symbol. It might
|
||||
refer to another symbol; if so, its true value may not be known until
|
||||
@code{foo} is called.
|
||||
|
||||
More generally, however, ... undefined? ... or an offset from the start
|
||||
of a frag pointed to by the @code{sy_frag} field.
|
||||
More generally, however, ... undefined? ... or an offset from the start of a
|
||||
frag pointed to by the @code{sy_frag} field.
|
||||
|
||||
@item sy_resolved
|
||||
This field is non-zero if the symbol's value has been completely
|
||||
resolved. It is used during the final pass over the symbol table.
|
||||
This field is non-zero if the symbol's value has been completely resolved. It
|
||||
is used during the final pass over the symbol table.
|
||||
|
||||
@item sy_resolving
|
||||
This field is used to detect loops while resolving the symbol's value.
|
||||
|
||||
@item sy_used_in_reloc
|
||||
This field is non-zero if the symbol is used by a relocation entry. If
|
||||
a local symbol is used in a relocation entry, it must be possible to
|
||||
redirect those relocations to other symbols, or this symbol cannot be
|
||||
removed from the final symbol list.
|
||||
This field is non-zero if the symbol is used by a relocation entry. If a local
|
||||
symbol is used in a relocation entry, it must be possible to redirect those
|
||||
relocations to other symbols, or this symbol cannot be removed from the final
|
||||
symbol list.
|
||||
|
||||
@item sy_next
|
||||
@itemx sy_previous
|
||||
These pointers to other @code{symbolS} structures describe a singly or
|
||||
doubly linked list. (If @code{SYMBOLS_NEED_BACKPOINTERS} is not
|
||||
defined, the @code{sy_previous} field will be omitted.) These fields
|
||||
should be accessed with @code{symbol_next} and @code{symbol_previous}.
|
||||
These pointers to other @code{symbolS} structures describe a singly or doubly
|
||||
linked list. (If @code{SYMBOLS_NEED_BACKPOINTERS} is not defined, the
|
||||
@code{sy_previous} field will be omitted.) These fields should be accessed
|
||||
with @code{symbol_next} and @code{symbol_previous}.
|
||||
|
||||
@item sy_frag
|
||||
This points to the @code{fragS} that this symbol is attached to.
|
||||
|
||||
@item sy_used
|
||||
Whether the symbol is used as an operand or in an expression. Note: Not
|
||||
all the backends keep this information accurate; backends which use this
|
||||
bit are responsible for setting it when a symbol is used in backend
|
||||
routines.
|
||||
Whether the symbol is used as an operand or in an expression. Note: Not all of
|
||||
the backends keep this information accurate; backends which use this bit are
|
||||
responsible for setting it when a symbol is used in backend routines.
|
||||
|
||||
@item bsym
|
||||
If @code{BFD_ASSEMBLER} is defined, this points to the @code{asymbol}
|
||||
that will be used in writing the object file.
|
||||
If @code{BFD_ASSEMBLER} is defined, this points to the @code{asymbol} that will
|
||||
be used in writing the object file.
|
||||
|
||||
@item sy_name_offset
|
||||
(Only used if @code{BFD_ASSEMBLER} is not defined.)
|
||||
This is the position of the symbol's name in the symbol table of the
|
||||
object file. On some formats, this will start at position 4, with
|
||||
position 0 reserved for unnamed symbols. This field is not used until
|
||||
@code{write_object_file} is called.
|
||||
(Only used if @code{BFD_ASSEMBLER} is not defined.) This is the position of
|
||||
the symbol's name in the symbol table of the object file. On some formats,
|
||||
this will start at position 4, with position 0 reserved for unnamed symbols.
|
||||
This field is not used until @code{write_object_file} is called.
|
||||
|
||||
@item sy_symbol
|
||||
(Only used if @code{BFD_ASSEMBLER} is not defined.)
|
||||
This is the format-specific symbol structure, as it would be written into
|
||||
the object file.
|
||||
(Only used if @code{BFD_ASSEMBLER} is not defined.) This is the
|
||||
format-specific symbol structure, as it would be written into the object file.
|
||||
|
||||
@item sy_number
|
||||
(Only used if @code{BFD_ASSEMBLER} is not defined.)
|
||||
This is a 24-bit symbol number, for use in constructing relocation table
|
||||
entries.
|
||||
(Only used if @code{BFD_ASSEMBLER} is not defined.) This is a 24-bit symbol
|
||||
number, for use in constructing relocation table entries.
|
||||
|
||||
@item sy_obj
|
||||
This format-specific data is of type @code{OBJ_SYMFIELD_TYPE}. If no
|
||||
macro by that name is defined in @file{obj-format.h}, this field is not
|
||||
defined.
|
||||
This format-specific data is of type @code{OBJ_SYMFIELD_TYPE}. If no macro by
|
||||
that name is defined in @file{obj-format.h}, this field is not defined.
|
||||
|
||||
@item sy_tc
|
||||
This processor-specific data is of type @code{TC_SYMFIELD_TYPE}. If no
|
||||
macro by that name is defined in @file{targ-cpu.h}, this field is not
|
||||
defined.
|
||||
This processor-specific data is of type @code{TC_SYMFIELD_TYPE}. If no macro
|
||||
by that name is defined in @file{targ-cpu.h}, this field is not defined.
|
||||
|
||||
@item TARGET_SYMBOL_FIELDS
|
||||
If this macro is defined, it defines additional fields in the symbol
|
||||
structure. This macro is obsolete, and should be replaced when possible
|
||||
by uses of @code{OBJ_SYMFIELD_TYPE} and @code{TC_SYMFIELD_TYPE}.
|
||||
If this macro is defined, it defines additional fields in the symbol structure.
|
||||
This macro is obsolete, and should be replaced when possible by uses of
|
||||
@code{OBJ_SYMFIELD_TYPE} and @code{TC_SYMFIELD_TYPE}.
|
||||
|
||||
@end table
|
||||
|
||||
Access with S_SET_SEGMENT, S_SET_VALUE, S_GET_VALUE, S_GET_SEGMENT,
|
||||
etc., etc.
|
||||
Access with S_SET_SEGMENT, S_SET_VALUE, S_GET_VALUE, S_GET_SEGMENT, etc., etc.
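The way @code{sy_resolved} and @code{sy_resolving} cooperate can be sketched as below. This is only an illustration under stated assumptions, not the assembler's code: @code{struct toy_symbol}, its @code{base} and @code{addend} fields, and @code{toy_resolve} are hypothetical stand-ins for @code{symbolS} and its @code{expressionS} value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for symbolS.  */
struct toy_symbol
{
  struct toy_symbol *base;  /* symbol this value is relative to, or NULL */
  long addend;              /* constant part of the value */
  int resolved;             /* plays the role of sy_resolved */
  int resolving;            /* plays the role of sy_resolving */
};

/* Resolve SYM to a constant.  Returns 0 and stores the value in *VALUE
   on success; returns -1 if the definition loops back on itself.  */
static int
toy_resolve (struct toy_symbol *sym, long *value)
{
  assert (sym != NULL && value != NULL);

  if (sym->resolved)
    {
      *value = sym->addend;
      return 0;
    }
  if (sym->resolving)
    return -1;  /* We reached a symbol that is still being resolved.  */

  sym->resolving = 1;
  if (sym->base != NULL)
    {
      long base_value;

      if (toy_resolve (sym->base, &base_value) != 0)
        {
          sym->resolving = 0;
          return -1;
        }
      sym->addend += base_value;
      sym->base = NULL;
    }
  sym->resolving = 0;
  sym->resolved = 1;
  *value = sym->addend;
  return 0;
}
```

A symbol defined as another symbol plus 4 resolves to that symbol's value plus 4; two symbols defined in terms of each other make @code{toy_resolve} return -1 instead of recursing forever.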
|
||||
|
||||
@foo Expressions
|
||||
@subsection Expressions
|
||||
@cindex internals, expressions
|
||||
@cindex expressions, internal
|
||||
|
||||
Expressions are stored as a combination of operator, symbols, blah.
|
||||
|
||||
@subheading Fixups
|
||||
@subsection Fixups
|
||||
@cindex internals, fixups
|
||||
@cindex fixups
|
||||
|
||||
@subheading Frags
|
||||
@subsection Frags
|
||||
@cindex internals, frags
|
||||
@cindex frags
|
||||
|
||||
@subheading Broken Words
|
||||
The frag is the basic unit for storing section contents.
|
||||
|
||||
@table @code
|
||||
|
||||
@item fr_address
|
||||
The address of the frag. This is not set until the assembler rescans the list
|
||||
of all frags after the entire input file is parsed. The function
|
||||
@code{relax_segment} fills in this field.
|
||||
|
||||
@item fr_next
|
||||
Pointer to the next frag in this (sub)section.
|
||||
|
||||
@item fr_fix
|
||||
Fixed number of characters we know we're going to emit to the output file. May
|
||||
be zero.
|
||||
|
||||
@item fr_var
|
||||
Variable number of characters we may output, after the initial @code{fr_fix}
|
||||
characters. May be zero.
|
||||
|
||||
@item fr_symbol
|
||||
@itemx fr_offset
|
||||
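The interpretation of these fields depends on @code{fr_type}; see the relaxation states described below.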
|
||||
|
||||
@item fr_opcode
|
||||
Points to the lowest-addressed byte of the opcode, for use in relaxation.
|
||||
|
||||
@item line
|
||||
Holds line-number info.
|
||||
|
||||
@item fr_type
|
||||
Relaxation state. This field indicates the interpretation of @code{fr_offset},
|
||||
@code{fr_symbol} and the variable-length tail of the frag, as well as the
|
||||
treatment it gets in various phases of processing. It does not affect the
|
||||
initial @code{fr_fix} characters; they are always supposed to be output
|
||||
verbatim (fixups aside). See below for specific values this field can have.
|
||||
|
||||
@item fr_subtype
|
||||
Relaxation substate. If the macro @code{md_relax_frag} isn't defined, this is
|
||||
assumed to be an index into @code{md_relax_table} for the generic relaxation
|
||||
code to process. (@xref{Relaxation}.) If @code{md_relax_frag} is defined,
|
||||
this field is available for any use by the CPU-specific code.
|
||||
|
||||
@item align_mask
|
||||
@itemx align_offset
|
||||
These fields are not used yet. They are intended to keep track of the
|
||||
alignment of the current frag within its section, even if the exact offset
|
||||
isn't known. In many cases, we should be able to avoid creating extra frags
|
||||
when @code{.align} directives are given; instead, the number of bytes needed
|
||||
may be computable when the @code{.align} directive is processed. Hmm. Is this
|
||||
the right place for these, or should they be in the @code{frchainS} structure?
|
||||
|
||||
@item fr_pcrel_adjust
|
||||
@itemx fr_bsr
|
||||
These fields are only used in the NS32k configuration. But since @code{struct
|
||||
frag} is defined before the CPU-specific header files are included, they must
|
||||
unconditionally be defined.
|
||||
|
||||
@item fr_literal
|
||||
Declared as a one-character array, this last field grows arbitrarily large to
|
||||
hold the actual contents of the frag.
|
||||
|
||||
@end table
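The ``one-character array that grows'' idiom used for @code{fr_literal} can be sketched as follows. The names @code{struct toy_frag}, @code{toy_frag_contents} and @code{toy_frag_alloc} are hypothetical, only a few of the fields listed above are reproduced, and plain @code{calloc} stands in for whatever allocator the assembler really uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for fragS.  */
struct toy_frag
{
  struct toy_frag *fr_next;  /* next frag in this (sub)section */
  long fr_fix;               /* bytes known to be emitted */
  long fr_var;               /* bytes that may follow them */
  char fr_literal[1];        /* really fr_fix + fr_var bytes long */
};

/* Address of the first content byte of FRAG.  */
static char *
toy_frag_contents (struct toy_frag *frag)
{
  return (char *) frag + offsetof (struct toy_frag, fr_literal);
}

/* Allocate a frag with room for FIX + VAR content bytes and copy the
   FIX bytes at FIXED into it.  Returns NULL if allocation fails; the
   caller releases the frag with free ().  */
static struct toy_frag *
toy_frag_alloc (const char *fixed, long fix, long var)
{
  struct toy_frag *frag;
  size_t size;

  assert (fix >= 0 && var >= 0);
  assert (fix == 0 || fixed != NULL);

  size = offsetof (struct toy_frag, fr_literal) + (size_t) fix + (size_t) var;
  if (size < sizeof (struct toy_frag))
    size = sizeof (struct toy_frag);

  frag = calloc (1, size);
  if (frag == NULL)
    return NULL;
  frag->fr_fix = fix;
  frag->fr_var = var;
  if (fix > 0)
    memcpy (toy_frag_contents (frag), fixed, (size_t) fix);
  return frag;
}
```

The structure is deliberately over-allocated, so the bytes after the declared end of @code{fr_literal} belong to the same block of memory.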
|
||||
|
||||
These are the possible relaxation states, provided in the enumeration type
|
||||
@code{relax_stateT}, and the interpretations they represent for the other
|
||||
fields:
|
||||
|
||||
@table @code
|
||||
|
||||
@item rs_align
|
||||
The start of the following frag should be aligned on some boundary. In this
|
||||
frag, @code{fr_offset} is the logarithm (base 2) of the alignment in bytes.
|
||||
(For example, if alignment on an 8-byte boundary were desired, @code{fr_offset}
|
||||
would have a value of 3.) The variable characters indicate the fill pattern to
|
||||
be used. (More than one?)
|
||||
|
||||
@item rs_broken_word
|
||||
This indicates that ``broken word'' processing should be done. @xref{Broken
|
||||
Words,,Broken Words}. If broken word processing is not necessary on the target
|
||||
machine, this enumerator value will not be defined.
|
||||
|
||||
@item rs_fill
|
||||
The variable characters are to be repeated @code{fr_offset} times. If
|
||||
@code{fr_offset} is 0, this frag has a length of @code{fr_fix}.
|
||||
|
||||
@item rs_machine_dependent
|
||||
Displacement relaxation is to be done on this frag. The target is indicated by
|
||||
@code{fr_symbol} and @code{fr_offset}, and @code{fr_subtype} indicates the
|
||||
particular machine-specific addressing mode desired. @xref{Relaxation}.
|
||||
|
||||
@item rs_org
|
||||
The start of the following frag should be pushed back to some specific offset
|
||||
within the section. (Some assemblers use the value as an absolute address; the
|
||||
@sc{gnu} assembler does not handle final absolute addresses; it requires that
|
||||
the linker set them.) The offset is given by @code{fr_symbol} and
|
||||
@code{fr_offset}; one character from the variable-length tail is used as the
|
||||
fill character.
|
||||
|
||||
@end table
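The arithmetic implied by the @code{rs_align} and @code{rs_fill} descriptions can be written out as a small sketch. Both helper names are hypothetical, and the code only restates the rules given above; it is not taken from the assembler.

```c
#include <assert.h>

/* Bytes of padding needed after ADDRESS so that the next frag starts on
   a multiple of 2**LOG2_ALIGN, as for an rs_align frag whose fr_offset
   is LOG2_ALIGN.  */
static unsigned long
toy_align_padding (unsigned long address, int log2_align)
{
  unsigned long mask;

  assert (log2_align >= 0 && log2_align < 31);
  mask = (1UL << log2_align) - 1;
  return (mask + 1 - (address & mask)) & mask;
}

/* Total size of an rs_fill frag: the fixed part, followed by the FR_VAR
   variable characters repeated FR_OFFSET times.  */
static long
toy_fill_size (long fr_fix, long fr_var, long fr_offset)
{
  assert (fr_fix >= 0 && fr_var >= 0 && fr_offset >= 0);
  return fr_fix + fr_var * fr_offset;
}
```

With an @code{fr_offset} of 3 (8-byte alignment), an address of 13 needs 3 bytes of padding and an address of 16 needs none. An @code{rs_fill} frag with an @code{fr_offset} of 0 has a size of just @code{fr_fix}, as stated above.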
|
||||
|
||||
A chain of frags is built up for each subsection. The data structure
|
||||
describing a chain is called a @code{frchainS}, and contains the following
|
||||
fields:
|
||||
|
||||
@table @code
|
||||
@item frch_root
|
||||
Points to the first frag in the chain. May be null if there are no frags in
|
||||
this chain.
|
||||
@item frch_last
|
||||
Points to the last frag in the chain, or null if there are none.
|
||||
@item frch_next
|
||||
Next in the list of @code{frchainS} structures.
|
||||
@item frch_seg
|
||||
Indicates the section this frag chain belongs to.
|
||||
@item frch_subseg
|
||||
Subsection (subsegment) number of this frag chain.
|
||||
@item fix_root, fix_tail
|
||||
(Defined only if @code{BFD_ASSEMBLER} is defined.) Point to first and last
|
||||
@code{fixS} structures associated with this subsection.
|
||||
@item frch_obstack
|
||||
Not currently used. Intended to be used for frag allocation for this
|
||||
subsection. This should reduce frag generation caused by switching sections.
|
||||
@end table
|
||||
|
||||
A @code{frchainS} corresponds to a subsection; each section has a list of
|
||||
@code{frchainS} records associated with it. In most cases, only one subsection
|
||||
of each section is used, so the list will only be one element long, but any
|
||||
processing of frag chains should be prepared to deal with multiple chains per
|
||||
section.
|
||||
|
||||
After the input files have been completely processed, and no more frags are to
|
||||
be generated, the frag chains are joined into one per section for further
|
||||
processing. After this point, it is safe to operate on one chain per section.
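Joining the chains of one section might look roughly like this. It is a sketch under assumptions: the structures are cut down to the link fields, @code{toy_join_chains} is a hypothetical name, and the list passed in is assumed to hold only chains belonging to a single section, in subsection order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins reduced to the link fields.  */
struct toy_frag
{
  struct toy_frag *fr_next;
};

struct toy_frchain
{
  struct toy_frag *frch_root;     /* first frag, or NULL */
  struct toy_frag *frch_last;     /* last frag, or NULL */
  struct toy_frchain *frch_next;  /* next chain in the list */
};

/* Link the frags of every chain in the list starting at FIRST into one
   list and return its head, or NULL if all chains are empty.  */
static struct toy_frag *
toy_join_chains (struct toy_frchain *first)
{
  struct toy_frag *head = NULL;
  struct toy_frag *tail = NULL;
  struct toy_frchain *chain;

  for (chain = first; chain != NULL; chain = chain->frch_next)
    {
      if (chain->frch_root == NULL)
        continue;  /* Empty subsection; nothing to link.  */
      assert (chain->frch_last != NULL);
      if (head == NULL)
        head = chain->frch_root;
      else
        tail->fr_next = chain->frch_root;
      tail = chain->frch_last;
    }
  if (tail != NULL)
    tail->fr_next = NULL;
  return head;
}
```

Empty chains are skipped, which matches the note above that @code{frch_root} may be null.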
|
||||
|
||||
@node Broken Words
|
||||
@subsection Broken Words
|
||||
@cindex internals, broken words
|
||||
@cindex broken words
|
||||
@cindex promises, promises
|
||||
|
||||
The ``broken word'' idea derives from the fact that some compilers, including
|
||||
@code{gcc}, will sometimes emit switch tables specifying 16-bit @code{.word}
|
||||
displacements to branch targets, and branch instructions that load entries from
|
||||
that table to compute the target address. If this is done on a 32-bit machine,
|
||||
there is a chance (at least with really large functions) that the displacement
|
||||
will not fit in 16 bits. Thus the ``broken word'' idea is well named, since
|
||||
there is an implied promise that the 16-bit field will in fact hold the
|
||||
specified displacement.
|
||||
|
||||
If the ``broken word'' processing is enabled, and a situation like this is
|
||||
encountered, the assembler will insert a jump instruction into the instruction
|
||||
stream, close enough to be reached with the 16-bit displacement. This jump
|
||||
instruction will transfer to the real desired target address. Thus, as long as
|
||||
the @code{.word} value really is used as a displacement to compute an address
|
||||
to jump to, the net effect will be correct (minus a very small efficiency
|
||||
cost). If @code{.word} directives with label differences for values are used
|
||||
for other purposes, however, things may not work properly. I think there is a
|
||||
command-line option to turn on warnings when a broken word is discovered.
|
||||
|
||||
This code is turned off by the @code{WORKING_DOT_WORD} macro. It isn't needed
|
||||
if @code{.word} emits a value large enough to contain an address (or, more
|
||||
correctly, any possible difference between two addresses).
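The decision behind broken-word processing can be illustrated with a few lines of C. The function names and the notion of an ``island'' address (the location of the inserted jump) are assumptions made for this sketch; the real code works on frags and symbols rather than on plain numbers.

```c
#include <assert.h>

/* Nonzero if DISPLACEMENT can be stored in a signed 16-bit .word.  */
static int
toy_fits_in_word16 (long displacement)
{
  return displacement >= -32768L && displacement <= 32767L;
}

/* Value to store in a switch-table entry located relative to TABLE_BASE.
   If TARGET is reachable, store the real displacement; otherwise store
   the displacement to ISLAND, a nearby jump to TARGET.  */
static long
toy_word_value (long table_base, long target, long island)
{
  long direct = target - table_base;

  if (toy_fits_in_word16 (direct))
    return direct;
  /* The inserted jump must itself be within 16-bit reach.  */
  assert (toy_fits_in_word16 (island - table_base));
  return island - table_base;
}
```

When the target is out of reach, the table entry points at the inserted jump, so the code works only if the @code{.word} value really is used as a branch displacement.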
|
||||
|
||||
@node What Happens?
|
||||
@section What Happens?
|
||||
|
||||
Blah blah blah, initialization, argument parsing, file reading,
|
||||
whitespace munging, opcode parsing and lookup, operand parsing. Now
|
||||
it's time to write the output file.
|
||||
Blah blah blah, initialization, argument parsing, file reading, whitespace
|
||||
munging, opcode parsing and lookup, operand parsing. Now it's time to write
|
||||
the output file.
|
||||
|
||||
In @code{BFD_ASSEMBLER} mode, processing of relocations and symbols and
|
||||
creation of the output file is initiated by calling
|
||||
@code{write_object_file}.
|
||||
creation of the output file is initiated by calling @code{write_object_file}.
|
||||
|
||||
@node Target Dependent Definitions
|
||||
@section Target Dependent Definitions
|
||||
|
||||
@subheader Format-specific definitions
|
||||
@subsection Format-specific definitions
|
||||
|
||||
@defmac obj_sec_sym_ok_for_reloc section
|
||||
@defmac obj_sec_sym_ok_for_reloc (section)
|
||||
(@code{BFD_ASSEMBLER} only.)
|
||||
Is it okay to use this section's section-symbol in a relocation entry?
|
||||
If not, a new internal-linkage symbol is generated and emitted if such a
|
||||
relocation entry is needed. (Default: Always use a new symbol.)
|
||||
Is it okay to use this section's section-symbol in a relocation entry? If not,
|
||||
a new internal-linkage symbol is generated and emitted if such a relocation
|
||||
entry is needed. (Default: Always use a new symbol.)
|
||||
|
||||
@end defmac
|
||||
|
||||
@defmac obj_adjust_symtab
|
||||
(@code{BFD_ASSEMBLER} only.)
|
||||
If this macro is defined, it is invoked just before setting the symbol table of
|
||||
the output BFD. Any finalizing changes needed in the symbol table should be
|
||||
done here. For example, in the COFF support, if there is no @code{.file}
|
||||
symbol defined already, one is generated at this point. If no such adjustments
|
||||
are needed, this macro need not be defined.
|
||||
|
||||
@end defmac
|
||||
|
||||
@defmac EMIT_SECTION_SYMBOLS
|
||||
(@code{BFD_ASSEMBLER} only.)
|
||||
Should section symbols be included in the symbol list if they're used in
|
||||
relocations? Some formats can generate section-relative relocations,
|
||||
and thus don't need
|
||||
(Default: 1.)
|
||||
relocations? Some formats can generate section-relative relocations, and thus
|
||||
don't need symbols emitted for them. (Default: 1.)
|
||||
@end defmac
|
||||
|
||||
@defmac obj_frob_file
|
||||
Any final cleanup needed before writing out the BFD may be done here. For
|
||||
example, ECOFF formats (and MIPS ELF format) may do some work on the MIPS-style
|
||||
symbol table with its integrated debug information. The symbol table should
|
||||
not be modified at this time.
|
||||
@end defmac
|
||||
|
||||
@subsection CPU-specific definitions
|
||||
|
||||
@node Relaxation
|
||||
@subsubsection Relaxation
|
||||
@cindex Relaxation
|
||||
|
||||
If @code{md_relax_frag} isn't defined, the assembler will perform some
|
||||
relaxation on @code{rs_machine_dependent} frags based on the frag subtype and
|
||||
the displacement to some specified target address. The basic idea is that many
|
||||
machines have different addressing modes for instructions that can specify
|
||||
different ranges of values, with successive modes able to access wider ranges,
|
||||
including the entirety of the previous range. Smaller ranges are assumed to be
|
||||
more desirable (perhaps the instruction requires one word instead of two or
|
||||
three); if this is not the case, don't describe the smaller-range, inferior
|
||||
mode.
|
||||
|
||||
The @code{fr_subtype} field of a frag is an index into a CPU-specific
|
||||
relaxation table. That table entry indicates the range of values that can be
|
||||
stored, the number of bytes that will have to be added to the frag to
|
||||
accommodate the addressing mode, and the index of the next entry to examine if
|
||||
the value to be stored is outside the range accessible by the current
|
||||
addressing mode. The @code{fr_symbol} field of the frag indicates what symbol
|
||||
is to be accessed; the @code{fr_offset} field is added in.
|
||||
|
||||
If the @code{fr_pcrel_adjust} field is set, which currently should only happen
|
||||
for the NS32k family, the @code{TC_PCREL_ADJUST} macro is called on the frag to
|
||||
compute an adjustment to be made to the displacement.
|
||||
|
||||
The value fitted by the relaxation code is always assumed to be a displacement
|
||||
from the current frag. (More specifically, from @code{fr_fix} bytes into the
|
||||
frag.) This seems kinda silly. What about fitting small absolute values? I
|
||||
suppose @code{md_assemble} is supposed to take care of that, but if the operand
|
||||
is a difference between symbols, it might not be able to, if the difference was
|
||||
not computable yet.
|
||||
|
||||
The end of the relaxation sequence is indicated by a ``next'' value of 0. This
|
||||
is kinda silly too, since it means that the first entry in the table can't be
|
||||
used. I think -1 would make a more logical sentinel value.
|
||||
|
||||
The table @code{md_relax_table} from @file{targ-cpu.c} describes the relaxation
|
||||
modes available. Currently this must always be provided, even on machines for
|
||||
which this type of relaxation isn't possible or practical. Probably fewer than
|
||||
half the machines gas supports use it; it ought to be made conditional on some
|
||||
CPU-specific macro.  Also, the table must currently be declared ``const''; on
|
||||
some machines, though, it might make sense to keep it writeable, so it can be
|
||||
modified depending on which CPU of a family is specified. For example, in the
|
||||
m68k family, the 68020 has some addressing modes that are not available on the
|
||||
68000.
|
||||
|
||||
The relaxation table type contains these fields:
|
||||
|
||||
@table @code
|
||||
@item long rlx_forward
|
||||
Forward reach, must be non-negative.
|
||||
@item long rlx_backward
|
||||
Backward reach, must be zero or negative.
|
||||
@item rlx_length
|
||||
Length in bytes of this addressing mode.
|
||||
@item rlx_more
|
||||
Index of the next-longer relax state, or zero if there is no ``next''
|
||||
relax state.
|
||||
@end table
|
||||
|
||||
The relaxation is done in @code{relax_segment} in @file{write.c}. The
|
||||
difference in the length fields between the original mode and the one finally
|
||||
chosen by the relaxing code is taken as the size by which the current frag will
|
||||
be increased in size. For example, if the initial relaxing mode has a length
|
||||
of 2 bytes, and because of the size of the displacement, it gets upgraded to a
|
||||
mode with a size of 6 bytes, it is assumed that the frag will grow by 4 bytes.
|
||||
(The initial two bytes should have been part of the fixed portion of the frag,
|
||||
since it is already known that they will be output.) This growth must be
|
||||
effected by @code{md_convert_frag}; it should increase the @code{fr_fix} field
|
||||
by the appropriate size, and fill in the appropriate bytes of the frag.
|
||||
(Enough space for the maximum growth should have been allocated in the call to
|
||||
@code{frag_var} as the second argument.)
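The table walk and the growth computation described above can be sketched like this. The structure mirrors the four fields listed earlier, but @code{struct toy_relax}, the contents of @code{toy_table} and both function names are invented for illustration; the lengths 2, 4 and 6 are chosen to match the example in the text and do not describe any real CPU.

```c
#include <assert.h>

/* Hypothetical copy of the table entry layout described above.  */
struct toy_relax
{
  long rlx_forward;   /* forward reach, never negative */
  long rlx_backward;  /* backward reach, zero or negative */
  int rlx_length;     /* bytes used by this addressing mode */
  int rlx_more;       /* index of the next-longer state, 0 if none */
};

/* Made-up table.  Entry 0 is unusable, since a ``next'' value of 0
   marks the end of a relaxation sequence.  */
static const struct toy_relax toy_table[] =
{
  { 0, 0, 0, 0 },
  { 127, -128, 2, 2 },                      /* 1: 8-bit displacement */
  { 32767, -32768, 4, 3 },                  /* 2: 16-bit displacement */
  { 2147483647L, -2147483647L - 1, 6, 0 }   /* 3: 32-bit displacement */
};

/* Starting from STATE, follow rlx_more until DISPLACEMENT fits or the
   widest mode is reached, and return the resulting state.  */
static int
toy_relax_state (const struct toy_relax *table, int state, long displacement)
{
  assert (table != 0 && state > 0);
  while (displacement > table[state].rlx_forward
         || displacement < table[state].rlx_backward)
    {
      if (table[state].rlx_more == 0)
        break;  /* No wider mode exists.  */
      state = table[state].rlx_more;
    }
  return state;
}

/* Number of bytes by which the frag grows when relaxed from START
   to FINAL.  */
static int
toy_growth (const struct toy_relax *table, int start, int final)
{
  return table[final].rlx_length - table[start].rlx_length;
}
```

A displacement of 70000 starting in state 1 ends up in state 3, and the frag is taken to grow by 6 - 2 = 4 bytes, as in the example above.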
|
||||
|
||||
If relocation records are needed, they should be emitted by
|
||||
@code{md_estimate_size_before_relax}.
|
||||
|
||||
These are the machine-specific definitions associated with the relaxation
|
||||
mechanism:
|
||||
|
||||
@deftypefun int md_estimate_size_before_relax (fragS *@var{frag}, segT @var{sec})
|
||||
This function should examine the target symbol of the supplied frag and correct
|
||||
the @code{fr_subtype} of the frag if needed. When this function is called, if
|
||||
the symbol has not yet been defined, it will not become defined later; however,
|
||||
its value may still change if the section it is in gets relaxed.
|
||||
|
||||
Usually, if the symbol is in the same section as the frag (given by the
|
||||
@var{sec} argument), the narrowest likely relaxation mode is stored in
|
||||
@code{fr_subtype}, and that's that.
|
||||
|
||||
If the symbol is undefined, or in a different section (and therefore moveable
|
||||
to an arbitrarily large distance), the largest available relaxation mode is
|
||||
specified, @code{fix_new} is called to produce the relocation record,
|
||||
@code{fr_fix} is increased to include the relocated field (remember, this
|
||||
storage was allocated when @code{frag_var} was called), and @code{frag_wane} is
|
||||
called to convert the frag to an @code{rs_fill} frag with no variant part.
|
||||
Sometimes changing addressing modes may also require rewriting the instruction.
|
||||
It can be accessed via @code{fr_opcode} or @code{fr_fix}.
|
||||
|
||||
Sometimes @code{fr_var} is increased instead, and @code{frag_wane} is not
|
||||
called. I'm not sure, but I think this is to keep @code{fr_fix} referring to
|
||||
an earlier byte, and @code{fr_type} set to @code{rs_machine_dependent} so
|
||||
that @code{md_convert_frag} will get called.
|
||||
@end deftypefun
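The choice made by @code{md_estimate_size_before_relax} can be reduced to the following sketch. Everything here is hypothetical: the two-mode enumeration, @code{struct toy_target} and @code{toy_initial_mode} merely restate the rule described above, and the @code{needs_reloc} flag marks the point where a real back end would call @code{fix_new} and @code{frag_wane}.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-entry set of relaxation modes.  */
enum toy_mode
{
  TOY_SHORT = 1,  /* narrowest likely mode */
  TOY_LONG = 2    /* widest available mode */
};

/* Hypothetical summary of the frag's target symbol.  */
struct toy_target
{
  int defined;  /* nonzero if the symbol has been defined */
  int section;  /* number of the section holding the symbol */
};

/* Pick the starting mode for a frag in FRAG_SECTION.  *NEEDS_RELOC is
   set to 1 when a relocation record would have to be produced.  */
static enum toy_mode
toy_initial_mode (const struct toy_target *target, int frag_section,
                  int *needs_reloc)
{
  assert (target != NULL && needs_reloc != NULL);

  if (target->defined && target->section == frag_section)
    {
      /* Same section: start narrow and let relaxation widen it.  */
      *needs_reloc = 0;
      return TOY_SHORT;
    }

  /* Undefined, or in another section: the distance is unknown, so use
     the widest mode and leave the rest to the linker.  */
  *needs_reloc = 1;
  return TOY_LONG;
}
```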
|
||||
|
||||
@deftypevar relax_typeS md_relax_table []
|
||||
This is the relaxation table used by the generic relaxation code; it is indexed by @code{fr_subtype}.
|
||||
@end deftypevar
|
||||
|
||||
@defmac md_relax_frag (@var{frag})
|
||||
|
||||
This macro, if defined, overrides all of the processing described above. It's
|
||||
only defined for the MIPS target CPU, and there it doesn't do anything; it's
|
||||
used solely to disable the relaxing code and free up the @code{fr_subtype}
|
||||
field for use by the CPU-specific code.
|
||||
|
||||
@end defmac
|
||||
|
||||
@defmac tc_frob_file
|
||||
Like @code{obj_frob_file}, this macro handles miscellaneous last-minute
|
||||
cleanup. Currently only used on PowerPC/POWER support, for setting up a
|
||||
@code{.debug} section. This macro should not cause the symbol table to be
|
||||
modified.
|
||||
|
||||
@end defmac
|
||||
|
||||
@node Source File Summary
|
||||
@section Source File Summary
|
||||
|
||||
The code in the @file{obj-coff} back end assumes @code{BFD_ASSEMBLER} is
|
||||
defined; the code in @file{obj-coffbfd} uses @code{BFD},
|
||||
@code{BFD_HEADERS}, and @code{MANY_SEGMENTS}, but does a lot of the file
|
||||
positioning itself. This confusing situation arose from the history of
|
||||
the code.
|
||||
@subsection File Format Descriptions
|
||||
|
||||
@subheading a.out
|
||||
|
||||
The @code{a.out} format is described by @file{obj-aout.*}.
|
||||
|
||||
@subheading b.out
|
||||
|
||||
The @code{b.out} format, described by @file{obj-bout.*}, is similar to
|
||||
@code{a.out} format, except for a few additional fields in the file header
|
||||
describing section alignment and address.
|
||||
|
||||
@subheading COFF
|
||||
|
||||
Originally, @file{obj-coff} was a purely non-BFD version, and
|
||||
@file{obj-coffbfd} was created to use BFD for low-level byte-swapping.
|
||||
When the @code{BFD_ASSEMBLER} conversion started, the first COFF target
|
||||
to be converted was using @file{obj-coff}, and the two files had
|
||||
diverged somewhat, and I didn't feel like first converting the support
|
||||
of that target over to use the low-level BFD interface.
|
||||
@file{obj-coffbfd} was created to use BFD for low-level byte-swapping. When
|
||||
the @code{BFD_ASSEMBLER} conversion started, the first COFF target to be
|
||||
converted was using @file{obj-coff}, and the two files had diverged somewhat,
|
||||
and I didn't feel like first converting the support of that target over to use
|
||||
the low-level BFD interface.
|
||||
|
||||
Currently, all COFF targets use one of the two BFD interfaces, so the
|
||||
non-BFD code can be removed. Eventually, all should be converted to
|
||||
using one COFF back end, which uses the high-level BFD interface.
|
||||
So @file{obj-coff} got converted, and to simplify certain things,
|
||||
@file{obj-coffbfd} got ``merged'' in with a brute-force approach.
|
||||
Specifically, preprocessor conditionals testing for @code{BFD_ASSEMBLER}
|
||||
effectively split the @file{obj-coff} files into the two separate versions. It
|
||||
isn't pretty. They will be merged more thoroughly, and eventually only the
|
||||
higher-level interface will be used.
|
||||
|
||||
@subheading ECOFF
|
||||
|
||||
All ECOFF configurations use BFD for writing object files.
|
||||
|
||||
@subheading ELF
|
||||
|
||||
ELF is a fairly reasonable format, without many of the deficiencies the other
|
||||
object file formats have. (It's got some of its own, but not as bad as the
|
||||
others.) All ELF configurations use BFD for writing object files.
|
||||
|
||||
@subheading EVAX
|
||||
|
||||
This is the format used on VMS. Yes, someone has actually written BFD support
|
||||
for it. The code hasn't been integrated yet though.
|
||||
|
||||
@subheading HP300?
|
||||
|
||||
@subheading IEEE?
|
||||
|
||||
@subheading SOM
|
||||
|
||||
@subheading XCOFF
|
||||
|
||||
The XCOFF configuration is based on the COFF configuration (using the
|
||||
higher-level BFD interface). In fact, it uses the same files in the assembler.
|
||||
|
||||
@subheading VMS
|
||||
|
||||
This is the old Vax VMS support. It doesn't use BFD.
|
||||
|
||||
@subsection Processor Descriptions
|
||||
|
||||
Foo: a29k, alpha, h8300, h8500, hppa, i386, i860, i960, m68k, m88k, mips,
|
||||
ns32k, ppc, sh, sparc, tahoe, vax, z8k.
|
||||
|
||||
@node M68k
|
||||
@subsubsection M68k
|
||||
|
||||
The operand syntax handling is atrocious. There is no clear specification of
|
||||
the operand syntax. I'm looking into using a Bison grammar to replace much of
|
||||
it.
|
||||
|
||||
Operands on the 68k series processors can have two displacement values
|
||||
specified, plus a base register and a (possibly scaled) index register of which
|
||||
only some bits might be used. Thus a single 68k operand requires up to two
|
||||
expressions, two register numbers, and size and scale factors. The
|
||||
@code{struct m68k_op} type also includes a field indicating the mode of the
|
||||
operand, and an @code{error} field indicating a problem encountered while
|
||||
parsing the operand.
|
||||
|
||||
An instruction on the 68k may have up to 6 operands, although most of them have
|
||||
to be simple register operands. Up to 11 (16-bit) words may be required to
|
||||
express the instruction.
|
||||
|
||||
A @code{struct m68k_exp} expression contains an @code{expressionS}, pointers to
|
||||
the first and last characters of the input that produced the expression, an
|
||||
indication of the section to which the expression belongs, and a size field.
|
||||
I'm not sure what the size field describes.
|
||||
|
||||
@subsubheading M68k addressing modes
|
||||
|
||||
Many instructions use the low six bits of the first instruction word to
|
||||
describe the location of the operand, or how to compute the location. The six
|
||||
bits are typically split into three for a ``mode'' and three for a ``register''
|
||||
value. The interpretation of these values is as follows:
|
||||
|
||||
@example
|
||||
Mode Register Operand addressing mode
|
||||
0 Dn data register
|
||||
1 An address register
|
||||
2 An indirect
|
||||
3 An indirect, post-increment
|
||||
4 An indirect, pre-decrement
|
||||
5 An indirect with displacement
|
||||
6 An indirect with optional displacement and index;
|
||||
may involve multiple indirections and two
|
||||
displacements
|
||||
7 0 16-bit address follows
|
||||
7 1 32-bit address follows
|
||||
7 2 PC indirect with displacement
|
||||
7 3 PC indirect with optional displacements and index
|
||||
7 4 immediate 16- or 32-bit
|
||||
7 5,6,7 Reserved
|
||||
@end example
|
||||
|
||||
On the 68000 and 68010, support for modes 6 and 7.3 are incomplete; the
|
||||
displacement must fit in 8 bits, and no scaling or index suppression is
|
||||
permitted.
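As a rough illustration of the mode/register split described above, here is a
minimal C sketch that packs and unpacks the six-bit field. It assumes the
usual layout, with the mode in bits 5..3 and the register in bits 2..0. The
function names are made up for this example and do not come from the assembler
sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only.  */

/* Store a 3-bit mode and a 3-bit register value in the low six bits of
   OPWORD: mode in bits 5..3, register in bits 2..0 (assumed layout).  */
uint16_t
ea_pack (uint16_t opword, unsigned mode, unsigned reg)
{
  assert (mode <= 7 && reg <= 7);
  return (uint16_t) ((opword & ~0x3fu) | (mode << 3) | reg);
}

/* Extract the mode value (bits 5..3).  */
unsigned
ea_mode (uint16_t opword)
{
  return (opword >> 3) & 7;
}

/* Extract the register value (bits 2..0).  */
unsigned
ea_reg (uint16_t opword)
{
  return opword & 7;
}
```

For example, packing mode 3 (indirect, post-increment) with register 0 into the
word 0x4a40 gives 0x4a58.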
@subsubheading M68k relaxation modes

The relaxation modes used on the 68k are:

@table @code
@item ABRANCH
Case @samp{g} except when @code{BCC68000} is applicable.
@item FBRANCH
Coprocessor branches.
@item PCREL
Mode 7.2 -- program counter indirect with 16-bit displacement. This is
available on all processors. Widens to 32-bit absolute. Used only if the
original code used @code{ABSL} mode, and the CPU is not a 68000 or 68010.
(Why? Those processors support mode 7.2.)
@item BCC68000
A conditional branch instruction, on the 68000 or 68010. These instructions
support only 16-bit displacements on these processors. If a larger
displacement is needed, the condition is negated and the instruction is turned
into a short branch around a jump instruction to the specified target. This
jump will have a long absolute addressing mode.
@item DBCC
Like @code{BCC68000}, but for @code{dbCC} (decrement and branch on condition)
instructions.
@item PCLEA
Not currently used?? Short form is mode 7.2 (program counter indirect, 16-bit
displacement); long form is 7.3/0x0170 (program counter indirect, suppressed
index register, 32-bit displacement). Used in progressive-930331 for mode
@code{AOFF} with a PC-relative addressing mode and a displacement that won't
fit in 16 bits, or which is variable and is not specified to have a size other
than long.
@item PCINDEX
Newly added. PC indirect with index. An 8-bit displacement is supported on
the 68000 and 68010, wider displacements on later processors.

Well, actually, I haven't added it yet. I need to soon, though. It fixes a
bug reported by a customer.
@end table

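The @code{BCC68000} widening can be sketched in C roughly as follows. This is
an illustration under stated assumptions, not the assembler's own code: the
helper name and its buffer interface are hypothetical, and it assumes the
standard 68k encodings (a conditional branch is 0x6000 plus a 4-bit condition
and an 8-bit displacement, and a jump with long absolute addressing is 0x4ef9
followed by a 32-bit address).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper.  Write the long form of a conditional branch into
   BUF (at least 8 bytes) and return the number of bytes written.  COND is
   the 4-bit condition field of the branch opcode; values 0 (bra) and
   1 (bsr) have no inverse and are rejected.  */
size_t
bcc_long_form (uint8_t *buf, unsigned cond, uint32_t target)
{
  assert (buf != NULL && cond >= 2 && cond <= 15);

  /* Conditions come in pairs, so flipping the low bit negates one.  */
  buf[0] = (uint8_t) (0x60 | (cond ^ 1));
  /* 8-bit displacement, measured from the end of this 2-byte branch:
     skip over the 6-byte jump that follows.  */
  buf[1] = 6;

  /* Jump, long absolute addressing (mode 7, register 1).  */
  buf[2] = 0x4e;
  buf[3] = 0xf9;
  buf[4] = (uint8_t) (target >> 24);
  buf[5] = (uint8_t) (target >> 16);
  buf[6] = (uint8_t) (target >> 8);
  buf[7] = (uint8_t) target;
  return 8;
}
```

With this sketch, a @code{beq} (condition 7) to address 0x00012345 becomes a
@code{bne} over the jump: the bytes 66 06 4e f9 00 01 23 45.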
@subsection ``Emulation'' Descriptions

These are the @file{te-*.h} files.

@node Foo
@section Foo

@@ -182,12 +634,12 @@ using one COFF back end, which uses the high-level BFD interface.
@deftypefun int had_warnings (void)
@deftypefunx int had_errors (void)

Returns non-zero if any warnings or errors, respectively, have been printed
during this invocation.

@end deftypefun

@deftypefun void as_perror (const char *@var{gripe}, const char *@var{filename})

Displays a BFD or system error, then clears the error status.

@@ -198,33 +650,77 @@ Displays a BFD or system error, then clears the error status.
@deftypefunx void as_bad (const char *@var{format}, ...)
@deftypefunx void as_fatal (const char *@var{format}, ...)

These functions display messages about something amiss with the input file, or
internal problems in the assembler itself. The current file name and line
number are printed, followed by the supplied message, formatted using
@code{vfprintf}, and a final newline.

An error indicated by @code{as_bad} will result in a non-zero exit status when
the assembler has finished. Calling @code{as_fatal} will result in immediate
termination of the assembler process.

@end deftypefun

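The behavior described for @code{as_bad} can be mimicked with a small C
stand-in. Everything named @code{sketch_*} below is hypothetical and exists
only for this illustration; the real functions belong to the assembler and
find the current file name and line number themselves, whereas this sketch
just assumes fixed values.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Stand-ins for illustration only.  */
static const char *sketch_file = "input.s";  /* assumed current file */
static unsigned int sketch_line = 1;         /* assumed current line */
static int sketch_error_count;

/* Print "file:line: message" plus a newline, and remember that an error
   was reported.  */
void
sketch_as_bad (const char *format, ...)
{
  va_list args;

  assert (format != NULL);
  sketch_error_count++;
  fprintf (stderr, "%s:%u: ", sketch_file, sketch_line);
  va_start (args, format);
  vfprintf (stderr, format, args);
  va_end (args);
  fputc ('\n', stderr);
}

/* Non-zero if any error has been reported so far.  */
int
sketch_had_errors (void)
{
  return sketch_error_count;
}

/* Exit status once assembly has finished.  */
int
sketch_exit_status (void)
{
  return sketch_had_errors () ? 1 : 0;
}
```

The point of the sketch is that reporting an error does not stop processing;
it only changes the status reported at the end.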
@deftypefun void as_warn_where (char *@var{file}, unsigned int @var{line}, const char *@var{format}, ...)
@deftypefunx void as_bad_where (char *@var{file}, unsigned int @var{line}, const char *@var{format}, ...)

These variants permit specification of the file name and line number, and are
used when problems are detected while reprocessing information that was saved
away during the processing of some earlier part of the file. For example,
fixups are processed after all input has been read, but messages about fixups
should refer to the original file name and line number to which they apply.

@end deftypefun

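The idea behind the @samp{_where} variants is that the location travels with
the saved data. A minimal C sketch, assuming a hypothetical record type and
helper (neither is taken from the assembler sources):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record of something saved for later processing, together
   with the place in the input where it came from.  */
struct saved_item
{
  const char *file;
  unsigned int line;
  long value;
};

/* Format a complaint about ITEM into BUF, citing the saved location
   rather than whatever line is being read now.  Returns the length that
   snprintf reports.  */
int
describe_overflow (char *buf, size_t size, const struct saved_item *item)
{
  assert (buf != NULL && item != NULL);
  return snprintf (buf, size, "%s:%u: value %ld out of range",
                   item->file, item->line, item->value);
}
```

A real caller would hand the saved file name and line number to
@code{as_bad_where} instead of formatting the message itself.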
@deftypefun void fprint_value (FILE *@var{file}, valueT @var{val})
@deftypefunx void sprint_value (char *@var{buf}, valueT @var{val})

These functions are helpful for converting a @code{valueT} value into printable
format, in case it's wider than modes that @code{*printf} can handle. If the
type is narrow enough, a decimal number will be produced; otherwise, it will be
in hexadecimal (FIXME: currently without `0x' prefix). The value itself is not
examined to make this determination.

@end deftypefun

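The choice between decimal and hexadecimal can be sketched as below. This is
a hypothetical stand-in, not the real @code{sprint_value}: it takes the width
of the type as an explicit argument (standing in for the size of
@code{valueT}) so that both cases can be shown on one host, and it assumes
that ``narrow enough'' means no wider than @code{long}.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch.  TYPE_WIDTH is the size in bytes of the value's
   type; the decision depends on it alone, never on VAL.  */
void
sketch_sprint_value (char *buf, size_t size, unsigned long long val,
                     size_t type_width)
{
  assert (buf != NULL && size > 0);
  if (type_width <= sizeof (long))
    /* Narrow enough for the ordinary printf conversions: decimal.  */
    snprintf (buf, size, "%ld", (long) val);
  else
    /* Too wide: hexadecimal, with no 0x prefix.  */
    snprintf (buf, size, "%llx", val);
}
```

Because only the width is examined, even a small value such as 255 comes out
in hexadecimal when the type is wide.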
@node Writing a new target
@section Writing a new target

@node Test suite
@section Test suite
@cindex test suite

The test suite is kind of lame for most processors. Often it only checks to
see if a couple of files can be assembled without the assembler reporting any
errors. For more complete testing, write a test which either examines the
assembler listing, or runs @code{objdump} and examines its output. For the
latter, the TCL procedure @code{run_dump_test} may come in handy. It takes the
base name of a file, and looks for @file{@var{file}.d}. This file should
contain as its initial lines a set of variable settings in @samp{#} comments,
in the form:

@example
	#@var{varname}: @var{value}
@end example

The @var{varname} may be @code{objdump}, @code{nm}, or @code{as}, in which case
it specifies the options to be passed to the specified programs. Exactly one
of @code{objdump} or @code{nm} must be specified, as that also specifies which
program to run after the assembler has finished. If @var{varname} is
@code{source}, it specifies the name of the source file; otherwise,
@file{@var{file}.s} is used. If @var{varname} is @code{name}, it specifies the
name of the test to be used in the @code{pass} or @code{fail} messages.

The non-commented parts of the file are interpreted as regular expressions, one
per line. Blank lines in the @code{objdump} or @code{nm} output are skipped,
as are blank lines in the @code{.d} file; the other lines are tested to see if
the regular expression matches the program output. If it does not, the test
fails.

Note that this means the tests must be modified if the @code{objdump} output
style is changed.

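To make the header format concrete, here is a minimal C sketch that splits one
@samp{#@var{varname}: @var{value}} line into its two parts. The real harness
is written in TCL; this helper and its name are hypothetical and only
illustrate the syntax described above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper.  If LINE has the form "#varname: value", copy the
   two parts into NAME and VALUE and return 1; otherwise return 0.  */
int
parse_setting (const char *line, char *name, size_t name_size,
               char *value, size_t value_size)
{
  const char *colon;
  size_t name_len;

  assert (line != NULL && name != NULL && value != NULL);
  if (line[0] != '#')
    return 0;
  colon = strchr (line, ':');
  if (colon == NULL)
    return 0;
  name_len = (size_t) (colon - (line + 1));
  if (name_len == 0 || name_len >= name_size)
    return 0;
  memcpy (name, line + 1, name_len);
  name[name_len] = '\0';

  /* Skip the blanks that separate the colon from the value.  */
  colon++;
  while (*colon == ' ' || *colon == '\t')
    colon++;
  if (strlen (colon) >= value_size)
    return 0;
  strcpy (value, colon);
  return 1;
}
```

A line such as @samp{#objdump: -dr} yields the name @code{objdump} and the
value @code{-dr}; a line that does not start with @samp{#} is left to be
treated as a regular expression.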
@bye
@c Local Variables:
@c fill-column: 79
@c End: