* doc/internals.texi: Add loud disclaimer. Refill to 79 columns, specify
fill-column in local-variables section. Change subheadings to subsections so they can be cross-referenced. Describe broken words, frags, frag chains, generic relaxation, relax table, m68k relaxation, m68k addressing modes, test suite code. Add a few words about various file formats.
parent 7f390875ca
commit ae6cd60f9e
1 changed files with 592 additions and 96 deletions
|
@ -1,7 +1,19 @@
|
||||||
|
\input texinfo
|
||||||
|
@setfilename internals.info
|
||||||
@node Assembler Internals
|
@node Assembler Internals
|
||||||
@chapter Assembler Internals
|
@chapter Assembler Internals
|
||||||
@cindex internals
|
@cindex internals
|
||||||
|
|
||||||
|
This documentation is not ready for prime time yet. Not even close. It's not
|
||||||
|
so much documentation as random blathering of mine intended to be notes to
|
||||||
|
myself that may eventually be turned into real documentation.
|
||||||
|
|
||||||
|
I take no responsibility for any negative effect it may have on your
|
||||||
|
professional, personal, or spiritual life. Read it at your own risk. Caveat
|
||||||
|
emptor. Delete before reading. Abandon all hope, ye who enter here.
|
||||||
|
|
||||||
|
However, enhancements will be gratefully accepted.
|
||||||
|
|
||||||
@menu
|
@menu
|
||||||
* Data types:: Data types
|
* Data types:: Data types
|
||||||
@end menu
|
@end menu
|
||||||
|
@ -17,162 +29,602 @@ BFD, MANY_SECTIONS, BFD_HEADERS
|
||||||
@section Data types
|
@section Data types
|
||||||
@cindex internals, data types
|
@cindex internals, data types
|
||||||
|
|
||||||
@subheading Symbols
|
@subsection Symbols
|
||||||
@cindex internals, symbols
|
@cindex internals, symbols
|
||||||
@cindex symbols, internal
|
@cindex symbols, internal
|
||||||
|
|
||||||
... `local' symbols ... flags ...
|
... `local' symbols ... flags ...
|
||||||
|
|
||||||
The definition for @code{struct symbol}, also known as @code{symbolS},
|
The definition for @code{struct symbol}, also known as @code{symbolS}, is
|
||||||
is located in @file{struc-symbol.h}. Symbol structures can contain the
|
located in @file{struc-symbol.h}. Symbol structures can contain the following
|
||||||
following fields:
|
fields:
|
||||||
|
|
||||||
@table @code
|
@table @code
|
||||||
@item sy_value
|
@item sy_value
|
||||||
This is an @code{expressionS} that describes the value of the symbol.
|
This is an @code{expressionS} that describes the value of the symbol. It might
|
||||||
It might refer to another symbol; if so, its true value may not be known
|
refer to another symbol; if so, its true value may not be known until
|
||||||
until @code{foo} is run.
|
@code{foo} is called.
|
||||||
|
|
||||||
More generally, however, ... undefined? ... or an offset from the start
|
More generally, however, ... undefined? ... or an offset from the start of a
|
||||||
of a frag pointed to by the @code{sy_frag} field.
|
frag pointed to by the @code{sy_frag} field.
|
||||||
|
|
||||||
@item sy_resolved
|
@item sy_resolved
|
||||||
This field is non-zero if the symbol's value has been completely
|
This field is non-zero if the symbol's value has been completely resolved. It
|
||||||
resolved. It is used during the final pass over the symbol table.
|
is used during the final pass over the symbol table.
|
||||||
|
|
||||||
@item sy_resolving
|
@item sy_resolving
|
||||||
This field is used to detect loops while resolving the symbol's value.
|
This field is used to detect loops while resolving the symbol's value.
|
||||||
|
|
||||||
@item sy_used_in_reloc
|
@item sy_used_in_reloc
|
||||||
This field is non-zero if the symbol is used by a relocation entry. If
|
This field is non-zero if the symbol is used by a relocation entry. If a local
|
||||||
a local symbol is used in a relocation entry, it must be possible to
|
symbol is used in a relocation entry, it must be possible to redirect those
|
||||||
redirect those relocations to other symbols, or this symbol cannot be
|
relocations to other symbols, or this symbol cannot be removed from the final
|
||||||
removed from the final symbol list.
|
symbol list.
|
||||||
|
|
||||||
@item sy_next
|
@item sy_next
|
||||||
@itemx sy_previous
|
@itemx sy_previous
|
||||||
These pointers to other @code{symbolS} structures describe a singly or
|
These pointers to other @code{symbolS} structures describe a singly or doubly
|
||||||
doubly linked list. (If @code{SYMBOLS_NEED_BACKPOINTERS} is not
|
linked list. (If @code{SYMBOLS_NEED_BACKPOINTERS} is not defined, the
|
||||||
defined, the @code{sy_previous} field will be omitted.) These fields
|
@code{sy_previous} field will be omitted.) These fields should be accessed
|
||||||
should be accessed with @code{symbol_next} and @code{symbol_previous}.
|
with @code{symbol_next} and @code{symbol_previous}.
|
||||||
|
|
||||||
@item sy_frag
|
@item sy_frag
|
||||||
This points to the @code{fragS} that this symbol is attached to.
|
This points to the @code{fragS} that this symbol is attached to.
|
||||||
|
|
||||||
@item sy_used
|
@item sy_used
|
||||||
Whether the symbol is used as an operand or in an expression. Note: Not
|
Whether the symbol is used as an operand or in an expression. Note: Not all of
|
||||||
all the backends keep this information accurate; backends which use this
|
the backends keep this information accurate; backends which use this bit are
|
||||||
bit are responsible for setting it when a symbol is used in backend
|
responsible for setting it when a symbol is used in backend routines.
|
||||||
routines.
|
|
||||||
|
|
||||||
@item bsym
|
@item bsym
|
||||||
If @code{BFD_ASSEMBLER} is defined, this points to the @code{asymbol}
|
If @code{BFD_ASSEMBLER} is defined, this points to the @code{asymbol} that will
|
||||||
that will be used in writing the object file.
|
be used in writing the object file.
|
||||||
|
|
||||||
@item sy_name_offset
|
@item sy_name_offset
|
||||||
(Only used if @code{BFD_ASSEMBLER} is not defined.)
|
(Only used if @code{BFD_ASSEMBLER} is not defined.) This is the position of
|
||||||
This is the position of the symbol's name in the symbol table of the
|
the symbol's name in the symbol table of the object file. On some formats,
|
||||||
object file. On some formats, this will start at position 4, with
|
this will start at position 4, with position 0 reserved for unnamed symbols.
|
||||||
position 0 reserved for unnamed symbols. This field is not used until
|
This field is not used until @code{write_object_file} is called.
|
||||||
@code{write_object_file} is called.
|
|
||||||
|
|
||||||
@item sy_symbol
|
@item sy_symbol
|
||||||
(Only used if @code{BFD_ASSEMBLER} is not defined.)
|
(Only used if @code{BFD_ASSEMBLER} is not defined.) This is the
|
||||||
This is the format-specific symbol structure, as it would be written into
|
format-specific symbol structure, as it would be written into the object file.
|
||||||
the object file.
|
|
||||||
|
|
||||||
@item sy_number
|
@item sy_number
|
||||||
(Only used if @code{BFD_ASSEMBLER} is not defined.)
|
(Only used if @code{BFD_ASSEMBLER} is not defined.) This is a 24-bit symbol
|
||||||
This is a 24-bit symbol number, for use in constructing relocation table
|
number, for use in constructing relocation table entries.
|
||||||
entries.
|
|
||||||
|
|
||||||
@item sy_obj
|
@item sy_obj
|
||||||
This format-specific data is of type @code{OBJ_SYMFIELD_TYPE}. If no
|
This format-specific data is of type @code{OBJ_SYMFIELD_TYPE}. If no macro by
|
||||||
macro by that name is defined in @file{obj-format.h}, this field is not
|
that name is defined in @file{obj-format.h}, this field is not defined.
|
||||||
defined.
|
|
||||||
|
|
||||||
@item sy_tc
|
@item sy_tc
|
||||||
This processor-specific data is of type @code{TC_SYMFIELD_TYPE}. If no
|
This processor-specific data is of type @code{TC_SYMFIELD_TYPE}. If no macro
|
||||||
macro by that name is defined in @file{targ-cpu.h}, this field is not
|
by that name is defined in @file{targ-cpu.h}, this field is not defined.
|
||||||
defined.
|
|
||||||
|
|
||||||
@item TARGET_SYMBOL_FIELDS
|
@item TARGET_SYMBOL_FIELDS
|
||||||
If this macro is defined, it defines additional fields in the symbol
|
If this macro is defined, it defines additional fields in the symbol structure.
|
||||||
structure. This macro is obsolete, and should be replaced when possible
|
This macro is obsolete, and should be replaced when possible by uses of
|
||||||
by uses of @code{OBJ_SYMFIELD_TYPE} and @code{TC_SYMFIELD_TYPE}.
|
@code{OBJ_SYMFIELD_TYPE} and @code{TC_SYMFIELD_TYPE}.
|
||||||
|
|
||||||
@end table
|
@end table
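The list linkage described for @code{sy_next} and @code{sy_previous} can be pictured with a minimal C sketch. Everything named @code{toy_...} here is hypothetical and only stands in for @code{symbolS}, @code{symbol_next} and @code{symbol_previous}; the real declarations carry many more fields and may differ in detail.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for symbolS: only a name and the
   list links are kept.  This is not the real gas declaration.  */
struct toy_symbol
{
  const char *name;
  struct toy_symbol *sy_next;
  struct toy_symbol *sy_previous;   /* exists only with back pointers */
};

/* Accessors in the spirit of symbol_next and symbol_previous.  */
static struct toy_symbol *
toy_symbol_next (struct toy_symbol *s)
{
  return s->sy_next;
}

static struct toy_symbol *
toy_symbol_previous (struct toy_symbol *s)
{
  return s->sy_previous;
}

/* Append SYM after TAIL (TAIL may be null for an empty list) and
   return the new tail.  */
static struct toy_symbol *
toy_symbol_append (struct toy_symbol *tail, struct toy_symbol *sym)
{
  sym->sy_next = NULL;
  sym->sy_previous = tail;
  if (tail != NULL)
    tail->sy_next = sym;
  return sym;
}

/* Count the symbols reachable from HEAD, using only the accessor.  */
static int
toy_symbol_count (struct toy_symbol *head)
{
  int n = 0;
  for (; head != NULL; head = toy_symbol_next (head))
    n++;
  return n;
}
```

Going through the accessors rather than the raw fields keeps callers independent of whether the back pointer is compiled in.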
|
||||||
|
|
||||||
Access with S_SET_SEGMENT, S_SET_VALUE, S_GET_VALUE, S_GET_SEGMENT,
|
Access with S_SET_SEGMENT, S_SET_VALUE, S_GET_VALUE, S_GET_SEGMENT, etc., etc.
|
||||||
etc., etc.
|
|
||||||
|
|
||||||
@foo Expressions
|
@subsection Expressions
|
||||||
@cindex internals, expressions
|
@cindex internals, expressions
|
||||||
@cindex expressions, internal
|
@cindex expressions, internal
|
||||||
|
|
||||||
Expressions are stored as a combination of operator, symbols, blah.
|
Expressions are stored as a combination of operator, symbols, blah.
|
||||||
|
|
||||||
@subheading Fixups
|
@subsection Fixups
|
||||||
@cindex internals, fixups
|
@cindex internals, fixups
|
||||||
@cindex fixups
|
@cindex fixups
|
||||||
|
|
||||||
@subheading Frags
|
@subsection Frags
|
||||||
@cindex internals, frags
|
@cindex internals, frags
|
||||||
@cindex frags
|
@cindex frags
|
||||||
|
|
||||||
@subheading Broken Words
|
The frag is the basic unit for storing section contents.
|
||||||
|
|
||||||
|
@table @code
|
||||||
|
|
||||||
|
@item fr_address
|
||||||
|
The address of the frag. This is not set until the assembler rescans the list
|
||||||
|
of all frags after the entire input file is parsed. The function
|
||||||
|
@code{relax_segment} fills in this field.
|
||||||
|
|
||||||
|
@item fr_next
|
||||||
|
Pointer to the next frag in this (sub)section.
|
||||||
|
|
||||||
|
@item fr_fix
|
||||||
|
Fixed number of characters we know we're going to emit to the output file. May
|
||||||
|
be zero.
|
||||||
|
|
||||||
|
@item fr_var
|
||||||
|
Variable number of characters we may output, after the initial @code{fr_fix}
|
||||||
|
characters. May be zero.
|
||||||
|
|
||||||
|
@item fr_symbol
|
||||||
|
@itemx fr_offset
|
||||||
|
Foo.
|
||||||
|
|
||||||
|
@item fr_opcode
|
||||||
|
Points to the lowest-addressed byte of the opcode, for use in relaxation.
|
||||||
|
|
||||||
|
@item line
|
||||||
|
Holds line-number info.
|
||||||
|
|
||||||
|
@item fr_type
|
||||||
|
Relaxation state. This field indicates the interpretation of @code{fr_offset},
|
||||||
|
@code{fr_symbol} and the variable-length tail of the frag, as well as the
|
||||||
|
treatment it gets in various phases of processing. It does not affect the
|
||||||
|
initial @code{fr_fix} characters; they are always supposed to be output
|
||||||
|
verbatim (fixups aside). See below for specific values this field can have.
|
||||||
|
|
||||||
|
@item fr_subtype
|
||||||
|
Relaxation substate. If the macro @code{md_relax_frag} isn't defined, this is
|
||||||
|
assumed to be an index into @code{md_relax_table} for the generic relaxation
|
||||||
|
code to process. (@xref{Relaxation}.) If @code{md_relax_frag} is defined,
|
||||||
|
this field is available for any use by the CPU-specific code.
|
||||||
|
|
||||||
|
@item align_mask
|
||||||
|
@itemx align_offset
|
||||||
|
These fields are not used yet. They are intended to keep track of the
|
||||||
|
alignment of the current frag within its section, even if the exact offset
|
||||||
|
isn't known. In many cases, we should be able to avoid creating extra frags
|
||||||
|
when @code{.align} directives are given; instead, the number of bytes needed
|
||||||
|
may be computable when the @code{.align} directive is processed. Hmm. Is this
|
||||||
|
the right place for these, or should they be in the @code{frchainS} structure?
|
||||||
|
|
||||||
|
@item fr_pcrel_adjust
|
||||||
|
@itemx fr_bsr
|
||||||
|
These fields are only used in the NS32k configuration. But since @code{struct
|
||||||
|
frag} is defined before the CPU-specific header files are included, they must
|
||||||
|
unconditionally be defined.
|
||||||
|
|
||||||
|
@item fr_literal
|
||||||
|
Declared as a one-character array, this last field grows arbitrarily large to
|
||||||
|
hold the actual contents of the frag.
|
||||||
|
|
||||||
|
@end table
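To make the fixed-part/variable-part layout concrete, here is a small C sketch of a frag whose last member holds the contents. The @code{toy_frag} type and @code{toy_frag_new} are invented for illustration. The sketch assumes a C99 flexible array member and plain @code{malloc}, where the text above describes a one-character array, and it makes no claim about how gas really allocates frags.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of struct frag.  Only a few fields are kept,
   and a C99 flexible array member stands in for the one-character
   fr_literal array.  */
struct toy_frag
{
  unsigned long fr_address;   /* not known until relaxation runs */
  struct toy_frag *fr_next;
  long fr_fix;                /* bytes certain to be emitted */
  long fr_var;                /* bytes that may follow them */
  char fr_literal[];          /* fr_fix bytes, then room for fr_var */
};

/* Allocate a frag holding FIX bytes copied from DATA, followed by
   VAR zeroed bytes of variable tail.  Returns null on failure.  */
static struct toy_frag *
toy_frag_new (const char *data, long fix, long var)
{
  struct toy_frag *f = malloc (sizeof *f + (size_t) (fix + var));
  if (f == NULL)
    return NULL;
  f->fr_address = 0;
  f->fr_next = NULL;
  f->fr_fix = fix;
  f->fr_var = var;
  memcpy (f->fr_literal, data, (size_t) fix);
  memset (f->fr_literal + fix, 0, (size_t) var);
  return f;
}
```

The variable tail starts at @code{fr_literal + fr_fix}, which is also the point that relaxation later treats as the end of the committed bytes.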
|
||||||
|
|
||||||
|
These are the possible relaxation states, provided in the enumeration type
|
||||||
|
@code{relax_stateT}, and the interpretations they represent for the other
|
||||||
|
fields:
|
||||||
|
|
||||||
|
@table @code
|
||||||
|
|
||||||
|
@item rs_align
|
||||||
|
The start of the following frag should be aligned on some boundary. In this
|
||||||
|
frag, @code{fr_offset} is the logarithm (base 2) of the alignment in bytes.
|
||||||
|
(For example, if alignment on an 8-byte boundary were desired, @code{fr_offset}
|
||||||
|
would have a value of 3.) The variable characters indicate the fill pattern to
|
||||||
|
be used. (More than one?)
|
||||||
|
|
||||||
|
@item rs_broken_word
|
||||||
|
This indicates that ``broken word'' processing should be done. @xref{Broken
|
||||||
|
Words,,Broken Words}. If broken word processing is not necessary on the target
|
||||||
|
machine, this enumerator value will not be defined.
|
||||||
|
|
||||||
|
@item rs_fill
|
||||||
|
The variable characters are to be repeated @code{fr_offset} times. If
|
||||||
|
@code{fr_offset} is 0, this frag has a length of @code{fr_fix}.
|
||||||
|
|
||||||
|
@item rs_machine_dependent
|
||||||
|
Displacement relaxation is to be done on this frag. The target is indicated by
|
||||||
|
@code{fr_symbol} and @code{fr_offset}, and @code{fr_subtype} indicates the
|
||||||
|
particular machine-specific addressing mode desired. @xref{Relaxation}.
|
||||||
|
|
||||||
|
@item rs_org
|
||||||
|
The start of the following frag should be pushed back to some specific offset
|
||||||
|
within the section. (Some assemblers use the value as an absolute address; the
|
||||||
|
@sc{gnu} assembler does not handle final absolute addresses; it requires that
|
||||||
|
the linker set them.) The offset is given by @code{fr_symbol} and
|
||||||
|
@code{fr_offset}; one character from the variable-length tail is used as the
|
||||||
|
fill character.
|
||||||
|
|
||||||
|
@end table
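The arithmetic behind @code{rs_fill} and @code{rs_align} can be sketched in C as follows. The helper names are hypothetical, and the fill formula assumes the repeated unit is the whole @code{fr_var}-byte variable part, which is how the description above reads.

```c
/* Size of an rs_fill frag: the fixed part plus FR_OFFSET repetitions
   of the FR_VAR-byte variable part.  With FR_OFFSET of 0 this is
   just FR_FIX.  */
static long
toy_fill_size (long fr_fix, long fr_var, long fr_offset)
{
  return fr_fix + fr_var * fr_offset;
}

/* Padding needed after END_ADDRESS so that the following frag starts
   on a boundary of 2**FR_OFFSET bytes (rs_align).  */
static unsigned long
toy_align_padding (unsigned long end_address, long fr_offset)
{
  unsigned long alignment = 1UL << fr_offset;
  unsigned long misalignment = end_address & (alignment - 1);
  return (alignment - misalignment) & (alignment - 1);
}
```

For instance, with an @code{fr_offset} of 3 the alignment is 8 bytes, so a frag ending at offset 13 needs 3 bytes of padding and one ending at offset 16 needs none.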
|
||||||
|
|
||||||
|
A chain of frags is built up for each subsection. The data structure
|
||||||
|
describing a chain is called a @code{frchainS}, and contains the following
|
||||||
|
fields:
|
||||||
|
|
||||||
|
@table @code
|
||||||
|
@item frch_root
|
||||||
|
Points to the first frag in the chain. May be null if there are no frags in
|
||||||
|
this chain.
|
||||||
|
@item frch_last
|
||||||
|
Points to the last frag in the chain, or null if there are none.
|
||||||
|
@item frch_next
|
||||||
|
Next in the list of @code{frchainS} structures.
|
||||||
|
@item frch_seg
|
||||||
|
Indicates the section this frag chain belongs to.
|
||||||
|
@item frch_subseg
|
||||||
|
Subsection (subsegment) number of this frag chain.
|
||||||
|
@item fix_root
@itemx fix_tail
|
||||||
|
(Defined only if @code{BFD_ASSEMBLER} is defined.) Point to first and last
|
||||||
|
@code{fixS} structures associated with this subsection.
|
||||||
|
@item frch_obstack
|
||||||
|
Not currently used. Intended to be used for frag allocation for this
|
||||||
|
subsection. This should reduce frag generation caused by switching sections.
|
||||||
|
@end table
|
||||||
|
|
||||||
|
A @code{frchainS} corresponds to a subsection; each section has a list of
|
||||||
|
@code{frchainS} records associated with it. In most cases, only one subsection
|
||||||
|
of each section is used, so the list will only be one element long, but any
|
||||||
|
processing of frag chains should be prepared to deal with multiple chains per
|
||||||
|
section.
|
||||||
|
|
||||||
|
After the input files have been completely processed, and no more frags are to
|
||||||
|
be generated, the frag chains are joined into one per section for further
|
||||||
|
processing. After this point, it is safe to operate on one chain per section.
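Joining the per-subsection chains into one list per section might look roughly like the C sketch below. The types are hypothetical miniatures of @code{fragS} and @code{frchainS}, and the sketch assumes the chains of one section are already linked in the desired order through @code{frch_next}; the real code also has to sort out which chains belong to which section.

```c
#include <stddef.h>

/* Hypothetical miniature frag: just the link and a tag.  */
struct toy_cfrag
{
  struct toy_cfrag *fr_next;
  int id;
};

/* Hypothetical miniature of frchainS.  */
struct toy_frchain
{
  struct toy_cfrag *frch_root;    /* first frag, or null */
  struct toy_cfrag *frch_last;    /* last frag, or null */
  struct toy_frchain *frch_next;  /* next chain in the list */
  int frch_subseg;
};

/* Splice every chain in the list starting at FIRST into a single
   frag list, in list order, and return its first frag (null if all
   chains are empty).  Empty chains are skipped.  */
static struct toy_cfrag *
toy_join_chains (struct toy_frchain *first)
{
  struct toy_cfrag *head = NULL, *tail = NULL;
  struct toy_frchain *c;

  for (c = first; c != NULL; c = c->frch_next)
    {
      if (c->frch_root == NULL)
        continue;
      if (tail == NULL)
        head = c->frch_root;
      else
        tail->fr_next = c->frch_root;
      tail = c->frch_last;
    }
  return head;
}
```

Keeping @code{frch_last} up to date is what makes the splice cheap: no chain has to be walked to find its end.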
|
||||||
|
|
||||||
|
@node Broken Words
|
||||||
|
@subsection Broken Words
|
||||||
@cindex internals, broken words
|
@cindex internals, broken words
|
||||||
@cindex broken words
|
@cindex broken words
|
||||||
@cindex promises, promises
|
@cindex promises, promises
|
||||||
|
|
||||||
|
The ``broken word'' idea derives from the fact that some compilers, including
|
||||||
|
@code{gcc}, will sometimes emit switch tables specifying 16-bit @code{.word}
|
||||||
|
displacements to branch targets, and branch instructions that load entries from
|
||||||
|
that table to compute the target address. If this is done on a 32-bit machine,
|
||||||
|
there is a chance (at least with really large functions) that the displacement
|
||||||
|
will not fit in 16 bits. Thus the ``broken word'' idea is well named, since
|
||||||
|
there is an implied promise that the 16-bit field will in fact hold the
|
||||||
|
specified displacement.
|
||||||
|
|
||||||
|
If the ``broken word'' processing is enabled, and a situation like this is
|
||||||
|
encountered, the assembler will insert a jump instruction into the instruction
|
||||||
|
stream, close enough to be reached with the 16-bit displacement. This jump
|
||||||
|
instruction will transfer to the real desired target address. Thus, as long as
|
||||||
|
the @code{.word} value really is used as a displacement to compute an address
|
||||||
|
to jump to, the net effect will be correct (minus a very small efficiency
|
||||||
|
cost). If @code{.word} directives with label differences for values are used
|
||||||
|
for other purposes, however, things may not work properly. I think there is a
|
||||||
|
command-line option to turn on warnings when a broken word is discovered.
|
||||||
|
|
||||||
|
This code is turned off by the @code{WORKING_DOT_WORD} macro. It isn't needed
|
||||||
|
if @code{.word} emits a value large enough to contain an address (or, more
|
||||||
|
correctly, any possible difference between two addresses).
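The decision at the heart of the broken-word scheme can be sketched in C. Both functions are hypothetical. The sketch assumes signed 16-bit displacements and takes the location of the inserted long jump as given, since how the assembler picks that location is not described here.

```c
/* Nonzero if DISPLACEMENT can be stored in a signed 16-bit .word.  */
static int
toy_fits_in_word (long displacement)
{
  return displacement >= -32768L && displacement <= 32767L;
}

/* Value to store in a 16-bit table entry.  BASE, TARGET and STUB are
   offsets within one section.  If TARGET is out of reach from BASE,
   the entry is pointed at a long jump placed at STUB, which is
   assumed to be close enough to BASE and to transfer to TARGET.  */
static long
toy_word_displacement (long base, long target, long stub)
{
  long direct = target - base;
  return toy_fits_in_word (direct) ? direct : stub - base;
}
```

This also shows why a @code{.word} used for anything other than a branch displacement can go wrong: in the out-of-range case the stored value no longer equals the label difference that was written.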
|
||||||
|
|
||||||
@node What Happens?
|
@node What Happens?
|
||||||
@section What Happens?
|
@section What Happens?
|
||||||
|
|
||||||
Blah blah blah, initialization, argument parsing, file reading,
|
Blah blah blah, initialization, argument parsing, file reading, whitespace
|
||||||
whitespace munging, opcode parsing and lookup, operand parsing. Now
|
munging, opcode parsing and lookup, operand parsing. Now it's time to write
|
||||||
it's time to write the output file.
|
the output file.
|
||||||
|
|
||||||
In @code{BFD_ASSEMBLER} mode, processing of relocations and symbols and
|
In @code{BFD_ASSEMBLER} mode, processing of relocations and symbols and
|
||||||
creation of the output file is initiated by calling
|
creation of the output file is initiated by calling @code{write_object_file}.
|
||||||
@code{write_object_file}.
|
|
||||||
|
|
||||||
@node Target Dependent Definitions
|
@node Target Dependent Definitions
|
||||||
@section Target Dependent Definitions
|
@section Target Dependent Definitions
|
||||||
|
|
||||||
@subheader Format-specific definitions
|
@subsection Format-specific definitions
|
||||||
|
|
||||||
@defmac obj_sec_sym_ok_for_reloc section
|
@defmac obj_sec_sym_ok_for_reloc (section)
|
||||||
(@code{BFD_ASSEMBLER} only.)
|
(@code{BFD_ASSEMBLER} only.)
|
||||||
Is it okay to use this section's section-symbol in a relocation entry?
|
Is it okay to use this section's section-symbol in a relocation entry? If not,
|
||||||
If not, a new internal-linkage symbol is generated and emitted if such a
|
a new internal-linkage symbol is generated and emitted if such a relocation
|
||||||
relocation entry is needed. (Default: Always use a new symbol.)
|
entry is needed. (Default: Always use a new symbol.)
|
||||||
|
|
||||||
|
@end defmac
|
||||||
|
|
||||||
|
@defmac obj_adjust_symtab
|
||||||
|
(@code{BFD_ASSEMBLER} only.)
|
||||||
|
If this macro is defined, it is invoked just before setting the symbol table of
|
||||||
|
the output BFD. Any finalizing changes needed in the symbol table should be
|
||||||
|
done here. For example, in the COFF support, if there is no @code{.file}
|
||||||
|
symbol defined already, one is generated at this point. If no such adjustments
|
||||||
|
are needed, this macro need not be defined.
|
||||||
|
|
||||||
|
@end defmac
|
||||||
|
|
||||||
@defmac EMIT_SECTION_SYMBOLS
|
@defmac EMIT_SECTION_SYMBOLS
|
||||||
(@code{BFD_ASSEMBLER} only.)
|
(@code{BFD_ASSEMBLER} only.)
|
||||||
Should section symbols be included in the symbol list if they're used in
|
Should section symbols be included in the symbol list if they're used in
|
||||||
relocations? Some formats can generate section-relative relocations,
|
relocations? Some formats can generate section-relative relocations, and thus
|
||||||
and thus don't need
|
don't need symbols emitted for them. (Default: 1.)
|
||||||
(Default: 1.)
|
@end defmac
|
||||||
|
|
||||||
|
@defmac obj_frob_file
|
||||||
|
Any final cleanup needed before writing out the BFD may be done here. For
|
||||||
|
example, ECOFF formats (and MIPS ELF format) may do some work on the MIPS-style
|
||||||
|
symbol table with its integrated debug information. The symbol table should
|
||||||
|
not be modified at this time.
|
||||||
|
@end defmac
|
||||||
|
|
||||||
|
@subsection CPU-specific definitions
|
||||||
|
|
||||||
|
@node Relaxation
|
||||||
|
@subsubsection Relaxation
|
||||||
|
@cindex Relaxation
|
||||||
|
|
||||||
|
If @code{md_relax_frag} isn't defined, the assembler will perform some
|
||||||
|
relaxation on @code{rs_machine_dependent} frags based on the frag subtype and
|
||||||
|
the displacement to some specified target address. The basic idea is that many
|
||||||
|
machines have different addressing modes for instructions that can specify
|
||||||
|
different ranges of values, with successive modes able to access wider ranges,
|
||||||
|
including the entirety of the previous range. Smaller ranges are assumed to be
|
||||||
|
more desirable (perhaps the instruction requires one word instead of two or
|
||||||
|
three); if this is not the case, don't describe the smaller-range, inferior
|
||||||
|
mode.
|
||||||
|
|
||||||
|
The @code{fr_subtype} field of a frag is an index into a CPU-specific
|
||||||
|
relaxation table. That table entry indicates the range of values that can be
|
||||||
|
stored, the number of bytes that will have to be added to the frag to
|
||||||
|
accommodate the addressing mode, and the index of the next entry to examine if
|
||||||
|
the value to be stored is outside the range accessible by the current
|
||||||
|
addressing mode. The @code{fr_symbol} field of the frag indicates what symbol
|
||||||
|
is to be accessed; the @code{fr_offset} field is added in.
|
||||||
|
|
||||||
|
If the @code{fr_pcrel_adjust} field is set, which currently should only happen
|
||||||
|
for the NS32k family, the @code{TC_PCREL_ADJUST} macro is called on the frag to
|
||||||
|
compute an adjustment to be made to the displacement.
|
||||||
|
|
||||||
|
The value fitted by the relaxation code is always assumed to be a displacement
|
||||||
|
from the current frag. (More specifically, from @code{fr_fix} bytes into the
|
||||||
|
frag.) This seems kinda silly. What about fitting small absolute values? I
|
||||||
|
suppose @code{md_assemble} is supposed to take care of that, but if the operand
|
||||||
|
is a difference between symbols, it might not be able to, if the difference was
|
||||||
|
not computable yet.
|
||||||
|
|
||||||
|
The end of the relaxation sequence is indicated by a ``next'' value of 0. This
|
||||||
|
is kinda silly too, since it means that the first entry in the table can't be
|
||||||
|
used. I think -1 would make a more logical sentinel value.
|
||||||
|
|
||||||
|
The table @code{md_relax_table} from @file{targ-cpu.c} describes the relaxation
|
||||||
|
modes available. Currently this must always be provided, even on machines for
|
||||||
|
which this type of relaxation isn't possible or practical. Probably fewer than
|
||||||
|
half the machines gas supports use it; it ought to be made conditional on some
|
||||||
|
CPU-specific macro.  Also, that table must currently be declared @code{const}; on
|
||||||
|
some machines, though, it might make sense to keep it writeable, so it can be
|
||||||
|
modified depending on which CPU of a family is specified. For example, in the
|
||||||
|
m68k family, the 68020 has some addressing modes that are not available on the
|
||||||
|
68000.
|
||||||
|
|
||||||
|
The relaxation table type contains these fields:
|
||||||
|
|
||||||
|
@table @code
|
||||||
|
@item long rlx_forward
|
||||||
|
Forward reach, must be non-negative.
|
||||||
|
@item long rlx_backward
|
||||||
|
Backward reach, must be zero or negative.
|
||||||
|
@item rlx_length
|
||||||
|
Length in bytes of this addressing mode.
|
||||||
|
@item rlx_more
|
||||||
|
Index of the next-longer relax state, or zero if there is no ``next''
|
||||||
|
relax state.
|
||||||
|
@end table
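A minimal C sketch of how such a table is walked follows. The entry layout copies the four fields listed above, but the type name, the three-state table and the function are invented for illustration; this is not the code in @file{write.c}.

```c
#include <assert.h>

/* Hypothetical copy of the relaxation table entry layout.  */
struct toy_relax_type
{
  long rlx_forward;     /* forward reach, must be >= 0 */
  long rlx_backward;    /* backward reach, must be <= 0 */
  int rlx_length;       /* bytes needed by this addressing mode */
  int rlx_more;         /* next state to try, or 0 for none */
};

/* Invented table.  Entry 0 is a dummy, because a "next" value of 0
   marks the end of a relaxation sequence.  */
static const struct toy_relax_type toy_table[] = {
  { 0, 0, 0, 0 },
  { 127, -128, 1, 2 },                      /* byte displacement */
  { 32767, -32768, 2, 3 },                  /* word displacement */
  { 0x7fffffffL, -0x7fffffffL - 1, 4, 0 }   /* long displacement */
};

/* Starting from STATE, return the first state whose range holds
   DISPLACEMENT.  If none does, the last state of the sequence is
   returned.  */
static int
toy_relax_state (int state, long displacement)
{
  for (;;)
    {
      const struct toy_relax_type *t = &toy_table[state];

      assert (t->rlx_forward >= 0 && t->rlx_backward <= 0);
      if (displacement <= t->rlx_forward
          && displacement >= t->rlx_backward)
        return state;
      if (t->rlx_more == 0)
        return state;
      state = t->rlx_more;
    }
}
```

Because each mode's range is assumed to contain the previous one, the walk only ever moves toward wider modes and stops at the first one that fits.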
|
||||||
|
|
||||||
|
The relaxation is done in @code{relax_segment} in @file{write.c}. The
|
||||||
|
difference in the length fields between the original mode and the one finally
|
||||||
|
chosen by the relaxing code is taken as the size by which the current frag will
|
||||||
|
be increased.  For example, if the initial relaxing mode has a length
|
||||||
|
of 2 bytes, and because of the size of the displacement, it gets upgraded to a
|
||||||
|
mode with a size of 6 bytes, it is assumed that the frag will grow by 4 bytes.
|
||||||
|
(The initial two bytes should have been part of the fixed portion of the frag,
|
||||||
|
since it is already known that they will be output.) This growth must be
|
||||||
|
effected by @code{md_convert_frag}; it should increase the @code{fr_fix} field
|
||||||
|
by the appropriate size, and fill in the appropriate bytes of the frag.
|
||||||
|
(Enough space for the maximum growth should have been allocated in the call to
|
||||||
|
@code{frag_var} as the second argument.)
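The size bookkeeping in that example (2 bytes growing to 6) can be written out as a short C sketch. The names and the length table are hypothetical, and the conversion step only adjusts the count; a real @code{md_convert_frag} would also write the new bytes into the frag.

```c
/* Hypothetical frag summary: just the fields this example needs.  */
struct toy_rfrag
{
  long fr_fix;      /* bytes already committed to the output */
  int fr_subtype;   /* relaxation state finally chosen */
};

/* Invented per-state lengths, indexed like a relaxation table.
   State 0 is the unused dummy entry.  */
static const int toy_lengths[] = { 0, 2, 6 };

/* Growth implied by moving from OLD_STATE to NEW_STATE.  */
static long
toy_growth (int old_state, int new_state)
{
  return toy_lengths[new_state] - toy_lengths[old_state];
}

/* Size side of a conversion step: the frag was created in OLD_STATE,
   whose bytes are already counted in fr_fix, and has been relaxed to
   frag->fr_subtype.  */
static void
toy_convert (struct toy_rfrag *frag, int old_state)
{
  frag->fr_fix += toy_growth (old_state, frag->fr_subtype);
}
```

A frag that started with 4 fixed bytes (say a 2-byte opcode plus the initial 2-byte mode) therefore ends up with 8 once the 6-byte mode is chosen.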
|
||||||
|
|
||||||
|
If relocation records are needed, they should be emitted by
|
||||||
|
@code{md_estimate_size_before_relax}.
|
||||||
|
|
||||||
|
These are the machine-specific definitions associated with the relaxation
|
||||||
|
mechanism:
|
||||||
|
|
||||||
|
@deftypefun int md_estimate_size_before_relax (fragS *@var{frag}, segT @var{sec})
|
||||||
|
This function should examine the target symbol of the supplied frag and correct
|
||||||
|
the @code{fr_subtype} of the frag if needed. When this function is called, if
|
||||||
|
the symbol has not yet been defined, it will not become defined later; however,
|
||||||
|
its value may still change if the section it is in gets relaxed.
|
||||||
|
|
||||||
|
Usually, if the symbol is in the same section as the frag (given by the
|
||||||
|
@var{sec} argument), the narrowest likely relaxation mode is stored in
|
||||||
|
@code{fr_subtype}, and that's that.
|
||||||
|
|
||||||
|
If the symbol is undefined, or in a different section (and therefore moveable
|
||||||
|
to an arbitrarily large distance), the largest available relaxation mode is
|
||||||
|
specified, @code{fix_new} is called to produce the relocation record,
|
||||||
|
@code{fr_fix} is increased to include the relocated field (remember, this
|
||||||
|
storage was allocated when @code{frag_var} was called), and @code{frag_wane} is
|
||||||
|
called to convert the frag to an @code{rs_fill} frag with no variant part.
|
||||||
|
Sometimes changing addressing modes may also require rewriting the instruction.
|
||||||
|
It can be accessed via @code{fr_opcode} or @code{fr_fix}.
|
||||||
|
|
||||||
|
Sometimes @code{fr_var} is increased instead, and @code{frag_wane} is not
|
||||||
|
called. I'm not sure, but I think this is to keep @code{fr_fix} referring to
|
||||||
|
an earlier byte, and @code{fr_type} set to @code{rs_machine_dependent} so
|
||||||
|
that @code{md_convert_frag} will get called.
|
||||||
|
@end deftypefun
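The two main cases handled by @code{md_estimate_size_before_relax} can be modeled with a C sketch. All types, state names and the 4-byte field width are invented. The calls to @code{fix_new} and @code{frag_wane} are represented by flags rather than imitated, and the sketch returns nothing because the meaning of the real function's @code{int} result is not spelled out above.

```c
/* Invented relaxation states for the example.  */
enum { TOY_BYTE = 1, TOY_WORD = 2, TOY_LONG = 3 };

/* Hypothetical symbol summary.  */
struct toy_sym
{
  int defined;    /* nonzero once the symbol has a definition */
  int section;    /* section number, meaningful only if defined */
};

/* Hypothetical frag summary.  */
struct toy_efrag
{
  const struct toy_sym *fr_symbol;
  int fr_subtype;
  long fr_fix;
  int needs_reloc;   /* set where real code would call fix_new */
  int is_variant;    /* cleared where real code would call frag_wane */
};

/* Pick a starting relaxation mode for FRAG, which lives in section
   SEC.  */
static void
toy_estimate (struct toy_efrag *frag, int sec)
{
  const struct toy_sym *s = frag->fr_symbol;

  if (s->defined && s->section == sec)
    {
      /* Same section: start narrow and let relaxation widen it.  */
      frag->fr_subtype = TOY_BYTE;
      return;
    }

  /* Undefined or in another section: the distance is unbounded, so
     commit to the widest mode now.  */
  frag->fr_subtype = TOY_LONG;
  frag->needs_reloc = 1;
  frag->fr_fix += 4;      /* assumed width of the relocated field */
  frag->is_variant = 0;
}
```

In the second case nothing is left for the relaxation pass to decide, which is why the frag can stop being a variant frag at that point.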
|
||||||
|
|
||||||
|
@deftypevar relax_typeS md_relax_table []
|
||||||
|
This is the table.
|
||||||
|
@end deftypevar
|
||||||
|
|
||||||
|
@defmac md_relax_frag (@var{frag})
|
||||||
|
|
||||||
|
This macro, if defined, overrides all of the processing described above. It's
|
||||||
|
only defined for the MIPS target CPU, and there it doesn't do anything; it's
|
||||||
|
used solely to disable the relaxing code and free up the @code{fr_subtype}
|
||||||
|
field for use by the CPU-specific code.
|
||||||
|
|
||||||
|
@end defmac
|
||||||
|
|
||||||
|
@defmac tc_frob_file
|
||||||
|
Like @code{obj_frob_file}, this macro handles miscellaneous last-minute
|
||||||
|
cleanup. Currently only used on PowerPC/POWER support, for setting up a
|
||||||
|
@code{.debug} section. This macro should not cause the symbol table to be
|
||||||
|
modified.
|
||||||
|
|
||||||
|
@end defmac
|
||||||
|
|
||||||
@node Source File Summary
|
@node Source File Summary
|
||||||
@section Source File Summary
|
@section Source File Summary
|
||||||
|
|
||||||
The code in the @file{obj-coff} back end assumes @code{BFD_ASSEMBLER} is
|
@subsection File Format Descriptions
|
||||||
defined; the code in @file{obj-coffbfd} uses @code{BFD},
|
|
||||||
@code{BFD_HEADERS}, and @code{MANY_SEGMENTS}, but does a lot of the file
|
@subheading a.out
|
||||||
positioning itself. This confusing situation arose from the history of
|
|
||||||
the code.
|
The @code{a.out} format is described by @file{obj-aout.*}.
|
||||||
|
|
||||||
|
@subheading b.out
|
||||||
|
|
||||||
|
The @code{b.out} format, described by @file{obj-bout.*}, is similar to
|
||||||
|
@code{a.out} format, except for a few additional fields in the file header
|
||||||
|
describing section alignment and address.
|
||||||
|
|
||||||
|
@subheading COFF
|
||||||
|
|
||||||
Originally, @file{obj-coff} was a purely non-BFD version, and
|
Originally, @file{obj-coff} was a purely non-BFD version, and
|
||||||
@file{obj-coffbfd} was created to use BFD for low-level byte-swapping.
|
@file{obj-coffbfd} was created to use BFD for low-level byte-swapping. When
|
||||||
When the @code{BFD_ASSEMBLER} conversion started, the first COFF target
|
the @code{BFD_ASSEMBLER} conversion started, the first COFF target to be
|
||||||
to be converted was using @file{obj-coff}, and the two files had
|
converted was using @file{obj-coff}, and the two files had diverged somewhat,
|
||||||
diverged somewhat, and I didn't feel like first converting the support
|
and I didn't feel like first converting the support of that target over to use
|
||||||
of that target over to use the low-level BFD interface.
|
the low-level BFD interface.
|
||||||
|
|
||||||
Currently, all COFF targets use one of the two BFD interfaces, so the
|
So @file{obj-coff} got converted, and to simplify certain things,
|
||||||
non-BFD code can be removed. Eventually, all should be converted to
|
@file{obj-coffbfd} got ``merged'' in with a brute-force approach.
|
||||||
using one COFF back end, which uses the high-level BFD interface.
|
Specifically, preprocessor conditionals testing for @code{BFD_ASSEMBLER}
|
||||||
|
effectively split the @file{obj-coff} files into the two separate versions. It
|
||||||
|
isn't pretty. They will be merged more thoroughly, and eventually only the
|
||||||
|
higher-level interface will be used.
|
||||||
|
|
||||||
|
@subheading ECOFF
|
||||||
|
|
||||||
|
All ECOFF configurations use BFD for writing object files.
|
||||||
|
|
||||||
|
@subheading ELF
|
||||||
|
|
||||||
|
ELF is a fairly reasonable format, without many of the deficiencies the other
|
||||||
|
object file formats have. (It's got some of its own, but not as bad as the
|
||||||
|
others.) All ELF configurations use BFD for writing object files.
|
||||||
|
|
||||||
|
@subheading EVAX
|
||||||
|
|
||||||
|
This is the format used on VMS. Yes, someone has actually written BFD support
|
||||||
|
for it. The code hasn't been integrated yet though.
|
||||||
|
|
||||||
|
@subheading HP300?
|
||||||
|
|
||||||
|
@subheading IEEE?
|
||||||
|
|
||||||
|
@subheading SOM
|
||||||
|
|
||||||
|
@subheading XCOFF
|
||||||
|
|
||||||
|
The XCOFF configuration is based on the COFF configuration (using the
|
||||||
|
higher-level BFD interface). In fact, it uses the same files in the assembler.
|
||||||
|
|
||||||
|
@subheading VMS
|
||||||
|
|
||||||
|
This is the old Vax VMS support. It doesn't use BFD.
|
||||||
|
|
||||||
|
@subsection Processor Descriptions
|
||||||
|
|
||||||
|
Foo: a29k, alpha, h8300, h8500, hppa, i386, i860, i960, m68k, m88k, mips,
|
||||||
|
ns32k, ppc, sh, sparc, tahoe, vax, z8k.
|
||||||
|
|
||||||
|
@node M68k
|
||||||
|
@subsubsection M68k
|
||||||
|
|
||||||
|
The operand syntax handling is atrocious. There is no clear specification of
|
||||||
|
the operand syntax. I'm looking into using a Bison grammar to replace much of
|
||||||
|
it.
|
||||||
|
|
||||||
|
Operands on the 68k series processors can have two displacement values
|
||||||
|
specified, plus a base register and a (possibly scaled) index register of which
|
||||||
|
only some bits might be used. Thus a single 68k operand requires up to two
|
||||||
|
expressions, two register numbers, and size and scale factors. The
|
||||||
|
@code{struct m68k_op} type also includes a field indicating the mode of the
|
||||||
|
operand, and an @code{error} field indicating a problem encountered while
|
||||||
|
parsing the operand.
|
||||||
|
|
||||||
|
An instruction on the 68k may have up to 6 operands, although most of them have
|
||||||
|
to be simple register operands. Up to 11 (16-bit) words may be required to
|
||||||
|
express the instruction.
|
||||||
|
|
||||||
|
A @code{struct m68k_exp} expression contains an @code{expressionS}, pointers to
|
||||||
|
the first and last characters of the input that produced the expression, an
|
||||||
|
indication of the section to which the expression belongs, and a size field.
|
||||||
|
I'm not sure what the size field describes.
|
||||||
|
|
||||||
|
@subsubheading M68k addressing modes
|
||||||
|
|
||||||
|
Many instructions use the low six bits of the first instruction word to
|
||||||
|
describe the location of the operand, or how to compute the location. The six
|
||||||
|
bits are typically split into three for a ``mode'' and three for a ``register''
|
||||||
|
value. The interpretation of these values is as follows:
|
||||||
|
|
||||||
|
@example
|
||||||
|
Mode Register Operand addressing mode
|
||||||
|
0 Dn data register
|
||||||
|
1 An address register
|
||||||
|
2 An indirect
|
||||||
|
3 An indirect, post-increment
|
||||||
|
4 An indirect, pre-decrement
|
||||||
|
5 An indirect with displacement
|
||||||
|
6 An indirect with optional displacement and index;
|
||||||
|
may involve multiple indirections and two
|
||||||
|
displacements
|
||||||
|
7 0 16-bit address follows
|
||||||
|
7 1 32-bit address follows
|
||||||
|
7 2 PC indirect with displacement
|
||||||
|
7 3 PC indirect with optional displacements and index
|
||||||
|
7 4 immediate 16- or 32-bit
|
||||||
|
7 5,6,7 Reserved
|
||||||
|
@end example
|
||||||
|
|
||||||
|
On the 68000 and 68010, support for modes 6 and 7.3 is incomplete; the
|
||||||
|
displacement must fit in 8 bits, and no scaling or index suppression is
|
||||||
|
permitted.
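
The mode/register split in the table above can be sketched in C.  This is only an illustration, assuming the usual layout in which the mode occupies bits 5-3 and the register bits 2-0 of the first instruction word.  The names @code{ea_field}, @code{decode_ea}, and @code{describe_ea} are hypothetical and do not appear in the assembler sources.

```c
#include <stdio.h>

/* Hypothetical type, not part of gas: the two 3-bit fields taken from
   the low six bits of the first instruction word.  */
struct ea_field
{
  unsigned mode;
  unsigned reg;
};

/* Assumes mode in bits 5-3 and register in bits 2-0.  */
struct ea_field
decode_ea (unsigned short opword)
{
  struct ea_field ea;

  ea.mode = (opword >> 3) & 7;
  ea.reg = opword & 7;
  return ea;
}

/* Map the decoded fields onto the descriptions used in the table.  */
const char *
describe_ea (struct ea_field ea)
{
  static const char *const by_mode[7] = {
    "data register",
    "address register",
    "indirect",
    "indirect, post-increment",
    "indirect, pre-decrement",
    "indirect with displacement",
    "indirect with optional displacement and index"
  };
  static const char *const mode7[5] = {
    "16-bit address follows",
    "32-bit address follows",
    "PC indirect with displacement",
    "PC indirect with optional displacements and index",
    "immediate 16- or 32-bit"
  };

  if (ea.mode < 7)
    return by_mode[ea.mode];
  if (ea.reg < 5)
    return mode7[ea.reg];
  return "reserved";
}
```

For instance, an instruction word whose low six bits are binary 111001 decodes to mode 7, register 1, which the table lists as ``32-bit address follows''.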
|
||||||
|
|
||||||
|
@subsubheading M68k relaxation modes
|
||||||
|
|
||||||
|
The relaxation modes used on the 68k are:
|
||||||
|
|
||||||
|
@table @code
|
||||||
|
@item ABRANCH
|
||||||
|
Case @samp{g} except when @code{BCC68000} is applicable.
|
||||||
|
@item FBRANCH
|
||||||
|
Coprocessor branches.
|
||||||
|
@item PCREL
|
||||||
|
Mode 7.2 -- program counter indirect with 16-bit displacement. This is
|
||||||
|
available on all processors. Widens to 32-bit absolute. Used only if the
|
||||||
|
original code used @code{ABSL} mode, and the CPU is not a 68000 or 68010.
|
||||||
|
(Why? Those processors support mode 7.2.)
|
||||||
|
@item BCC68000
|
||||||
|
A conditional branch instruction, on the 68000 or 68010. These instructions
|
||||||
|
support only 16-bit displacements on these processors. If a larger
|
||||||
|
displacement is needed, the condition is negated and turned into a short branch
|
||||||
|
around a jump instruction to the specified target.  This jump will have a
|
||||||
|
long absolute addressing mode.
|
||||||
|
@item DBCC
|
||||||
|
Like @code{BCC68000}, but for @code{dbCC} (decrement and branch on condition)
|
||||||
|
instructions.
|
||||||
|
@item PCLEA
|
||||||
|
Not currently used?? Short form is mode 7.2 (program counter indirect, 16-bit
|
||||||
|
displacement); long form is 7.3/0x0170 (program counter indirect, suppressed
|
||||||
|
index register, 32-bit displacement). Used in progressive-930331 for mode
|
||||||
|
@code{AOFF} with a PC-relative addressing mode and a displacement that won't
|
||||||
|
fit in 16 bits, or which is variable and is not specified to have a size other
|
||||||
|
than long.
|
||||||
|
@item PCINDEX
|
||||||
|
Newly added. PC indirect with index. An 8-bit displacement is supported on
|
||||||
|
the 68000 and 68010, wider displacements on later processors.
|
||||||
|
|
||||||
|
Well, actually, I haven't added it yet. I need to soon, though. It fixes a
|
||||||
|
bug reported by a customer.
|
||||||
|
@end table
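
The size decision described under @code{BCC68000} can be sketched as follows.  This is a rough illustration, not the relaxation code itself: @code{fits_signed}, @code{pick_bcc68000_form}, and the enumerators are hypothetical names, and the sketch only distinguishes the 16-bit displacement form from the negated branch around a jump.

```c
#include <assert.h>

/* Hypothetical names; none of these appear in the assembler.  */
enum bcc_form
{
  BCC_WORD_DISPLACEMENT,     /* displacement fits in 16 bits */
  BCC_BRANCH_AROUND_JUMP     /* negate condition, branch around a jump */
};

/* Return non-zero if VALUE is representable as a signed BITS-bit
   two's complement number.  */
int
fits_signed (long value, int bits)
{
  long limit;

  assert (bits > 0 && bits < 32);
  limit = 1L << (bits - 1);
  return value >= -limit && value < limit;
}

/* On the 68000 and 68010 a conditional branch reaches only as far as
   a 16-bit displacement allows; anything wider needs the long form.  */
enum bcc_form
pick_bcc68000_form (long displacement)
{
  if (fits_signed (displacement, 16))
    return BCC_WORD_DISPLACEMENT;
  return BCC_BRANCH_AROUND_JUMP;
}
```

A displacement of 32767 still fits the 16-bit form; 32768 does not and selects the branch-around-jump form.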
|
||||||
|
|
||||||
|
@subsection ``Emulation'' Descriptions
|
||||||
|
|
||||||
|
These are the @file{te-*.h} files.
|
||||||
|
|
||||||
@node Foo
|
@node Foo
|
||||||
@section Foo
|
@section Foo
|
||||||
|
@ -182,12 +634,12 @@ using one COFF back end, which uses the high-level BFD interface.
|
||||||
@deftypefun int had_warnings (void)
|
@deftypefun int had_warnings (void)
|
||||||
@deftypefunx int had_errors (void)
|
@deftypefunx int had_errors (void)
|
||||||
|
|
||||||
Returns non-zero if any warnings or errors, respectively, have been
|
Returns non-zero if any warnings or errors, respectively, have been printed
|
||||||
printed during this invocation.
|
during this invocation.
|
||||||
|
|
||||||
@end deftypefun
|
@end deftypefun
|
||||||
|
|
||||||
@deftypefun void as_perror (const char *@var{gripe}, const char *@var{filename})
|
@deftypefun void as_perror (const char *@var{gripe}, const char *@var{filename})
|
||||||
|
|
||||||
Displays a BFD or system error, then clears the error status.
|
Displays a BFD or system error, then clears the error status.
|
||||||
|
|
||||||
|
@ -198,33 +650,77 @@ Displays a BFD or system error, then clears the error status.
|
||||||
@deftypefunx void as_bad (const char *@var{format}, ...)
|
@deftypefunx void as_bad (const char *@var{format}, ...)
|
||||||
@deftypefunx void as_fatal (const char *@var{format}, ...)
|
@deftypefunx void as_fatal (const char *@var{format}, ...)
|
||||||
|
|
||||||
These functions display messages about something amiss with the input
|
These functions display messages about something amiss with the input file, or
|
||||||
file, or internal problems in the assembler itself. The current file
|
internal problems in the assembler itself. The current file name and line
|
||||||
name and line number are printed, followed by the supplied message,
|
number are printed, followed by the supplied message, formatted using
|
||||||
formatted using @code{vfprintf}, and a final newline.
|
@code{vfprintf}, and a final newline.
|
||||||
|
|
||||||
|
An error indicated by @code{as_bad} will result in a non-zero exit status when
|
||||||
|
the assembler has finished. Calling @code{as_fatal} will result in immediate
|
||||||
|
termination of the assembler process.
|
||||||
|
|
||||||
@end deftypefun
|
@end deftypefun
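
The reporting behaviour described above can be imitated with a small stand-in.  This is a sketch under assumptions, not the assembler's code: @code{sketch_as_bad} and @code{sketch_had_errors} are hypothetical names, the file name and line number are fixed placeholders, and the exact punctuation of the message prefix is a guess.

```c
#include <stdarg.h>
#include <stdio.h>

/* Placeholders for state the real assembler tracks itself.  */
static int error_count;
static const char *cur_file = "foo.s";
static unsigned int cur_line = 12;

/* Stand-in for as_bad: print the current file name and line number,
   then the message formatted with vfprintf, then a newline, and
   remember that an error occurred.  */
void
sketch_as_bad (const char *format, ...)
{
  va_list args;

  error_count++;
  fprintf (stderr, "%s:%u: ", cur_file, cur_line);
  va_start (args, format);
  vfprintf (stderr, format, args);
  va_end (args);
  fputc ('\n', stderr);
}

/* Stand-in for had_errors: non-zero once any error was reported.  */
int
sketch_had_errors (void)
{
  return error_count != 0;
}
```

A caller would write something like @code{sketch_as_bad ("bad operand %d", 3)} and later consult @code{sketch_had_errors ()} to choose the exit status.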
|
||||||
|
|
||||||
@deftypefun void as_warn_where (char *@var{file}, unsigned int @var{line}, const char *@var{format}, ...)
|
@deftypefun void as_warn_where (char *@var{file}, unsigned int @var{line}, const char *@var{format}, ...)
|
||||||
@deftypefunx void as_bad_where (char *@var{file}, unsigned int @var{line}, const char *@var{format}, ...)
|
@deftypefunx void as_bad_where (char *@var{file}, unsigned int @var{line}, const char *@var{format}, ...)
|
||||||
|
|
||||||
These variants permit specification of the file name and line number,
|
These variants permit specification of the file name and line number, and are
|
||||||
and are used when problems are detected when reprocessing information
|
used when problems are detected when reprocessing information saved away when
|
||||||
saved away when processing some earlier part of the file. For example,
|
processing some earlier part of the file. For example, fixups are processed
|
||||||
fixups are processed after all input has been read, but messages about
|
after all input has been read, but messages about fixups should refer to the
|
||||||
fixups should refer to the original filename and line number that they
|
original filename and line number that they are applicable to.
|
||||||
are applicable to.
|
|
||||||
|
|
||||||
@end deftypefun
|
@end deftypefun
|
||||||
|
|
||||||
@deftypefun void fprint_value (FILE *@var{file}, valueT @var{val})
|
@deftypefun void fprint_value (FILE *@var{file}, valueT @var{val})
|
||||||
@deftypefunx void sprint_value (char *@var{buf}, valueT @var{val})
|
@deftypefunx void sprint_value (char *@var{buf}, valueT @var{val})
|
||||||
|
|
||||||
These functions are helpful for converting a @code{valueT} value into
|
These functions are helpful for converting a @code{valueT} value into printable
|
||||||
printable format, in case it's wider than modes that @code{*printf} can
|
format, in case it's wider than modes that @code{*printf} can handle. If the
|
||||||
handle. If the type is narrow enough, a decimal number will be
|
type is narrow enough, a decimal number will be produced; otherwise, it will be
|
||||||
produced; otherwise, it will be in hexadecimal (FIXME: currently without
|
in hexadecimal (FIXME: currently without `0x' prefix). The value itself is not
|
||||||
`0x' prefix). The value itself is not examined to make this
|
examined to make this determination.
|
||||||
determination.
|
|
||||||
|
|
||||||
@end deftypefun
|
@end deftypefun
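
The decimal-or-hexadecimal choice can be sketched like this.  It is an illustration only: @code{sketch_sprint_value} is a hypothetical name, the @var{value_size} argument stands in for @code{sizeof (valueT)}, and printing the decimal form as unsigned is an assumption.

```c
#include <stdio.h>

/* Hypothetical stand-in for sprint_value.  VALUE_SIZE plays the role
   of sizeof (valueT): the format is chosen from the width of the type
   alone, never from VAL itself.  BUF must hold at least 21 bytes.  */
void
sketch_sprint_value (char *buf, unsigned long long val, size_t value_size)
{
  if (value_size <= sizeof (unsigned long))
    /* Narrow enough for printf's long conversions: use decimal.  */
    sprintf (buf, "%lu", (unsigned long) val);
  else
    /* Too wide: fall back to hexadecimal, without a `0x' prefix.  */
    sprintf (buf, "%llx", val);
}
```

With a narrow type the value 255 comes out as @samp{255}; with a type wider than @code{unsigned long} the same value comes out as @samp{ff}.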
|
||||||
|
|
||||||
|
@node Writing a new target
|
||||||
|
@section Writing a new target
|
||||||
|
|
||||||
|
@node Test suite
|
||||||
|
@section Test suite
|
||||||
|
@cindex test suite
|
||||||
|
|
||||||
|
The test suite is kind of lame for most processors. Often it only checks to
|
||||||
|
see if a couple of files can be assembled without the assembler reporting any
|
||||||
|
errors. For more complete testing, write a test which either examines the
|
||||||
|
assembler listing, or runs @code{objdump} and examines its output. For the
|
||||||
|
latter, the TCL procedure @code{run_dump_test} may come in handy. It takes the
|
||||||
|
base name of a file, and looks for @file{@var{file}.d}. This file should
|
||||||
|
contain as its initial lines a set of variable settings in @samp{#} comments,
|
||||||
|
in the form:
|
||||||
|
|
||||||
|
@example
|
||||||
|
#@var{varname}: @var{value}
|
||||||
|
@end example
|
||||||
|
|
||||||
|
The @var{varname} may be @code{objdump}, @code{nm}, or @code{as}, in which case
|
||||||
|
it specifies the options to be passed to the specified programs. Exactly one
|
||||||
|
of @code{objdump} or @code{nm} must be specified, as that also specifies which
|
||||||
|
program to run after the assembler has finished. If @var{varname} is
|
||||||
|
@code{source}, it specifies the name of the source file; otherwise,
|
||||||
|
@file{@var{file}.s} is used. If @var{varname} is @code{name}, it specifies the
|
||||||
|
name of the test to be used in the @code{pass} or @code{fail} messages.
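
The header-line format can be made concrete with a small parser.  The real driver is a TCL procedure; this C sketch only illustrates the @samp{#@var{varname}: @var{value}} shape, and @code{parse_dump_header} is a hypothetical name.

```c
#include <string.h>

/* Hypothetical helper: split a "#name: value" line into its two parts.
   Returns 1 on success, 0 if the line is not a header line or one of
   the buffers is too small.  */
int
parse_dump_header (const char *line, char *name, size_t name_size,
                   char *value, size_t value_size)
{
  const char *colon;
  size_t name_len, value_len;

  if (line[0] != '#')
    return 0;
  line++;
  colon = strchr (line, ':');
  if (colon == NULL)
    return 0;

  name_len = (size_t) (colon - line);
  if (name_len == 0 || name_len >= name_size)
    return 0;
  memcpy (name, line, name_len);
  name[name_len] = '\0';

  /* Skip the colon and any blanks before the value.  */
  colon++;
  while (*colon == ' ' || *colon == '\t')
    colon++;
  value_len = strlen (colon);
  if (value_len >= value_size)
    return 0;
  memcpy (value, colon, value_len + 1);
  return 1;
}
```

Given the line @samp{#objdump: -d}, the sketch yields the name @samp{objdump} and the value @samp{-d}; a line that does not begin with @samp{#} is rejected and would be treated as a regular expression instead.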
|
||||||
|
|
||||||
|
The non-commented parts of the file are interpreted as regular expressions, one
|
||||||
|
per line. Blank lines in the @code{objdump} or @code{nm} output are skipped,
|
||||||
|
as are blank lines in the @code{.d} file; the other lines are tested to see if
|
||||||
|
the regular expression matches the program output. If it does not, the test
|
||||||
|
fails.
|
||||||
|
|
||||||
|
Note that this means the tests must be modified if the @code{objdump} output
|
||||||
|
style is changed.
|
||||||
|
|
||||||
|
@bye
|
||||||
|
@c Local Variables:
|
||||||
|
@c fill-column: 79
|
||||||
|
@c End:
|
||||||
|
|