Running the H8 port through the GCC testsuite currently takes 4h 30m on my
fastest server -- that's roughly 1.5hrs per multilib tested and many tests are
disabled for various reasons.
To put that 1.5hr/multilib in perspective, that's roughly 3X the time for other
embedded targets. Clearly something isn't working as well as it should.
A bit of digging with perf shows that we're spending a crazy amount of time
decoding instructions in the H8 simulator. It's not hard to see why --
basically we take a blob of instruction data, then try to match it to every
instruction in the H8 opcode table starting at the beginning. That table has
~8000 entries (each different addressing mode is considered a different
instruction in the table).
Naturally my first thought was to sort the table and use a binary search to
find the right entry. That's made excessively complex due to the encoding on
the H8. Just getting the sort right would be much more complex than I'd
consider advisable.
Another thought was to build a mapping to the right entry for all the
instructions that can be disambiguated based on the first nibble (4 bits) of
instruction data and a mapping for those which can be disambiguated based on
the first byte of instruction data.
That seemed feasible until I realized that the H8/SX did some truly horrid
things with encoding branches in the 0x4XYY opcode space. It uses an "always
zero" bit in the offset to encode new semantic information. So we can't select
on just 0x4X. Ugh!
We could always to a custom decoder. I've done several through the years, they
can be very fast. But no way I can justify the time to do that.
So what I settled on was to first sort the opcode table by the first nibble,
then find the index of the first instruction for each nibble. Decoding uses
that index to start its search. This cuts the overall build/test by more than
half.
Next I adjusted the sort so that instructions that are not available on the
current sub architecture are put at the end of the table. This shaves another
~15% off the total cycle time.
The net of the two changes is on my fastest server we've gone from 4:30 to 1:40
running the GCC testsuite. Same test results before/after, of course. It's
still not fast, but it's a hell of a lot better.