Description:
This is a small patch which replaces the CopyMem and CopyMemQuick
functions of exec.library.
These functions are optimized for the 68060 processor. They should
also work with the 68040 processor, however they might not be the
fastest possible for the 68040.
The patch tests for a 68040 or 68060 processor. If it can't find one,
it doesn't install the patch and exits with a return code of 20 (=fail).
It also fails if it can't allocate the necessary memory. If the MorphOS
PPC kernel is running, it won't install the patch and will exit with a
return code of 5 (=warn).
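The exit behaviour described above can be summarised in a short sketch.
It is written in Python purely for illustration (the patch itself is 68k
assembler). The function name, its parameters, the order of the checks
and the success code of 0 are assumptions and are not taken from the
CMQ060 source.

```python
# Return codes named in this readme. RETURN_OK = 0 is assumed from the
# usual AmigaDOS convention; the readme does not state it.
RETURN_OK = 0
RETURN_WARN = 5
RETURN_FAIL = 20


def install_return_code(cpu, morphos_running, memory_allocated):
    """Hypothetical model of the start-up checks (check order is assumed)."""
    if morphos_running:
        # MorphOS PPC kernel present: do not patch, exit with "warn".
        return RETURN_WARN
    if cpu not in ("68040", "68060"):
        # No suitable CPU: do not patch, exit with "fail".
        return RETURN_FAIL
    if not memory_allocated:
        # Could not get memory for the new routines: exit with "fail".
        return RETURN_FAIL
    return RETURN_OK


print(install_return_code("68030", False, True))
print(install_return_code("68060", True, True))
```

A startup-sequence could then react to the result with the normal
AmigaDOS "failat" handling.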
If the CPU is a 68040, CMQ060 will install a slightly improved version
of the v1.4 routines. If the CPU is a 68060, routines with a new
movem-loop are picked instead. Note that due to these movem copy loops,
v1.5 is slightly slower in chipmem copies than v1.4. However, fast->fast
copies are sped up, so I don't consider this a problem, especially since
most copies are fast->fast.
On average (measured with "TestIt" from CopyMemQuicker V2.8) these
routines are 29.4% faster than the Kickstart 3.1 ones. CMQ060 v1.5 is
on average 2.5% faster than CMQ060 v1.4.
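These percentages can be reproduced from the "Total" row of the speed
comparison in this readme (35.63 for Kickstart 3.1, 25.80 for CMQ060
V1.4, 25.16 for CMQ060 V1.5). The sketch below assumes that "faster"
means the relative reduction of the total time; the function name is
made up for illustration.

```python
def percent_faster(old_total, new_total):
    """Relative reduction of the total time, in percent (assumed metric)."""
    return (old_total - new_total) / old_total * 100.0


# Totals taken from the "Total" row of the speed comparison table.
KS31_TOTAL = 35.63
CMQ060_V14_TOTAL = 25.80
CMQ060_V15_TOTAL = 25.16

print(f"{percent_faster(KS31_TOTAL, CMQ060_V15_TOTAL):.1f}%")        # 29.4%
print(f"{percent_faster(CMQ060_V14_TOTAL, CMQ060_V15_TOTAL):.1f}%")  # 2.5%
```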
The full source code is included. It was compiled with GenAm 3.14;
it also compiles with PhxAss.
Installation:
Just copy CMQ060 into c:
and insert CMQ060 in your s:startup-sequence.
Some notes about Move16:
Move16 is a new instruction of the 68040 and 68060 processors. It
moves 16 bytes at once and uses burst accesses. Andreas Kleinert and
Thomas Richter said there could be problems with the Move16 instruction
on the Amiga, especially in chipram, caused by the DMA of the custom
chips.
So v1.5 of CMQ060 doesn't use Move16 from or into memory below
$01000000 (Chipram, ZorroII-Fastram, I/O-Space, Kickstart,...). Move16
is only used when the source and destination addresses are both higher
than $00ffffff (32-bit fastram).
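As a rough illustration of this rule, the address test can be written as
a small predicate. This is only a Python sketch with made-up names; the
real check lives in the 68k assembler source and may look at further
conditions (alignment, block size) that are not modelled here.

```python
# First address treated as 32-bit fastram in this readme ($01000000).
MOVE16_MIN_ADDRESS = 0x01000000


def may_use_move16(source, destination):
    """True only if both addresses are above $00ffffff (hypothetical helper)."""
    return source >= MOVE16_MIN_ADDRESS and destination >= MOVE16_MIN_ADDRESS


print(may_use_move16(0x08000000, 0x08100000))  # both in 32-bit fastram
print(may_use_move16(0x00020000, 0x08100000))  # source in chipram
```

The first call prints True and the second prints False, because one
address below $01000000 is enough to rule Move16 out.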
(If you didn't get any errors with V1.3 and want to get the most speed
improvement, you could use CMQ060_Move16. This version also uses Move16
below $01000000, but you might get problems.
If you want to avoid all problems which Move16 could cause [the 68040
has some Move16 bugs], you should use Aminet:util/boot/CMQ030. This
one never uses Move16 and is still faster than the other available
patches.)
Some notes about the movem bug:
Some CPU cards have a bug in the bus controller, and these cards fail to
perform movem properly with odd addresses. CMQ060 v1.5 autodetects such
cards and will use the move-loop instead of the movem-loop with them. If
the move-loop is picked, the performance will drop slightly compared to
the movem-loop. Fortunately such defective cards are rare. Special thanks
to Harald Frank, who patiently explained the bug to me and gave me the
idea of how to autodetect it.
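Put together with the CPU test, the choice of copy loop can be pictured
as follows. This is a Python sketch with hypothetical names, based only
on what this readme states (68040: always the move-loop, 68060: the
movem-loop unless the bus controller bug is detected). How the bug is
actually detected is not shown here.

```python
def pick_copy_loop(cpu, movem_bug_detected):
    """Return the loop variant this readme describes for a given setup."""
    if cpu == "68040":
        # The move-loop is faster than the movem-loop on the 68040.
        return "move-loop"
    if cpu == "68060":
        # Fall back to the move-loop on cards with the movem bug.
        return "move-loop" if movem_bug_detected else "movem-loop"
    # Any other CPU: the patch is not installed at all.
    return None


print(pick_copy_loop("68060", False))
print(pick_copy_loop("68060", True))
```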
Version 1.5 author:
Harry "Piru" Sintonen
<sintoneniki.fi>
Original CMQ060 author:
Dirk Busse
Kropsburgstraße 8
D-67141 Neuhofen
Germany
<dbusseprimus-online.de>
<100.141999germanynet.de>
Speed comparison:
There are some similar patches available on the Aminet:
CopyMemQuicker V2.8 from 1994 -> Aminet:util/boot/COPMQR28.lha
PCM V1.0 from 1996 -> Aminet:util/boot/PCM_1.0.lha
Also MCP patches these functions.
Here are some test results. All results were measured on the same AMIGA
1200 with a phase5 Blizzard PPC with a 50MHz 060. The Blizzard PPC
memory speed setting for M68K was set to the fastest possible.
The most surprising result is that PCM V1.0 is on average *slower* than
the original Kickstart 3.1 routines!
"TestIt" from
CopyMemQuicker V2.8
                        orig    COPMQR  MCP     PCM     CMQ030  CMQ060  CMQ060  CMQ060
                        KS 3.1  V2.8    V1.33b1 V1.0    V1.1    V1.4    V1.5    Move16
CopyMem routines                                                                V1.5
565×64kB     L->L       2.04    2.08    1.92    1.56    1.91    1.52    1.51    1.51
147×64kB     L->L+1     0.94    0.68    0.57    0.68    0.56    0.57    0.56    0.56
413×64kB     L->E       1.66    1.70    1.61    1.91    1.57    1.61    1.59    1.59
147×64kB     L->E+1     0.94    0.68    0.57    0.68    0.56    0.57    0.56    0.56
147×64kB     L+1->L     0.94    0.67    0.57    0.60    0.56    0.57    0.55    0.56
382×64kB     L+1->L+1   1.62    1.39    1.29    1.05    1.30    1.03    1.02    1.02
147×64kB     L+1->E     0.94    0.68    0.57    0.69    0.57    0.57    0.56    0.56
501×64kB     L+1->E+1   1.91    1.89    1.95    2.34    1.96    1.96    1.93    1.93
501×64kB     E->L       1.92    1.92    1.94    2.06    1.92    1.95    1.90    1.90
147×64kB     E->L+1     0.94    0.67    0.57    0.68    0.57    0.57    0.55    0.55
382×64kB     E->E       1.62    1.39    1.29    1.06    1.30    1.03    1.02    1.02
147×64kB     E->E+1     0.94    0.68    0.57    0.68    0.57    0.57    0.56    0.56
147×64kB     E+1->L     0.94    0.67    0.57    0.60    0.56    0.57    0.55    0.56
413×64kB     E+1->L+1   1.71    1.70    1.60    1.93    1.61    1.60    1.56    1.56
147×64kB     E+1->E     0.94    0.67    0.57    0.69    0.57    0.57    0.55    0.55
564×64kB     E+1->E+1   2.10    2.06    1.91    1.56    1.92    1.52    1.50    1.50
33900×1kB    L->L       0.43    0.42    0.37    1.49    0.36    0.36    0.36    0.36
9400×1kB     L->L+1     0.58    0.33    0.20    0.24    0.20    0.19    0.19    0.19
24000×1kB    E->E       0.68    0.30    0.26    1.01    0.27    0.26    0.26    0.26
196000×128B  L->L       0.55    0.45    0.41    1.12    0.32    0.35    0.33    0.33
155000×128B  E->E       0.75    0.40    0.34    1.10    0.34    0.30    0.30    0.31
588000×19B   L->L       0.85    0.61    0.72    0.93    0.53    0.53    0.53    0.53
622000×18B   L->L       0.86    0.51    0.71    0.89    0.51    0.50    0.50    0.51
663000×17B   L->L       0.75    0.68    0.76    0.80    0.51    0.53    0.53    0.55
956000×16B   L->L       0.82    0.71    1.04    1.05    0.59    0.72    0.55    0.55
1060000×8B   L->L       0.85    0.72    0.89    1.03    0.62    0.53    0.55    0.55
1430000×4B   L->L       0.80    0.63    0.94    1.12    0.71    0.45    0.45    0.48
2190000×1B   L->L       0.74    0.61    1.40    0.88    0.44    0.66    0.66    0.70
CopyMemQuick
565×64kB     L->L       2.04    2.06    1.91    1.56    1.91    1.52    1.51    1.51
33900×1kB    L->L       0.43    0.43    0.37    1.27    0.36    0.36    0.35    0.35
196000×128B  L->L       0.53    0.43    0.38    1.09    0.31    0.32    0.30    0.30
956000×16B   L->L       0.73    0.63    0.94    1.06    0.42    0.58    0.42    0.42
1060000×8B   L->L       0.53    0.57    0.80    0.63    0.44    0.42    0.42    0.42
1430000×4B   L->L       0.43    0.51    0.80    0.60    0.31    0.28    0.28    0.31
Total
                        35.63   30.70   31.48   36.84   27.31   25.80   25.16   25.31
History:
1.0 (12.Sep.1998)
- First public version.
1.1 (15.Sep.1998)
- V1.0 exited with a return code of 10 (=error) if it couldn't find
a 68040 or 68060 or couldn't get the necessary memory.
V1.1 exits, in these cases, with a return code of 20 (=fail).
- Fixed a mistake in the readme.
1.1b (19.Sep.1998)
(I didn't change the patch itself! It's the same as V1.1)
- Added the test results of MCP V1.30 to the readme.
- Added CMQ060beep and CMQ060beepCMQ (see above).
1.2 (29.Nov.1998)
- Added the test results of MCP V1.32b12 to the readme.
- Changed the source code.
There was a problem with an incorrectly written program which expects
the address of the last source byte +1 in A0 and the address
of the last destination byte +1 in A1.
This version of CMQ060 solves problems with such badly written programs.
It's now 100 bytes longer, but the speed is the same. Big moves
by the CopyMem function will be one or two cycles faster, but
you won't notice it.
1.3 (5.Jan.1999)
None of the changes made in this version affect the speed. They
are only there to avoid problems with future versions of AMIGA OS.
- changed the version string to the "standard" format
- changed BMI to BCS and BPL to BCC
-> now CMQ030 could move blocks bigger than 2 GigaByte ;-)
1.4 (3.Apr.1999)
- CMQ060 now doesn't use Move16 into/from memory below $01000000
- added CMQ060Move16 (It's the same as CMQ060 V1.3)
- added the test results of CMQ030 (which never uses Move16)
1.5 (5.Sep.2000)
- Totally rewrote the source code.
- Bugfix: Fixed a major bug in the patch init: If the memory was
allocated near a 64k boundary, CMQ060 trashed innocent memory and
crashed the system completely. Odds were 1/8192 for this to
happen.
- Speedup: Removed two pipeline stalls from big copies.
- Speedup: Optimized the non-move16 copy loop; it now uses movem.l
instead of move.l. Slightly slower in chipmem copies, however
fast -> fast copies are sped up.
- Speedup: Unrolled the bigcopy-loops to do 256 bytes per
iteration.
- Added a MorphOS check; it makes no sense to slow down MorphOS
with m68k patches.
- Redid all speed tests; MCP was tested with 1.33b1. Added V1.4 results
for reference. Cleaned up this readme.
1.5b (6.Sep.2000)
- With the 68040 the move-loop is faster than the movem-loop. So the
move-loop is now always picked for the 68040. Thanks to Chip for the
benchmark results.
- Added autodetect for the movem bus controller bug. Now automagically
picks between movem- and move-loop on the 68060.
- Fixed the Kickstart requirement; the 68040 wasn't officially supported
before Kickstart 2.04.
1.5c (7.Sep.2000)
- Bugfix: The movem bus controller bug autodetect was itself buggy. Fixed.
- Made the source compile with PhxAss.
1.5d (11.Sep.2000)
- Bugfix: movem buscontroller bug autodetect still had a potential
problem. Fixed.
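A note on the 1.3 entry (BMI to BCS, BPL to BCC): BMI/BPL test the sign
bit of a result, while BCS/BCC test the carry (borrow) flag, so the
change amounts to treating the length as unsigned. The Python sketch
below models a 32-bit subtraction and both flags. It assumes the copy
loop subtracts a chunk size from a remaining-length counter and then
branches on the result; that loop structure is an assumption and is not
quoted from the CMQ060 source.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000


def sub32_flags(value, amount):
    """Model 'sub.l #amount,Dn' on unsigned 32-bit inputs.

    Returns (result, negative_flag, carry_flag).
    """
    result = (value - amount) & MASK32
    negative = bool(result & SIGN_BIT)  # N flag: what BMI/BPL look at
    carry = amount > value              # C flag (borrow): what BCS/BCC look at
    return result, negative, carry


# Hypothetical block of 2.25 GB, copied in chunks of 256 bytes.
length = 0x90000000
result, negative, carry = sub32_flags(length, 256)

print(negative)  # True: a BMI-style test would stop the loop right away
print(carry)     # False: a BCS-style test keeps copying
```

With lengths below 2 GB both tests agree, which is why the change has no
visible effect on ordinary copies.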