Description:
This is a small patch which replaces the CopyMem and CopyMemQuick
functions of exec.library.
These functions are optimized for the 68060 processor. They should
also work with the 68040 processor.
The patch tests for a 68040 or 060 processor. If it can't find one,
it doesn't install the patch and exits with a return code of 20 (=fail).
It also fails if it can't allocate the necessary memory.
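The start-up checks described above can be sketched in C roughly as follows. This is only an illustration, not the code from the archive: it assumes the processor is detected through the AttnFlags field of ExecBase, the helper name cmq_startup_code is hypothetical, and the constants are copied by hand from exec/execbase.h and dos/dos.h so that the sketch also compiles outside the Amiga.

```c
#include <stdint.h>

/* Bit masks mirroring AFF_68040 and AFF_68060 from exec/execbase.h. */
#define AFF_68040 (1u << 3)
#define AFF_68060 (1u << 7)

/* Return codes mirroring dos/dos.h. */
#define RETURN_OK   0
#define RETURN_FAIL 20

/* Hypothetical helper: decide the exit code of the patch installer.
 * attn_flags is assumed to be a copy of ExecBase->AttnFlags,
 * memory_ok is non-zero if the memory for the patch was allocated. */
int cmq_startup_code(uint16_t attn_flags, int memory_ok)
{
    if ((attn_flags & (AFF_68040 | AFF_68060)) == 0)
        return RETURN_FAIL;   /* neither a 68040 nor a 68060 */
    if (!memory_ok)
        return RETURN_FAIL;   /* no memory for the new routines */
    return RETURN_OK;         /* safe to install the patch */
}
```

With these assumptions, a 68030 system (AttnFlags bits 0 to 2 set) gets return code 20, and a 68060 system with enough memory gets 0.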
In some cases these new functions are four times faster than the
original functions.
Installation:
Just copy CMQ060 into c: and insert CMQ060 in your
s:Startup-Sequence.
Some notes about Move16:
Move16 is a new assembler instruction of the 68040 and 060 processors.
It moves 16 bytes at once, using burst accesses.
Andreas Kleinert and Thomas Richter told me there could be problems with
the Move16 instruction on the Amiga, especially in Chipram, caused by
the DMA of the custom chips.
I couldn't reproduce such an error, but it may happen on other systems.
So V1.4 of CMQ060 doesn't use Move16 from or into memory below $01000000
(Chipram, ZorroII-Fastram, I/O-Space, Kickstart,...). Move16 is only
used, when the source and destination addresses are both higher than
$00ffffff (32-bit-Fastram,...).
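The address rule used by V1.4 can be written down as a small C sketch. The function name may_use_move16 and the constant name are hypothetical, and the real patch is written in assembler. The sketch only shows the comparison described above and ignores any alignment or size conditions the real routine may also check.

```c
#include <stdint.h>

/* First address above the 24-bit address space
 * (Chipram, ZorroII-Fastram, I/O-Space, Kickstart, ...). */
#define MOVE16_MIN_ADDR 0x01000000u

/* Hypothetical helper: returns 1 if both the source and the
 * destination address are higher than $00ffffff, else 0. */
int may_use_move16(uint32_t src, uint32_t dst)
{
    return src >= MOVE16_MIN_ADDR && dst >= MOVE16_MIN_ADDR;
}
```

For example, a copy from Chipram at $00020000 into 32-bit-Fastram at $08000000 would not use Move16, because the source lies below $01000000.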
(If you didn't get any errors with V1.3 and want to get the most speed
improvement, you can use CMQ060Move16. It is identical to CMQ060
V1.3 and also uses Move16 into and from Chipram. But you may get
problems.)
(If you want to avoid all problems which Move16 could cause [the 68040
has some Move16 bugs], you should use Aminet:util/boot/CMQ030. This one
never uses Move16 and is still faster than the other available patches.)
The source code is also in the archive.
Author:
Dirk Busse
Kropsburgstraße 8
D-67141 Neuhofen
Germany
<dbusse@primus-online.de>
<100.141999@germanynet.de>
How often are these functions used?
Some people told me they couldn't notice a speed improvement.
You won't get an overall speed improvement by a factor of two. But there
is a small speed improvement, even if you can't notice it.
To show you how often the patched functions are called, I've inserted two
modified patches into Version 1.1b of this archive.
CMQ060beep:
Every time one of the patched functions CopyMem or CopyMemQuick is
called, your AMIGA makes a DisplayBeep. After calling LoadWB your
AMIGA beeps many times per second. If you boot your AMIGA without
Startup-Sequence and install CMQ060beep, you can see that every
AmigaDOS command like Dir, List, Avail, Resident... uses the patched
functions.
They all use the CopyMem function. And this is the function with
the most speed improvement.
CMQ060beepCMQ:
This will only make a DisplayBeep if the patched CopyMemQuick
function is called. So it shows you which programs use the
patched CopyMemQuick function. For example: PageStream3.3 while moving
a scrollbar or making a redraw, or TeleInfo2, or ... .
The two patches above aren't for real use. They only demonstrate
how often the functions are used.
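The idea behind the two demonstration patches can be sketched in portable C: a wrapper is put in front of the copy routine and does something visible on every call. This sketch counts calls instead of calling DisplayBeep, and it uses a plain function pointer as a stand-in for the library vector. All names in it (copy_fn, plain_copy, counting_copy, install_counter, copy_calls) are hypothetical and not taken from the archive.

```c
#include <stddef.h>
#include <string.h>

/* Same argument order as CopyMem: source, destination, size. */
typedef void (*copy_fn)(const void *src, void *dst, size_t size);

static copy_fn original_copy;      /* routine that was active before */
static unsigned long call_count;   /* number of calls seen so far */

/* Stand-in for the routine that is being patched. */
void plain_copy(const void *src, void *dst, size_t size)
{
    memcpy(dst, src, size);
}

/* Wrapper: record the call, then do the real work. */
void counting_copy(const void *src, void *dst, size_t size)
{
    call_count++;
    original_copy(src, dst, size);
}

/* Put the wrapper into the given slot and remember the old entry. */
void install_counter(copy_fn *slot)
{
    original_copy = *slot;
    *slot = counting_copy;
}

/* Report how often the patched routine has been called. */
unsigned long copy_calls(void)
{
    return call_count;
}
```

After install_counter() has been applied to a slot, every copy made through that slot increases the counter by one, which is the same kind of evidence the beeps give.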
Speed comparison:
There are already some similar patches available on the Aminet:
CopyMemQuicker V2.8 from 1994 -> Aminet:util/boot/COPMQR28.lha
PCM V1.0 from 1996 -> Aminet:util/boot/PCM_1.0.lha
Also MCP patches these functions.
CopyMemQuicker is optimized for the 68000, 010 and 020 processors.
But on a 68060 (I think also on a 68040) you can get some more
speed improvement.
PCM is optimized for the 68040 and 060 processors. But some copy modes
like Long to Even aren't optimized. And the copy mode Long+1 to Even+1
takes twice as long as the original exec function.
PCM works only with a 68040 or 060, because it also uses the Move16
command (see the note above).
In a lot of cases the patched functions from MCP are the slowest of all.
Some copy modes are even slower than the original Kickstart 3.1
functions.
Here are some test results. All results are measured on the same
AMIGA 2000 with a DKB WildFire060-50MHz:
"TestIt" from CopyMemQuicker V2.8

Columns:
 (1) original Kickstart3.1 routines
 (2) CopyMemQuicker V2.8
 (3) MCP V1.32b12
 (4) PCM V1.0
 (5) CMQ030 V1.1
 (6) CMQ060 V1.4
 (7) CMQ060Move16 V1.4

                       (1)   (2)   (3)   (4)   (5)   (6)   (7)
CopyMem
565×64kB     L->L      1.85  1.85  1.85  1.35  1.79  1.31  1.31
147×64kB     L->L+1    1.33  1.14  1.07  1.07  0.47  0.45  0.47
413×64kB     L->E      2.21  2.21  2.21  2.23  1.31  1.31  1.31
147×64kB     L->E+1    1.35  1.15  1.07  1.07  0.45  0.45  0.45
147×64kB     L+1->L    1.35  1.15  0.51  0.47  0.47  0.45  0.47
382×64kB     L+1->L+1  2.11  1.23  2.88  0.91  1.21  0.89  0.87
147×64kB     L+1->E    1.33  1.15  0.81  0.79  0.47  0.45  0.47
501×64kB     L+1->E+1  1.71  1.70  3.81  3.71  1.59  1.57  1.59
501×64kB     E->L      1.71  1.71  1.75  1.59  1.59  1.59  1.59
147×64kB     E->L+1    1.33  1.15  1.11  1.07  0.47  0.47  0.47
382×64kB     E->E      2.11  1.23  2.13  0.91  1.21  0.87  0.89
147×64kB     E->E+1    1.35  1.13  1.13  1.09  0.47  0.45  0.45
147×64kB     E+1->L    1.33  1.15  0.51  0.45  0.45  0.47  0.47
413×64kB     E+1->L+1  2.19  2.19  3.15  3.05  1.31  1.29  1.29
147×64kB     E+1->E    1.33  1.15  0.81  0.79  0.45  0.45  0.45
564×64kB     E+1->E+1  1.81  1.81  4.31  1.35  1.79  1.31  1.31
33900×1kB    L->L      1.10  1.11  1.13  1.31  1.03  1.04  1.07
9400×1kB     L->L+1    1.17  0.93  0.91  0.86  0.29  0.29  0.27
24000×1kB    E->E      1.70  0.80  1.68  0.92  0.74  0.75  0.75
196000×128B  L->L      1.02  0.73  1.03  1.04  0.75  0.75  0.75
155000×128B  E->E      1.61  0.63  1.55  1.05  0.62  0.60  0.59
588000×19B   L->L      0.83  0.60  1.43  0.74  0.50  0.51  0.49
622000×18B   L->L      0.81  0.51  1.43  0.77  0.51  0.49  0.51
663000×17B   L->L      0.75  0.70  1.47  0.73  0.52  0.52  0.50
956000×16B   L->L      0.79  0.71  1.98  1.00  0.58  0.51  0.50
1060000×8B   L->L      0.85  0.79  1.17  1.01  0.58  0.52  0.52
1430000×4B   L->L      0.73  0.61  1.09  1.14  0.45  0.39  0.41
2190000×1B   L->L      0.67  0.61  0.73  0.84  0.33  0.57  0.62
CopyMemQuick
565×64kB     L->L      1.85  1.87  1.85  1.33  1.79  1.31  1.29
33900×1kB    L->L      1.09  1.11  1.13  0.89  1.03  1.03  1.07
196000×128B  L->L      0.99  0.71  1.03  0.81  0.73  0.73  0.73
956000×16B   L->L      0.69  0.63  0.88  0.94  0.38  0.39  0.38
1060000×8B   L->L      0.47  0.57  0.71  0.60  0.40  0.40  0.40
1430000×4B   L->L      0.35  0.51  0.73  0.52  0.23  0.21  0.25

"Test" from PCM V1.0 ("Test" moves a block of 500,000 bytes ten times)

                       (1)   (2)   (3)   (4)   (5)   (6)   (7)
Fast->Fast
  CopyMem              0.26  0.26  0.18  0.18  0.24  0.18  0.18
  CopyMemQuick         0.26  0.26  0.18  0.20  0.26  0.18  0.18
Chip->Fast
  CopyMem              1.98  1.98  1.96  2.16  2.16  2.15  1.98
  CopyMemQuick         1.98  1.98  1.98  2.16  2.16  2.16  1.98
Fast->Chip
  CopyMem              1.92  1.91  1.92  1.90  1.90  1.90  1.90
  CopyMemQuick         1.92  1.92  1.92  1.90  1.90  1.88  1.90
Chip->Chip
  CopyMem              3.64  3.62  3.64  3.70  3.96  3.96  3.72
  CopyMemQuick         3.62  3.62  3.62  3.70  3.94  3.94  3.72
History:
1.0 (12.Sep.1998)
- First public version.
1.1 (15.Sep.1998)
- V1.0 exits with a return code of 10 (=error) if it can't find
a 68040 or 68060 or can't get the necessary memory.
V1.1 exits, in these cases, with a return code of 20 (=fail).
- Fixed a mistake in the readme.
1.1b (19.Sep.1998)
(I didn't change the patch itself! It's the same as V1.1)
- Added the test results of MCP V1.30 to the readme.
- Added CMQ060beep and CMQ060beepCMQ (see above).
1.2 (29.Nov.1998)
- Added the test results of MCP V1.32b12 to the readme.
- Changed the source code.
There was a problem with an incorrectly written program which expects
the address of the last source byte +1 in A0 and the address
of the last destination byte +1 in A1.
This version of CMQ060 solves problems with such badly written programs.
It's now 100 bytes longer, but the speed is the same. Big moves
by the CopyMem function will be one or two cycles faster, but
you won't notice it.
1.3 (5.Jan.1999)
None of the changes made in this version affect the speed. They
are only to avoid problems with future versions of AMIGA OS.
- changed the version string to the "standard" format
- changed BMI to BCS and BPL to BCC
-> now CMQ030 can move blocks bigger than 2 GigaByte ;-)
1.4 (3.Apr.1999)
- CMQ060 no longer uses Move16 into/from memory below $01000000
- added CMQ060Move16 (It's the same as CMQ060 V1.3)
- added the test results of CMQ030 (which never uses Move16)
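A note on the BMI/BPL to BCS/BCC change listed under 1.3: the following C sketch shows why a signed test limits the block size to 2 GigaByte while an unsigned test does not. It assumes that the branches follow a subtraction from the remaining length. That is an assumption about the source, and the function names are hypothetical.

```c
#include <stdint.h>

/* BPL-style test: continue while the result of the subtraction is
 * not negative when read as a signed 32-bit number. */
int continues_signed(uint32_t remaining, uint32_t chunk)
{
    return (int32_t)(remaining - chunk) >= 0;
}

/* BCC-style test: continue while the subtraction produced no borrow,
 * i.e. while remaining >= chunk as unsigned numbers. */
int continues_unsigned(uint32_t remaining, uint32_t chunk)
{
    return remaining >= chunk;
}
```

With a remaining length of $90000000 and a chunk of 16 bytes, the signed test stops at once because bit 31 is set, while the unsigned test keeps going. For lengths below 2 GigaByte both tests agree.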