util/boot/CopyMemAIO.lha

Short:Speed up your programs & Workbench
Author:Holger.Hippenstiel AT gmx.de
Uploader:Holger Hippenstiel nc-online de
Type:util/boot
Version:4.4
Replaces:util/boot/CopyMemAIO.lha
Architecture:m68k-amigaos >= 3.0.0
Distribution:Aminet
Date:2020-10-23
Download:http://aminet.net/util/boot/CopyMemAIO.lha
Readme:http://aminet.net/util/boot/CopyMemAIO.readme
Downloads:1285
CopyMemAIO V.4.4
===============

TL;DR: CopyMem() is an essential function of exec.library. The operating
system uses it a lot, so all OS functions and programs benefit from
replacing it with a quicker one.
Install CopyMemAIO in C: and call it in the Startup-Sequence after SetPatch,
or in the User-Startup. There is no need to start it with "Run"; it
automatically selects the best code for your CPU.
(Code for 680x0, 68040, 68060, 68080 & native x86)
Or put CopyMemAIO and its icon in the WBStartup folder.

For native CopyMem on WinUAE, which is 2-3 times quicker, copy
winuae_dll/CopyMemAIO.alib to your WinUAE-folder/winuae_dll.
You need to enable the option "Allow native code" under "Miscellaneous".
*** Native code only works on WinUAE_x86, not on x64 !! ***
There is no need to use the 64-bit version anyway.

In the past there were a lot of replacements to speed it up:

Feb.1993 by Arthur Hagen - 68000-68020. Copies the best function to a RAM
block, relies on AllocMem for alignment, uses mostly
movem.l (ax)+,dn-dm/an-am
http://aminet.net/package/util/misc/CopyMemQuicker

Aug.1994 by Arthur Hagen - 68000-68020. Copies the best function to a RAM
block, aligns the code entries to addresses divisible by 16, jump table,
multiple movem.l
http://aminet.net/package/util/boot/COPMQR28

Oct.1996 by Allenbrand Brice - written for 68040, works with 68060.
No copying/detaching, has to be started with "Run >NIL: ...",
code alignment depends purely on hunk loading. Single move16 in a loop.
http://aminet.net/package/util/boot/PCM_1.0

May.1999 by Dirk Busse - written for 68030, works with 68020-68060, but all
use the same function. Aligns the code entries to addresses divisible by 16,
unrolled move.l (an)+,(am)+ loops.
http://aminet.net/package/util/boot/CMQ030

Jul.1999 by Dirk Busse - written for 68060, works with 68040, but both use
the same function. Aligns the code entries to addresses divisible by 16,
unrolled move16 & move.l (an)+,(am)+ loops.
CMQ060 permanently checks for the <$1000000 address range (SAFE-Mode).
CMQ060Move16 does not check for the <$1000000 address range, so it is a bit
faster.
http://aminet.net/package/util/boot/CMQ060

Nov.2000 by Harry "Piru" Sintonen - written for 68060, works with 68040,
but both use the same function. Aligns the code entries to addresses
divisible by 16, unrolled move16, movem.l & move.l (an)+,(am)+ loops.
Won't install on MorphOS.
http://aminet.net/package/util/boot/NewCMQ060

Aug.2009 by Matt Hey - three versions - written for 680x0, 68040 & 68060.
No copying/detaching, has to be started with "Run >NIL: ...",
code alignment depends purely on hunk loading.
Big unrolled move16 & move.l (an)+,(am)+ loops.
Best use of cache size/burst loading.
But it has to be started with "Run", and there are separate files for each
processor and for Safe-Mode.
http://aminet.net/package/util/boot/CopyMem

Aug.2020 by Holger Hippenstiel - mainly based on CopyMem by Matt Hey.
Copies the best function to a RAM block, aligns the code entries to
addresses divisible by 16.
There is no need to start it with "Run".
Code for 680x0, 68040, 68060, 68080 & native x86.
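
Several of the entries above align their code entries to addresses divisible
by 16. As a rough illustration only (this is not taken from the CopyMemAIO
source), rounding an address up to the next 16-byte boundary could look like
this in C. The names align_up and entry_offset are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of align.
   align must be a power of two (16 in the readme's case). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~(align - 1);
}

/* Hypothetical helper: number of padding bytes to skip so that a code
   entry placed in a freshly allocated block starts on a 16-byte boundary. */
static size_t entry_offset(uintptr_t block_start)
{
    return (size_t)(align_up(block_start, 16) - block_start);
}
```

A block that starts at $100C would need 4 padding bytes, one that already
starts at $1000 needs none.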

I wrote a benchmark that tests all functions written so far; it can also
test whether the CopyMem functions work correctly with different sizes.

The memory-layout is the same as CopyMemQuicker, so "TestIt" will believe
CopyMemQuicker is running.

For fast emulation, the most important settings are the Advanced JIT
Settings in WinUAE:
Cache Size: 16MB
Check FPU Support
Check Constant Jump
Uncheck Hard flush
Select Direct
Check No flags
Check Catch unexpected exceptions

If you want to try "TestIt" on WinUAE with a decently fast machine,
note that it will crash with a division by zero. Take a look at my
http://aminet.net/package/util/boot/NoMoreDiv0 to fix that problem.

For real Amigas it can be started with Argument "S" or "SAFE",
then source and destination must be in 24bit-space for move16-operation.
The SAFE Option is only needed for controllers which can only do 24Bit-DMA,
like the A2091, but there is a driverpatch for that:
http://aminet.net/package/driver/media/vbak2091

Starting CopyMemAIO again removes the patches.

*****************************************************************************

Update V4.0:

Major rework and implemented native copymem for (Win)UAE.

Test results from BenchCM:

*** Correction: BenchCM V1.7 and older tested for #1000000 "ticks", but in
reality it should test for EClock/Ticks_Value, so tests took about 60% too
long. That is the reason the values were too high (together with caching on
the x86 side).
Actual values from BenchCM V1.8:

Intel Core i7-4790K 4.4 GHz, 2400 MHz RAM
--------+----------+----------+----------+
Testsize|      64kb|      16kb|       4kb|
--------+----------+----------+----------+
CM0x0   | 9066 MB/s| 8886 MB/s| 8277 MB/s|
CM040   | 6322 MB/s| 5411 MB/s| 5728 MB/s|
CM060   | 6034 MB/s| 5691 MB/s| 5538 MB/s|
CMNative|22768 MB/s|20382 MB/s|16493 MB/s|

As you can see, the native code is 2-3 times faster. Depending on the
processor, however, the overhead for calling the native code is only worth
it above 1 KB copy size, so when the native code is installed, CM0x0 is
still used below 1024 bytes.
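
The size threshold described above can be sketched in C as follows. This is
only an illustration of the dispatch idea, not the actual patch code:
copy_cm0x0 and copy_native are stand-ins built on memcpy, the call counters
exist only to make the choice visible, and the argument order follows
CopyMem(source, dest, size).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* From the text: below 1024 bytes the 680x0 routine is used. */
#define NATIVE_THRESHOLD 1024

static int small_calls;   /* how often the 680x0 stand-in was chosen  */
static int native_calls;  /* how often the native stand-in was chosen */

/* Stand-in for the optimized 680x0 routine (CM0x0). */
static void copy_cm0x0(const void *src, void *dst, size_t size)
{
    small_calls++;
    memcpy(dst, src, size);
}

/* Stand-in for the native x86 routine (CMNative). */
static void copy_native(const void *src, void *dst, size_t size)
{
    native_calls++;
    memcpy(dst, src, size);
}

/* Pick the routine by size: the call overhead of the native code only
   pays off for larger blocks. */
static void copy_dispatch(const void *src, void *dst, size_t size)
{
    assert(src != NULL && dst != NULL);
    if (size < NATIVE_THRESHOLD)
        copy_cm0x0(src, dst, size);
    else
        copy_native(src, dst, size);
}
```

With this sketch a 1023-byte copy goes through the 680x0 stand-in, while a
1024-byte copy goes through the native one.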

A lot of functions will feel smoother now: icon drawing, window dragging
and so on all use CopyMem().

Possible arguments are now:
S=SAFE=SAFEMODE/S,V=VERBOSE/S,NN=NONATIVE/S:

SafeMode for Amigas with a Zorro II controller which can only do 24-bit DMA.
Verbose will output which code was installed/when it was removed.
NoNative will not use the native function, just the optimized 680x0 code.

Included MemTest from http://aminet.net/package/misc/emu/RaMithlon,
which copies different memoryblocks and checks if the code is working.
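
To give an idea of what such a check involves, here is a minimal C sketch.
It is not the MemTest program itself: check_copy, check_all and the test
pattern are assumptions made for illustration, reference_copy stands in for
the routine under test, and broken_copy is a deliberately faulty routine
that shows the check catching an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GUARD    16    /* untouched bytes kept around the destination */
#define MAX_SIZE 4096  /* largest block size exercised by the sketch  */

typedef void (*copy_fn)(const void *src, void *dst, size_t size);

/* Correct routine standing in for the code under test. */
static void reference_copy(const void *src, void *dst, size_t size)
{
    memcpy(dst, src, size);
}

/* Deliberately faulty routine: drops the last byte of larger blocks. */
static void broken_copy(const void *src, void *dst, size_t size)
{
    memcpy(dst, src, size >= 16 ? size - 1 : size);
}

/* Returns 1 if exactly size bytes arrived and nothing else was touched. */
static int check_copy(copy_fn copy, size_t size, size_t src_off, size_t dst_off)
{
    static unsigned char src[MAX_SIZE + 3 * GUARD];
    static unsigned char dst[MAX_SIZE + 3 * GUARD];
    size_t start = GUARD + dst_off;
    size_t i;

    assert(size <= MAX_SIZE && src_off < GUARD && dst_off < GUARD);
    for (i = 0; i < sizeof src; i++)
        src[i] = (unsigned char)(i * 31u + 7u);   /* recognizable pattern */
    memset(dst, 0xAA, sizeof dst);                /* guard value */

    copy(src + src_off, dst + start, size);

    for (i = 0; i < sizeof dst; i++) {
        unsigned char expect = 0xAA;
        if (i >= start && i < start + size)
            expect = src[src_off + (i - start)];
        if (dst[i] != expect)
            return 0;
    }
    return 1;
}

/* Runs a set of sizes and alignments; returns the number of failures. */
static int check_all(copy_fn copy)
{
    static const size_t sizes[] = { 0, 1, 2, 3, 4, 15, 16, 17, 63, 64, 65,
                                    1023, 1024, 1025, 4096 };
    int failures = 0;
    size_t s, so, dof;

    for (s = 0; s < sizeof sizes / sizeof sizes[0]; s++)
        for (so = 0; so < 4; so++)
            for (dof = 0; dof < 4; dof++)
                if (!check_copy(copy, sizes[s], so, dof))
                    failures++;
    return failures;
}
```

check_all(reference_copy) reports no failures, while check_all(broken_copy)
reports at least one.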

On a 4790K with WinUAE 4.4 (68060 emulation) I get:
Size  |  Iter   | No CMQ | 040  | 0x0  | Native
------+---------+--------+------+------+-------
   4kb| 1000000 |   42   |  34  |  24  |   7
  16kb|  250000 |   40   |  25  |  22  |   5
  64kb|   22500 |   14   |   7  |   8  |   2
 256kb|    1125 |    4   |   2  |   2  |   0
1024kb|     350 |    4   |   2  |   2  |   0

Update 4.0b: Vampire machines had a problem with Native-Init; fixed.

Update 4.1:
Removed all old methods to test for UAE (using a fixed address in case
uae.resource wasn't found), because WinUAE will return completely
different addresses anyway.
This should fix crashes on native Amigas and under AROS.

New 68080 code which relies on the 68080's ability to use move16 with any
alignment. This ability is not compatible with real CPUs or WinUAE's
emulation, so it is tested before the new 68080 code is used: the Apollo
team may change move16 back to be fully compatible, and in that case the
68040 code will be used (which was faster on Vampire than the 68080 code in
4.0 anyway). I have no hardware or emulation to test the new 68080 code, but
it should be a bit faster.
Included CMBench & source code from Philippe Carpentier.
Thanks to Gunnar von Boehn from the Apollo-Team for explaining details of
the Vampire implementation.

Update 4.2:
Oops, the alignment check for a /16 destination in the 68080 code was still
in there; now removed. 68080 CopyMem & CopyMemQuick now go full ham
Apollo/Vampire: no more extra data register for alignment checks,
btst #x,an to go .. :)
Removed some additional instructions, so it is a bit quicker thanks to the
68080's abilities.

Update BenchCM V1.8:
More accurate Measurement of Time / Copyspeed, rolling buffer to prevent
caching.
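
As a hedged sketch of these two ideas (not the BenchCM source, which is
written in assembler), the C fragment below converts a tick count into MB/s
using the timer's real frequency instead of an assumed 1000000, and steps
through a large buffer so that consecutive copies do not hit the same cached
data. The names mb_per_second and next_offset are hypothetical, and 1 MB is
assumed to be 1024*1024 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Throughput in MB/s from the bytes copied, the ticks elapsed and the
   timer frequency in ticks per second (for the EClock this is the value
   the timer reports, not a fixed 1000000). */
static double mb_per_second(double bytes, double ticks, double ticks_per_second)
{
    double seconds;

    assert(ticks > 0.0 && ticks_per_second > 0.0);
    seconds = ticks / ticks_per_second;
    return bytes / (1024.0 * 1024.0) / seconds;
}

/* Rolling buffer: advance to the next block and wrap to the start when
   the next block would no longer fit into the buffer. */
static size_t next_offset(size_t offset, size_t block, size_t buffer)
{
    assert(block > 0 && block <= buffer);
    offset += block;
    return (offset + block <= buffer) ? offset : 0;
}
```

For example, 100 MB copied in 709379 ticks of a 709379 Hz timer is exactly
one second and therefore 100 MB/s; dividing by 1000000 instead would
understate the elapsed time and overstate the speed.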

Update 4.3:
This time small memory copies (which the OS uses all the time) were the
main focus; 4.3 does those 15% quicker than 4.2 on Vampire V2 & V4.
Native CopyMem also uses a better method for small copies.
CMBench was updated to V1.2 and modified in the same way as BenchCM: it now
uses a rolling buffer, so that caches, preloading, prefetching & bursts
won't distort the real speed. You can now give a loop multiplier as an
argument. The default (3) is meant for the Vampire V4; V2 users can use
"CMBench 1", on WinUAE use "CMBench 64".

Update 4.4:
Some small improvements for 68080: large copies are 5% faster on V2 and a
bit faster on Vampire V4 Standalone. Corrected the speed values given in the
V4.0 update.

Many thanks to Renaud Schweingruber & Joshua Dolan for testing.

    DISCLAIMER

        This software is subject to the "Standard Amiga FD-Software Copyright
        Note". It is Giftware as defined in paragraph 4g. If you like it and
        use it regularly, please send me a small gift.
        For more information please read "AFD-COPYRIGHT".

        (/pub/aminet/docs/misc/AFD-FilesV-XX.lha V=Version,XX=Languages)

    AUTHOR

        Please send comments, bug-reports or small gifts like a Vampire V4
        or a now "worthless :P" NVidia RTX 2080 Ti, or Paypal me to:

        Holger.Hippenstiel AT gmx.de
        Hauptstr. 38
        71229 Leonberg
        Germany


Contents of util/boot/CopyMemAIO.lha
PERMISSION  UID  GID    PACKED    SIZE  RATIO METHOD CRC     STAMP     NAME
---------- ----------- ------- ------- ------ ---------- ------------ ----------
[unknown]                 2898    7381  39.3% -lh5- d49c Oct 27  1999 afd-copyright
[unknown]                  810    1576  51.4% -lh5- 07e8 Oct 11 21:11 AFD-COPYRIGHT.info
[unknown]                 1335    2256  59.2% -lh5- 449d Oct  7 07:26 benchmarks/BenchCM
[unknown]                 6416   10220  62.8% -lh5- cb11 Oct 11 16:57 benchmarks/CMBench
[unknown]                 1225    2028  60.4% -lh5- b128 Jan  6  2002 benchmarks/MemTest
[unknown]                 3308    6048  54.7% -lh5- 3eb8 Sep  9 10:54 benchmarks/TestIt
[unknown]                 1841    5476  33.6% -lh5- b93f Oct 23 09:09 CopyMemAIO
[unknown]                  889    1220  72.9% -lh5- 4116 Oct 11 21:05 CopyMemAIO.info
[unknown]                 3941    9432  41.8% -lh5- aa22 Oct 23 09:08 CopyMemAIO.txt
[unknown]                  978    1344  72.8% -lh5- a0a6 Oct 11 21:11 CopyMemAIO.txt.info
[unknown]                 2491    7468  33.4% -lh5- be22 Oct  7 07:26 source/BenchCM.s
[unknown]                 1986    7107  27.9% -lh5- d16e Oct 11 16:56 source/CMBench.c
[unknown]                 3138    9282  33.8% -lh5- a610 Oct 11 20:35 source/CopyMemAIO.s
[unknown]                 1017    3035  33.5% -lh5- 4d7f Sep 25 02:22 source/Func_CM040.s
[unknown]                  952    2567  37.1% -lh5- 9950 Sep 25 02:23 source/Func_CM060.s
[unknown]                  696    2028  34.3% -lh5- 24fc Oct 13 02:10 source/Func_CM080.s
[unknown]                  674    1703  39.6% -lh5- 13a0 Sep 25 02:29 source/Func_CM0x0.s
[unknown]                 1120    2991  37.4% -lh5- 7720 Oct 11 18:44 source/Func_CMNative.s
[unknown]                  897    2511  35.7% -lh5- db17 Sep 25 02:31 source/Func_CMQ040.s
[unknown]                  807    2016  40.0% -lh5- 0730 Sep 25 02:32 source/Func_CMQ060.s
[unknown]                  569    1290  44.1% -lh5- c207 Oct 13 02:12 source/Func_CMQ080.s
[unknown]                  578    1384  41.8% -lh5- 2788 Sep 25 02:34 source/Func_CMQ0x0.s
[unknown]                 1022    4220  24.2% -lh5- e7d5 Oct 20 07:45 source/Func_SmallCopy.s
[unknown]                  830    3379  24.6% -lh5- fc0b Oct 13 04:48 source/Func_SmallCopy080.s
[unknown]                 5826   11264  51.7% -lh5- 58a0 Sep 17 15:30 winuae_dll/CopyMemAIO.alib
---------- ----------- ------- ------- ------ ---------- ------------ ----------
 Total        25 files   46244  109226  42.3%            Oct 24 04:23
