Short:        For all CPUs 68000-68080 & UAE-x86
Author:       Holger.Hippenstiel AT gmx.de
Uploader:     Holger Hippenstiel nc-online de
Type:         util/boot
Version:      5.0
Replaces:     util/boot/CopyMemAIO.lha
Architecture: m68k-amigaos
Distribution: Aminet

CopyMemAIO V5.0
===============

TL;DR

CopyMem() is an essential function of exec.library and is used heavily by the
operating system, so all OS functions and programs benefit from replacing it
with a faster one.

Install CopyMemAIO in C: and call it in the Startup-Sequence after SetPatch,
or in the User-Startup. There is no need to "Run" it; it automatically
selects the best code for your CPU (code for 68000/68010, 68020/68030, 68040,
68060, 68080 & native x86). Alternatively, put CopyMemAIO and its icon in the
WBStartup drawer.

For native CopyMem on WinUAE, which is 2-3 times faster, copy
winuae_dll/CopyMemAIO.alib to your WinUAE-folder/winuae_dll. You also need to
enable the option under "Miscellaneous" -> "Allow native code".

*** Native code only works on WinUAE x86, not on x64 !! ***
There is no need to use the 64-bit version anyway.

In the past there were a lot of replacements to speed it up:

Feb.1993 by Arthur Hagen - 68000-68020
  Copies the best function to a RAM block, relies on AllocMem for alignment,
  uses mostly movem.l (ax)+,dn-dm/an-am.
  http://aminet.net/package/util/misc/CopyMemQuicker

Aug.1994 by Arthur Hagen - 68000-68020
  Copies the best function to a RAM block, aligns the code entries to
  addresses divisible by 16, jump table, multiple movem.l.
  http://aminet.net/package/util/boot/COPMQR28

Oct.1996 by Allenbrand Brice - written for 68040, works with 68060
  No copying/detaching, has to be started with "Run >NIL: ...", code
  alignment purely dependent on hunk loading. Single move16 in a loop.
  http://aminet.net/package/util/boot/PCM_1.0

May.1999 by Dirk Busse - written for 68030, works with 68020-68060
  All CPUs use the same function, aligns the code entries to addresses
  divisible by 16, unrolled move.l (an)+,(am)+ loops.
  http://aminet.net/package/util/boot/CMQ030

Jul.1999 by Dirk Busse - written for 68060, works with 68040
  Both CPUs use the same function, aligns the code entries to addresses
  divisible by 16, unrolled move16 & move.l (an)+,(am)+ loops.
  CMQ060 permanently checks for the <$1000000 address range (SAFE mode);
  CMQ060Move16 does not check for it, so it is a bit faster.
  http://aminet.net/package/util/boot/CMQ060

Nov.2000 by Harry "Piru" Sintonen - written for 68060, works with 68040
  Both CPUs use the same function, aligns the code entries to addresses
  divisible by 16, unrolled move16, movem.l & move.l (an)+,(am)+ loops.
  Won't install on MorphOS.
  http://aminet.net/package/util/boot/NewCMQ060

Aug.2009 by Matt Hey - three versions, written for 680x0, 68040 & 68060
  No copying/detaching, has to be started with "Run >NIL: ...", code
  alignment purely dependent on hunk loading. Big unrolled move16 &
  move.l (an)+,(am)+ loops. Best use of cache size/burst loading, but it has
  to be run, and there are separate files for each processor and for SAFE
  mode.
  http://aminet.net/package/util/boot/CopyMem

Aug.2020 by Holger Hippenstiel - mainly based on CopyMem by Matt Hey
  Copies the best function to a RAM block, aligns the code entries to
  addresses divisible by 16. No need to run it. Added more and more code and
  SmallCopy functions, because small copies are what the OS uses most of the
  time. Code for 68000/68010, 68020/68030, 68040, 68060, 68080 & native x86.
  No OS restrictions, even the oldest Kickstart ROMs can use it.

I wrote a benchmark which tests all functions written so far, and it can also
check whether the CopyMem functions work correctly with different sizes.
The memory layout is the same as CopyMemQuicker's, so "TestIt" will believe
CopyMemQuicker is running.
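The move16-based routines listed above all have to deal with one constraint:
move16 transfers a whole 16-byte line, and on a 68040/68060 both source and
destination must be 16-byte aligned. A copy is therefore split into an
unaligned head, a run of whole lines and a tail. The C sketch below only
models that split on a host machine, assuming plain memcpy() as a stand-in
for the 68k move instructions; plan_copy(), copy_split() and struct
copy_plan are hypothetical names, and this is not the code used by
CopyMemAIO or by any of the tools above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* move16 always transfers one 16-byte line; both addresses must be aligned. */
#define LINE_SIZE 16u

/* Hypothetical plan for one copy: unaligned head, whole lines, tail. */
struct copy_plan {
    size_t head;    /* bytes copied before dst reaches a 16-byte boundary */
    size_t lines;   /* whole 16-byte lines (one move16 each on a 68040/060) */
    size_t tail;    /* bytes left after the last whole line */
    int use_lines;  /* 1 if src and dst have the same offset modulo 16 */
};

struct copy_plan plan_copy(const void *src, const void *dst, size_t len)
{
    struct copy_plan p = {0, 0, 0, 0};
    uintptr_t s = (uintptr_t)src;
    uintptr_t d = (uintptr_t)dst;

    /* If the offsets differ, no head length can align both pointers,
       so everything goes through the slower path. */
    p.use_lines = (s % LINE_SIZE) == (d % LINE_SIZE);
    if (!p.use_lines) {
        p.tail = len;
        return p;
    }

    p.head = (size_t)((LINE_SIZE - d % LINE_SIZE) % LINE_SIZE);
    if (p.head > len)
        p.head = len;
    p.lines = (len - p.head) / LINE_SIZE;
    p.tail = len - p.head - p.lines * LINE_SIZE;
    return p;
}

void copy_split(const void *src, void *dst, size_t len)
{
    const unsigned char *s = src;
    unsigned char *d = dst;
    struct copy_plan p = plan_copy(src, dst, len);

    /* The three parts must cover the request exactly. */
    assert(p.head + p.lines * LINE_SIZE + p.tail == len);

    memcpy(d, s, p.head);             /* stands in for byte/word/long moves */
    d += p.head;
    s += p.head;
    for (size_t i = 0; i < p.lines; i++) {
        memcpy(d, s, LINE_SIZE);      /* stands in for one move16 */
        d += LINE_SIZE;
        s += LINE_SIZE;
    }
    memcpy(d, s, p.tail);             /* remaining bytes */
}
```

The use_lines check is what keeps line copies from touching the wrong bytes
when source and destination are misaligned relative to each other; the notes
further down state that the 68080's move16 accepts any alignment, which is
why its code path can skip such checks.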
For fast emulation, the most important settings are the Advanced JIT
settings in WinUAE:

  Cache Size:                  16MB
  FPU Support:                 checked
  Constant Jump:               checked
  Hard flush:                  not checked
  Direct:                      selected
  No flags:                    checked
  Catch unexpected exceptions: checked

If you want to try "TestIt" on WinUAE with a decently fast machine, note that
it will crash with a division by zero; take a look at my
http://aminet.net/package/util/boot/NoMoreDiv0 to fix that problem.

On real Amigas it can be started with the argument "S" or "SAFE"; then source
and destination must be in 24-bit space for the move16 operation. The SAFE
option is only needed for controllers which can only do 24-bit DMA, like the
A2091, but there is a driver patch for that:
http://aminet.net/package/driver/media/vbak2091

Starting CopyMemAIO again removes the patches.

*****************************************************************************

Update V4.0:
Major rework, and implemented native CopyMem for (Win)UAE.
Test results from BenchCM:

*** Correction: BenchCM V1.7 and older tested for 1000000 "ticks", but in
reality it should test for the EClock/Ticks_Value, so the tests took about
60% too long, and that is the reason the values were too high (together with
caching on the x86 side). Actual values from BenchCM V1.8:

Intel Core i7-4790k 4.4GHz, 2400MHz RAM

--------+----------+----------+----------+
Testsize|      64kb|      16kb|       4kb|
--------+----------+----------+----------+
CM0x0   | 9066 MB/s| 8886 MB/s| 8277 MB/s|
CM040   | 6322 MB/s| 5411 MB/s| 5728 MB/s|
CM060   | 6034 MB/s| 5691 MB/s| 5538 MB/s|
CMNative|22768 MB/s|20382 MB/s|16493 MB/s|
--------+----------+----------+----------+

As you can see, the native code is 2-3 times faster, but depending on the
processor the overhead of calling the native code is only worth it above
1 KB copy size, so when the native code is installed, CM0x0 is still used
for copies below 1024 bytes.

A lot of things will feel smoother now (icon drawing, window dragging and so
on), since they all use CopyMem().

Possible arguments are now: S=SAFE=SAFEMODE/S,V=VERBOSE/S,N=NN=NONATIVE/S

  SafeMode for Amigas with a Zorro II controller which can only do 24-bit DMA.
  Verbose will output which code was installed, or when it was removed.
  NoNative will not use the native function, just the optimized 680x0 code.

Included MemTest from http://aminet.net/package/misc/emu/RaMithlon, which
copies different memory blocks and checks whether the code is working.
On a 4790k with WinUAE 4.4 and 68060 emulation I get:

  Size  |  Iter   | No CMQ | 040  | 0x0  | Native
  ------+---------+--------+------+------+-------
     4kb| 1000000 |   42   |  34  |  24  |   7
    16kb|  250000 |   40   |  25  |  22  |   5
    64kb|   22500 |   14   |   7  |   8  |   2
   256kb|    1125 |    4   |   2  |   2  |   0
  1024kb|     350 |    4   |   2  |   2  |   0

Update 4.0b:
Vampire machines had a problem with native init, fixed.

Update 4.1:
Removed all old methods of testing for UAE (using a fixed address in case
uae.resource wasn't found), because WinUAE returns completely different
addresses anyway. This should fix crashes on native Amigas and under AROS.
New 68080 code which relies on the ability of the 68080's move16 to use any
alignment. This ability is not compatible with real CPUs or WinUAE's
emulation, and it is tested before the new 68080 code is used, because
move16 may be changed back to be fully compatible; in that case the 68040
code will be used (which was faster on Vampire than the 68080 code in 4.0
anyway). I have no hardware/emulation to test the new 68080 code, but it
should be a bit faster.
Included CMBench & source code from Philippe Carpentier.
Thanks to Gunnar von Boehn from the Apollo Team for explaining details of the
Vampire implementation.

Update 4.2:
Oops, the alignment check for a /16 destination in the 68080 code was still
in there, now removed. 68080 CopyMem & CopyMemQuick now go full ham on
Apollo/Vampire: no more extra data register for alignment checks, btst #x,an
to go .. :) Removed some additional instructions, a bit quicker due to the
68080's abilities.

Update BenchCM V1.8:
More accurate measurement of time/copy speed, rolling buffer to prevent
caching.
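The BenchCM notes describe two measurement fixes: converting timer ticks with
the timer's real rate instead of a fixed 1000000, and a rolling buffer so the
same bytes are not served from cache on every pass. A minimal C sketch of
both, assuming a host machine with the standard clock() timer;
mbps_from_ticks(), next_offset() and bench_copy_mbps() are hypothetical
names, not BenchCM's code. On an Amiga with OS 2.0 or later, the
ticks-per-second value would typically come from timer.device's
ReadEClock(), which returns the E-clock frequency. The pool should be much
larger than the caches being defeated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* MB/s (1 MB = 1,000,000 bytes) from a byte count and a tick count.
   ticks_per_second must be the timer's real rate; assuming a fixed
   1000000 when the timer is slower makes every result too high. */
double mbps_from_ticks(double bytes, double ticks, double ticks_per_second)
{
    double seconds = ticks / ticks_per_second;
    return bytes / seconds / 1e6;
}

/* Rolling buffer: advance the copy window through the pool and wrap
   before [offset, offset + block) would leave it. */
size_t next_offset(size_t offset, size_t block, size_t pool_size)
{
    assert(block > 0 && block <= pool_size);
    offset += block;
    if (offset + block > pool_size)
        offset = 0;
    return offset;
}

/* Keeps the copied data observable so the compiler cannot drop the copies. */
volatile unsigned char bench_sink;

/* Returns MB/s for memcpy() of `block` bytes, -1.0 on bad arguments or
   allocation failure, 0.0 if the run was too short to measure. */
double bench_copy_mbps(size_t block, size_t pool_size, unsigned iterations)
{
    if (block == 0 || pool_size < block)
        return -1.0;

    unsigned char *src = malloc(pool_size);
    unsigned char *dst = malloc(pool_size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, pool_size);
    memset(dst, 0, pool_size);

    size_t offset = 0;
    clock_t start = clock();
    for (unsigned i = 0; i < iterations; i++) {
        memcpy(dst + offset, src + offset, block);
        offset = next_offset(offset, block, pool_size);
    }
    clock_t end = clock();

    /* Read back one byte per window position after timing has stopped. */
    unsigned char sum = 0;
    for (size_t i = 0; i < pool_size; i += block)
        sum ^= dst[i];
    bench_sink = sum;

    free(src);
    free(dst);

    if (end <= start)
        return 0.0;
    return mbps_from_ticks((double)block * iterations,
                           (double)(end - start), (double)CLOCKS_PER_SEC);
}
```

Reading the destination only after the timed loop keeps that check out of the
measurement while still preventing an optimizing compiler from discarding the
copies as dead stores.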
Update 4.3:
This time small memory copies (which the OS uses all the time) were the main
focus; 4.3 does them 15% quicker than 4.2 on Vampire V2 & V4. Native CopyMem
also uses a better method for small copies.
CMBench updated to V1.2 and modified the same way as BenchCM: it now uses a
rolling buffer, so that caches/preloading/prefetching & bursts won't distort
the real speed. You can now give a loop multiplier as an argument. The
default is for Vampire V4 = 3; V2 users can use "CMBench 1", on WinUAE use
"CMBench 64".

Update 4.4:
Some small improvements for 68080 large copies: 5% faster on V2 and a bit
faster on Vampire V4 Standalone. Corrected the speed values given in the V4.0
update. Many thanks to Renaud Schweingruber & Joshua Dolan for testing.

Update 4.5:
68040 & 68060 now also got the SmallCopy function which Vampire has had since
4.3.

Update 4.6:
OS restriction lowered to OS2.0, so all old machines can benefit from it.
Massive rework of the code: no multiple SmallCopy functions anymore, clever
structure for code reuse, resulting in a new code size of only 5636 bytes
(11492 before), i.e. only 49% of V4.5's code size.

Update 4.7:
As a tribute to http://a1k.org, removed ALL OS restrictions, so even a
barebone A1000 can use it, but Native doesn't work with OS1.x. :P
The options are now limited to single chars: s=safe & v=verbose.
Cosmetic bug removed: it looked like it was possible to install 68060 code
when using SAFEMODE & NONATIVE with UAE; in reality the 680x0 code was used,
because it's the fastest with emulation.
Oopsi, a small bug in a compare resulted in the 680x0 code being used every
time. I noticed when BenchCM "Current CopyMem" only showed results similar to
the 680x0 code. :P

Update 4.8:
Oops, the "OldOS" code caused problems on some machines, fixed now.

Update 4.9:
Well, nobody ever tried to use it with a real 68000 until today, and it
crashed. But with extensions like the ACA500plus with up to 42 MHz 68000, a
quicker CopyMem would make sense,
so here it is with plain 68000 support.

Update 5.0:
68020 & 68030 now also got the SmallCopy function. After 350+ code
revisions, this will probably be the final version; there is nothing left to
implement/optimize/fix anymore. :D

Check out my other tools:
http://aminet.net/search?readme=%22Holger+Hippenstiel%22&sort=date&ord=DESC

DISCLAIMER

This software is subject to the "Standard Amiga FD-Software Copyright Note".
It is Giftware as defined in paragraph 4g. If you like it and use it
regularly, please send me a small gift. For more information please read
"AFD-COPYRIGHT".
(/pub/aminet/docs/misc/AFD-FilesV-XX.lha V=Version,XX=Languages)

AUTHOR

Please send comments, bug reports or small gifts like a Vampire V4 or a now
"worthless :P" NVidia RTX 2080 Ti, or PayPal me to:

Holger.Hippenstiel AT gmx.de

Hauptstr. 38
71229 Leonberg
Germany