Short:        Patch CopyMem/Quick for 68060(040) v1.1
Author:       matthey7@gmail.com (Matt Hey)
Uploader:     matthey7 gmail com (Matt Hey)
Type:         util/boot
Version:      1.1
Requires:     68020+, Kickstart 2.04
Architecture: m68k-amigaos >= 2.04

Description:

This is a small patch which replaces the CopyMem and CopyMemQuick functions
of exec.library. These functions, especially CopyMem, are used a lot. I
logged the calls to CopyMem to RAM and ran out of memory in about a minute,
even with over 100MB of memory.

CopyMem060 is optimized for the 68060 processor.
CopyMem040 is optimized for the 68040 processor.
CopyMem0x0 should run well on a 68040 or 68060 if the move16 instruction is
buggy or caching problems exist. It should run on a 68020 or 68030
processor as well but is not optimized for them.

CopyMem060 and CopyMem040 test for at least a 68040 processor. If they
can't find one, they don't install the patch and exit with a return code of
20 (=fail).

The code is based in part on Harry "Piru" Sintonen's CMQ060, which is very
good. Thanks Piru. However, I thought it tried to do too much and could be
faster with small copies. I also wanted to learn how to optimize for the
68060 and see what's possible.

Features:

Much faster than Piru's CMQ060 at small memory copies, which are common.

Doesn't use movem, as it's the same speed as move on the 68060 and slower
on the 68040.

Doesn't touch the MMU (directly). Touching it is unnecessary, and leaving
it alone is the proper behavior when using Thomas Richter's mmu.library and
68060.library.

Installation is fast, doesn't fragment memory, and doesn't use much memory.

The full source code is included. Assembled with PhxAss.

Free.

Installation:

Copy your flavor of CopyMem wherever you like. It runs from Workbench, but
I recommend starting it from the CLI in S:startup-sequence after the
SetPatch command. It does not detach from the shell, so the command needed
is...

   Run >NIL: CopyMem060

As little as 512 bytes of stack should be fine.
A Ctrl-C will break CopyMem, causing it to uninstall, which is dangerous
because of how the exec setpatch function works.

Notes about Move16:

move16 is an instruction of the 68040 and 68060 processors. It moves 16
bytes at once and uses burst accesses if possible. Andreas Kleinert and
Thomas Richter said there could be problems with the move16 instruction on
the Amiga, especially in chip ram, caused by the DMA of the custom chips.
Personally, I have tested move16 with chip ram and it doesn't seem to be a
problem here. SysSpeed by Torsten Bach uses the move16 instruction in chip
ram and it's stable across a wide range of Amigas. The 68040 and 68060 user
manuals are not clear on the subject. My opinion is that if the MMU maps
the memory areas correctly, so that bursts cannot take place where they
shouldn't, then move16 will not use burst accesses in those areas and will
be safe to use. I have included CopyMem040.safe and CopyMem060.safe, which
use move16 in fast ram only. They are slightly slower and larger.

Debugging:

I have included a Snoopy 2.0 (Aminet:util/moni/snoopy20.lha) script that
shows the value of A0 and A1 on return of CopyMem and CopyMemQuick. With
CopyMem060 only, the source + size should equal A0 on return and the
destination + size should equal A1 on return. Please report the values if
they do not match. Also, be careful about logging all calls (the default),
as memory and buffers will fill up *very* fast. It's best to use a filter
so that only the programs wanted are logged. The Snoopy output location is
specified in the script's icon (tooltypes).

Please report all bugs to the e-mail address at the top.

Speed comparison:

CopyMemQuicker V2.8 on Aminet:util/boot/COPMQR28.lha has a "TestIt" program
that is good for speed comparisons. Here is how you test CopyMem060 from a
cli...

   CopyMem060 >NIL:
   CopyMemQuicker TestIt

and CMQ060...

   CMQ060
   CopyMemQuicker TestIt

and AmigaOS 3.9 default...

   CopyMemQuicker TestIt

Some test results...
Using "TestIt" from CopyMemQuicker V2.8

orig   = original AmigaOS 3.9 routines
CMQ28  = CopyMemQuicker v2.8
MCP    = MCP v1.46
CMQ060 = CopyMemQuick v1.5
CM060  = CopyMem060 v1.1

orig   CMQ28  MCP    CMQ060 CM060
1.30   1.28   1.18   0.91   0.91   CM  565×64kB    L->L
0.64   0.43   0.35   0.35   0.35   CM  147×64kB    L->L+1
1.06   1.06   0.96   0.99   0.96   CM  413×64kB    L->E
0.61   0.43   0.35   0.36   0.34   CM  147×64kB    L->E+1
0.60   0.41   0.35   0.35   0.33   CM  147×64kB    L+1->L
1.01   0.86   0.79   0.61   0.61   CM  382×64kB    L+1->L+1
0.66   0.43   0.33   0.35   0.34   CM  147×64kB    L+1->E
1.18   1.18   1.19   1.21   1.19   CM  501×64kB    L+1->E+1
1.20   1.18   1.16   1.18   1.16   CM  501×64kB    E->L
0.59   0.41   0.35   0.35   0.35   CM  147×64kB    E->L+1
1.01   0.86   0.79   0.61   0.61   CM  382×64kB    E->E
0.64   0.43   0.35   0.35   0.34   CM  147×64kB    E->E+1
0.59   0.41   0.35   0.35   0.35   CM  147×64kB    E+1->L
1.06   1.06   0.98   0.96   0.98   CM  413×64kB    E+1->L+1
0.60   0.41   0.35   0.34   0.35   CM  147×64kB    E+1->E
1.30   1.26   1.16   0.91   0.90   CM  564×64kB    E+1->E+1
0.30   0.29   0.26   0.26   0.23   CM  33900×1kB   L->L
0.39   0.21   0.13   0.13   0.13   CM  9400×1kB    L->L+1
0.45   0.18   0.16   0.18   0.18   CM  24000×1kB   E->E
0.36   0.30   0.26   0.24   0.21   CM  196000×128B L->L
0.48   0.26   0.21   0.20   0.16   CM  155000×128B E->E
0.54   0.41   0.45   0.34   0.30   CM  588000×19B  L->L
0.55   0.33   0.46   0.31   0.30   CM  622000×18B  L->L
0.48   0.45   0.44   0.33   0.31   CM  663000×17B  L->L
0.53   0.48   0.53   0.46   0.39   CM  956000×16B  L->L
0.56   0.48   0.54   0.35   0.16   CM  1060000×8B  L->L
0.51   0.43   0.53   0.26   0.06   CM  1430000×4B  L->L
0.45   0.41   0.83   0.38   0.18   CM  2190000×1B  L->L
1.30   1.28   1.18   0.91   0.91   CMQ 565×64kB    L->L
0.30   0.30   0.24   0.25   0.23   CMQ 33900×1kB   L->L
0.34   0.28   0.23   0.21   0.20   CMQ 196000×128B L->L
0.46   0.43   0.46   0.36   0.28   CMQ 956000×16B  L->L
0.34   0.40   0.46   0.25   0.15   CMQ 1060000×8B  L->L
0.24   0.36   0.48   0.15   0.09   CMQ 1430000×4B  L->L
----   ----   ----   ----   ----
22.79  19.53  18.99  15.89  14.69

14.30% speedup for CMQ v2.8
16.67% speedup for MCP v1.46
30.28% speedup for CMQ060 v1.5
35.54% speedup for CopyMem060 v1.1

Actual results will vary, but these are typical when some of the data to be
copied is in the data cache.
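
The speedup percentages listed with the totals appear to be the share of
the original routines' total time that each patch saves. The formula is not
stated in this readme, so the following is only a minimal sketch under that
assumption. The function name speedup_percent is made up for illustration;
the totals are copied from the results above.

```python
# Column totals copied from the "TestIt" results in this readme.
TOTALS = {
    "orig": 22.79,
    "CMQ v2.8": 19.53,
    "MCP v1.46": 18.99,
    "CMQ060 v1.5": 15.89,
    "CopyMem060 v1.1": 14.69,
}


def speedup_percent(baseline, patched):
    """Percentage of the baseline total saved by the patched routines.

    Assumed formula: (baseline - patched) / baseline * 100.
    """
    return (baseline - patched) / baseline * 100.0


if __name__ == "__main__":
    baseline = TOTALS["orig"]
    for name, total in TOTALS.items():
        if name == "orig":
            continue
        print(f"{speedup_percent(baseline, total):.2f}% speedup for {name}")
```

Under this assumption the sketch prints 14.30%, 16.67%, 30.28% and 35.54%,
which matches the figures listed with the totals.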
Testing with the data cache turned off shows the results when none of the
data to be copied is in the data cache. The move16 instruction has an even
larger speed advantage in that case. Results sent to me by Amigian show a
larger speedup with CopyMem040 on the 68040.

Future:

I may make a BlizKick/Remus patch for the AmigaOS 3.9 BB2 exec.library.

I may make a CPUpatch that detects the proper CPU and installs CPU-specific
patches.

I may add an mmu.library protected option to protect the patches and
possibly speed them up more.

I may make an optimized CopyMem020 and/or CopyMem030 for the 68020 and
68030 if I find there is enough benefit and interest.

History:

1.0 (02.05.09)  First release

1.1 (28.06.09)  Fixed bugs in CopyMem0x0 and CopyMem060.safe
                A new CopyMem040 thanks to Amigian's testing
                A few optimizations in CopyMem060 and much smaller now

Thanks:

Amigian for bug and performance testing CopyMem040.
Harry "Piru" Sintonen and Dirk Busse for CMQ060.
Arthur Hagen for his TestIt program.