Introduction
~~~~~~~~~~~~
TLSFMem is an implementation of a very new memory allocation system called
TLSF (two level segregated fit). TLSF was described in a paper by the three
spanish researchers M. Masmano, I. Ripoll, A. Crespo.
Originally designed for Realtime Operating Systems, all allocation and free
operations run with constant time complexity (O(1)). This is a major
improvement over the original AmigaOS memory system, which gets slower
while memory gets fragmented (O(m) where m is the number of fragments).
Moreover, the old AmigaOS allocator uses a first fit strategy, which causes
the memory to fragment pretty quickly. TLSF is an exact fit allocator for
memory blocks smaller than 512 bytes and a good fit allocator for all other
sizes: It will always find a free block which is always smaller than 103%
of the requested block.
***************************************************************************
TLSFMem is blindly fast and will reduce memory fragmentation significantly!
***************************************************************************
TLSFMem was written in optimized assembly language, but more importantly,
it uses these clever constant time algorithms. (There is really no use in
programs that are written in assembly, when the algorithm sucks (e.g. naïve
pattern matching algorithm, like in SearchStar vs. sub-linear Rabin-Karp).)
This software comes as freeware, but as I spent a lot of time in developing
it, you are welcome to donate something. Audio CDs are welcome and there's
an Amazon.de wishlist aswell. No PayPal.
FAQ
~~~
Q: Does this 103% stuff mean it wastes memory (internal fragmentation)?
A: No. It only needs four extra bytes per allocation. The only internal
fragmentation happening is for its alignment of 16 bytes per allocation.
Q: Does it run on WarpOS / PowerUP?
A: The last version V1.3 didn't due to a bug in AllocAbs. The new one
should run under PowerUP, because according to Laire, PowerUP calls the
exec functions. I suspect WarpOS still not to work because it's said to
have its own PPC native implementation of the exec memory functions,
which will cause WarpOS to run out of memory. This is due to TLSFMem
allocating its memory from the normal memory lists and pouring each into
TLSF mem headers, which would appear empty to the standard exec
routines.
Q: There was another question here before. Where did this important
question go?
A: A fascist idiot and troll blackmailed me for mentioning his realname
here.
Q: Will it speed up any program or just those written for it in mind?
A: It will speed up all programs using the standard exec memory functions.
It will not speed up programs that use their own allocators (but these
are scarce).
Q: Will TLSFMem (not TLSFMemPool) also speed up Memory Pools?
A: Yes, as memory pools also depend on AllocMem() allocations when building
new puddles. However, memory allocations within a puddle remain
unchanged (I think exec from 3.9 BB2 uses AVL trees which have amortized
O(log n) complexity, which is "good enough" -- but I don't know how
exec <V45 memory pools are organized internally).
Q: So why should I use TLSFMemPool then?
A: Dunno. Maybe for that little bit of extra speed. It has a higher memory
overhead and increased fragmentation though (no pooling).
Q: Is it really faster? How much?
A: It is not faster in a system with only a few free memory chunks (we're
talking here about 20 and less). But the fragmentation will increase
while the systems is running, so without TLSFMem, the performance will
degrade. How much? You are free to run tools like FragIt in the
background to experience the slow-down in real-time.
Q: It doesn't work here.
A: Remove patches like MemOptimizer, AllocP, AllocP32, PoolMem.
Don't run debugging tools like Enforcer, MuForce, MuGuardianAngel,
Mungwall, etc.
A: I've found several docs regarding a design flaw in layers.library that
will attempt to release partial memory blocks. I didn't encounter this
here, so it WorksForMe(TM). But TLSFMem does not have a workaround for
layers.library like other patches might have... If you get a GURU with
number #715F0041, this might be the problem.
A: Send a bug report. A real one. With helpful information. Please.
Requirements
~~~~~~~~~~~~
TLSFMem will run on Kick 3.0 and higher systems. It will work nicely along
Piru's exec44. It can be installed in a FlashRom such as the Romulus,
Algor, or Deneb. It also can be included in a customly build Kickstart rom
with the Remus/RomSplit-Tool (e.g. for BlizKick). Please only use TLSFMem,
not TLSFMemPool for these cases.
If you don't want to use these features, you can still just install it
residently or run it from your startup-sequence (or user-startup).
However, if you decide to run it very early in the system (Kickstart,
FlashRom, reset-resident), you will have to make sure that you patch the
scsi.device, because there is a bug in it that will kill TLSFMem (see
below).
Usage
~~~~~
To simply try it out, just run TLSFMem or TLSFMemPool in a shell. This will
install patches to the following exec.library functions:
- AllocMem()
- FreeMem()
- AvailMem()
- AllocAbs()
- Allocate()
- Deallocate()
- AddMemList()
TLSFMemPool installs the following additional patches:
- CreatePool()
- DeletePool()
- AllocPooled()
- FreePooled()
TLSFMem cannot be started from Workbench, don't try this by giving it an
icon. When run from shell, it will not output anything. If successfully
installed, an error code of 0 (OK) will returned, 5 (WARN) when the patch
has already been installed or 20 (FAIL), if something went wrong. There is
no message reported back. There is no need to run it with "Run", as it will
not block.
TLSFMemPool will also patch memory pools to use the O(1) allocation system,
but then memory will not be pooled together into puddles. There is little
additional benefit in using these memory pool patches, as exec.library will
still call TLSF functions to allocate the puddles itself and larger
allocations. Also, TLSF pooled allocations have an overhead of 8 bytes,
which could be a waste compared to the small allocation usually happening
with memory pools. TLSFMemPool is not compatible with the AROS/MorphOS
extension that provides Semaphore protection.
Do not run programs such as PoolMem or MemOptimizer at the same time.
TLSFMem cannot be run with MungWall or MuForce.
On start, TLSFMem will run through the existing memory headers and will
convert large chunks of at least 512 KB size to new TLSF memory headers
(the threshold is used to avoid adding dozens of memory headers to the
system). If the memory header was still completely unused, it will eat it
alive.
TLSFMem obviously cannot be removed without resetting the system.
TLSFDebug will report some internal stats to serial debug (or Sashimi if
you have this running). This is a standalone program, not a debug version
of TLSFMem.
Theory of Operation
~~~~~~~~~~~~~~~~~~~
I suggest you read the paper of the spanish researchers mentioned above,
which can be found easily by searching for "TLSF Allocator", if you are
interested in details on how it works. In summary TLSF works by build two
level bitmaps of size categories for free memory chunks. The first level
bitmap indicates the existence of at least one free block between 2^n and
2^(n+1), whereas the second level bitmap exists for each of the coarse
categories and thus splits each into another 32 smaller categories. The
magic comes from a MC68020+ instruction (bfffo) which will find the first
set bit in these bitmaps in constant time.
The complexity of the operations can be given as following:
Practical case Theoretical worst case
- AllocMem() O(1)* (O(h))
- FreeMem() O(1)* (O(h))
- AvailMem() O(1)** (O(h*floor(t/c)))
- AllocAbs() O(1)*** (O(m))
- Allocate() O(1)
- Deallocate() O(1)
- AddMemList() O(1)
- CreatePool() O(1)
- DeletePool() O(n)****
- AllocPooled() O(1)
- FreePooled() O(1)
The complexities given in brackets are theoretically worst case
complexities, which will actually not occur in 99,999999999999% of the
uses. The input value "h" represents the number of memory headers in the
system (very few), "t" the amount of total free memory in bytes and "c" the
size of the largest chunk of memory. In practice, all operations are
executed in constant time with very few CPU instructions.
* These functions iterate over the available memory headers (usually only a
few), but in 99,9% of all memory allocation cases, the allocation and
deallocations will be satisfied by the very first memory header, which is a
TLSF one. The number of memory headers usually doesn't change, hence these
operations can be assumed O(1). The original functions /also/ need to
iterate over the memory headers, so this is not some drawback introduced by
TLSFMem.
** AvailMem() with MEMF_LARGEST will find the category of a memory header,
where the largest block is stored in O(1), but it theoretically needs to
iterate of the blocks inside this category (there are up to 768 size
categories in total). In nearly 100% of the cases, this category will hold
exactly one memory block, namely the largest one. Theoretically, there can
be up to t/c such blocks (where t is the total free memory in bytes and c
is the size of the largest free memory block. It is clear that t/c will
normally be 1, and only if memory starts running low, it could happen that
t/c raises to 2 or more. In theory.
*** AllocAbs() needs to iterate over the free chunks to find the correct
one. Hence, worst case complexity is O(m), m being the number of free
chunks. But as it starts with the largest chunk going to smaller ones, it
is very probable that it will find the chunk *very* few iterations.
AllocAbs() is a very seldom operation anyway and usually only used by
aligned memory operations. Might be worth noticing that TLSFMem returns the
pointer of the requested block instead of the one actually reserved (as
stated in the AutoDocs), because a lot of programs seem to crash otherwise.
For freeing memory allocated by AllocAbs(), you would need to provide the
pointer of the originally requested block anyway, and NOT the one returned
by AllocAbs(). It seems that most programs don't care about this, but
TLSFMem does not allow memory to be freed partially.
**** When patched, DeletePool() is the only operation, which potentially
might take longer than the original one. Because the currently
implementation is very simple, it will use linear time to free the
remaining allocated chunks. But I guess most programs will empty their
pools manually anyway, instead of calling DeletePool() on a fully loaded
pool.
Benchmarks
~~~~~~~~~~
I've included the program AllocatorBenchmark for your own experiments. The
milage will greatly vary on the fragmentation level of your system without
TLSF. I've measured factors of 30 and more.
For example:
AllocatorBenchmark RANDOMLY
TLSFMem
AllocatorBenchmark RANDOMLY
AllocatorBenchmark will currently not measure Memory Pool performance.
scsi.device Bug
~~~~~~~~~~~~~~~
There is a bug in all known version of the scsi.device (both IDE and NCR
SCSI). The devices will allocate an IORequest structure (32 bytes big) and
then use it as IOStdReq structure (which is 48 bytes big), overwriting
innocent memory past the 32 allocated bytes. By chance, this seems to have
no noticable effect on the standard AmigaOS allocator, but immediately
kills TLSFMem, as a vital pointer in its internal structures are
overwritten. Don't get me wrong: This is a bug in the scsi.device and that
TLSFMem triggers it makes it no bug of TLSFMem.
I've included spatch diffs for the AmigaOS Rom Update files for three
different versions and the corresponding IDE and SCSI devices. In case you
are unable to patch these files (I don't know if mine were from the
original boing bags, because there are IDE speedup patches and I might have
already applied them to the files), you can still manually fix the devices,
by searching for the hex string "422D xxxx 7020 6100" and replace it by
"422D xxxx 7030 6100" (there are multiple occurrences in the AmigaOS Rom
Update files).
You won't need to fix these bugs if you only start TLSFMem in
startup-sequence or user-startup. If you want to benefit from its features
very early during booting, running it residently or from flashrom is
recommended.
Pros and Cons
~~~~~~~~~~~~~
+ Memory allocations are very fast O(1) and remain fast during the whole
runtime of the system.
+ Minimal fragmentation.
+ Compatible with all system conformly written programs.
+ Option to patch the memory pool system, too.
- Currently doesn't work with WarpOS (should do with PowerUP).
- Uses four additional bytes per allocation and has a minimum chunk size of
16 instead of 8.
- Is not compatible with various development tools such as MungWall,
MuForce, etc.
- Is not compatible with broken programs that try to free partial memory
blocks. This never was legal and never will be.
- Is very sensitive to programs overwriting innocent memory past allocated
memory or using memory freed before.
- Currently does not support MEMF_REVERSE (could be added, but there is not
much sense in this).
History
~~~~~~~
V1.4 (22-Oct-07):
- Fixed enforcer hit (memory write to 8) reported by Bernd Rösch.
- Fixed TLSF memory being counted multiple times, if the original mem
header was present for AvailMem(MEMF_TOTAL).
- Fixed a slight inaccuracy for AvailMem(MEMF_LARGEST) if there were
multiple blocks of the same category or multiple TLSF memory headers.
- When eating a memory header, random MEMF flags were used. Fixed.
- When converting memory headers and the TLSF Mem header is at the end
of the original memory header, it will shrink the original one to
remove the overlap.
- AllocatorBenchmark was broken and would always use MEMF_CLEAR.
- SaferPatches, PatchControl and (probably) SetMan would crash TLSFMem
if started before, because they would try to allocate memory while
TLSFMem calls SetFunction to modify the allocation routine, even before
it could store the old function pointers itself. Hence, SaferPatches
would call the patched TLSF pointer for memory, and TLSF would call
a NULL pointer for legacy memory routines.
To fix this catch22, TLSFMem now also implements the very last legacy
routines Allocate() and Deallocate() itself.
- TLSFMem would crash when run in ROM space, because it tried to save the
old SetFunction() pointers in its code. Now that it doesn't call legacy
routines anymore, this is fixed. However, TLSFMemPool still needs to
store the old function pointers -- When run as resident TLSFMemPool
will not patch memory pool functions.
- Fixed some glitches in CreatePool() when running out of memory.
- MEMF_CLEAR memory clearing now happens outside of Forbid state.
- Fixed some evil glitches in AllocAbs() which could cause aligned memory
allocations operations to fail when memory got fragmented. This could
also explain why PowerUP programs refused to run.
- Enabled minimum sanity check on FreeMem() for correct size parameter.
Will recoverably GURU with $715F0041 when mismatch is detected.
V1.3 (14-Oct-07):
- First public release.
Contact
~~~~~~~
Any mail, comments or donations welcome:
Email: chrisly(!)platon42.de
WWW: http://www.platon42.de/
Wishlist: http://www.amazon.de/gp/registry/wishlist/1123KBN787GJQ
|