Rewritten version (free for non-commercial; small price for commercial use) GitHub – pleriche/FastMM5: FastMM is a fast replacement memory manager for Embarcadero Delphi applications that scales well across multiple threads and CPU cores, is not prone to memory fragmentation, and supports shared memory without the use of external .DLL files.
Posted by jpluimers on 2020/05/05
It has been mentioned a few times already, but for my link archive: [WayBack] GitHub – pleriche/FastMM5: FastMM is a fast replacement memory manager for Embarcadero Delphi applications that scales well across multiple threads and CPU cores, is not prone to memory fragmentation, and supports shared memory without the use of external .DLL files.
Version 5 is a complete rewrite of FastMM. It is designed from the ground up to simultaneously keep the strengths and address the shortcomings of version 4.992:
- Multithreaded scaling across multiple CPU cores is massively improved, without memory usage blowout. It can be configured to scale close to linearly for any number of CPU cores.
- In the Fastcode memory manager benchmark tool FastMM 5 scores 15% higher than FastMM 4.992 on the single threaded benchmarks, and 30% higher on the multithreaded benchmarks. (I7-8700K CPU, EnableMMX and AssumeMultithreaded options enabled.)
- It is fully configurable runtime. There is no need to change conditional defines and recompile to change options. (It is however backward compatible with many of the version 4 conditional defines.)
- Debug mode uses the same debug support library as version 4 (FastMM_FullDebugMode.dll) by default, but custom stack trace routines are also supported. Call FastMM_EnterDebugMode to switch to debug mode (“FullDebugMode”) and call FastMM_ExitDebugMode to return to performance mode. Calls may be nested, in which case debug mode will be exited after the last FastMM_ExitDebugMode call.
- Supports 8, 16, 32 or 64 byte alignment of all blocks. Call FastMM_EnterMinimumAddressAlignment to request a minimum block alignment, and FastMM_ExitMinimumAddressAlignment to rescind a prior request. Calls may be nested, in which case the coarsest alignment request will be in effect.
- All event notifications (errors, memory leak messages, etc.) may be routed to the debugger (via OutputDebugString), a log file, the screen or any combination of the three. Messages are built using templates containing mail-merge tokens. Templates may be changed runtime to facilitate different layouts and/or translation into any language. Templates fully support Unicode, and the log file may be configured to be written in UTF-8 or UTF-16 format, with or without a BOM.
- It may be configured runtime to favour speed, memory usage efficiency or a blend of the two via the FastMM_SetOptimizationStrategy call.
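The nesting rules in the debug-mode and alignment bullets above can be illustrated with a small toy model. This is a sketch in C++ and not FastMM5 code: the class names, method names, and the default alignment passed to the constructor are all hypothetical, chosen only to mimic the behaviour the README describes (debug mode ends after the last exit call; the coarsest outstanding alignment request wins).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model only, NOT FastMM5 code. All names here are hypothetical.

// Mimics nested enter/exit of debug mode: the mode stays active until
// every enter() has been matched by a leave().
class NestedDebugMode {
public:
    void enter() { ++depth_; }
    void leave() {
        assert(depth_ > 0);  // leave() without a matching enter() is a bug
        --depth_;
    }
    bool active() const { return depth_ > 0; }

private:
    int depth_ = 0;
};

// Mimics nested minimum-alignment requests: the effective alignment is
// the coarsest (largest) of all requests that have not been rescinded.
class NestedAlignment {
public:
    // default_alignment is an assumption supplied by the caller.
    explicit NestedAlignment(int default_alignment)
        : default_(default_alignment) {}

    void enter(int bytes) { requests_.push_back(bytes); }

    // Rescinds one prior request for the given alignment.
    void leave(int bytes) {
        auto it = std::find(requests_.begin(), requests_.end(), bytes);
        assert(it != requests_.end());
        requests_.erase(it);
    }

    int effective() const {
        int result = default_;
        for (int r : requests_) result = std::max(result, r);
        return result;
    }

private:
    int default_;
    std::vector<int> requests_;
};
```

In this model, entering a 16-byte and then a 64-byte request yields an effective alignment of 64; rescinding the 64-byte request drops it back to 16.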
…
Licence
FastMM 5 is dual-licensed. You may choose to use it under the restrictions of the GPL v3 licence at no cost to you, or you may purchase a commercial licence. A commercial licence includes all future updates. The commercial licence pricing is as follows:
| Number Of Developers | Price (USD) |
| --- | --- |
| 1 developer | $99 |
| 2 developers | $189 |
| 3 developers | $269 |
| 4 developers | $339 |
| 5 developers | $399 |
| More than 5 developers | $399 + $50 per developer from the 6th onwards |
| Site licence (unlimited developers) | $999 |

Please send an e-mail to fastmm@leriche.org to request an invoice before or after payment is made at https://www.paypal.me/fastmm (paypal@leriche.org). Support is available for users with a commercial licence via the same e-mail address.
FastMM4 is still free ([WayBack] GitHub – pleriche/FastMM4: A memory manager for Delphi and C++ Builder with powerful debugging facilities), but I recommend considering a switch, as I think the focus will be on FastMM5.
It was made public a few days ago, but has had commits for months: [WayBack] Commits · pleriche/FastMM5 · GitHub
–jeroen
Maxim Masiutin said
Here is a single-threaded performance comparison between FastMM5 (v5.01, dated Jun 12, 2020) and FastMM4-AVX (v1.03, dated Jun 14, 2020). The test was run on Jun 16, 2020, on an Intel Core i7-1065G7 CPU (base frequency: 1.3 GHz, 4 cores, 8 threads), compiled with Delphi 10.3 Update 3 for the 64-bit target.
You can find the program used to generate the benchmark data at https://github.com/maximmasiutin/FastCodeBenchmark
You can find the FastMM4-AVX branch at https://github.com/maximmasiutin/FastMM4-AVX
In these tests, the FastMM4-AVX branch is faster than FastMM5.
Besides that, in multithreaded code FastMM5 calls “Winapi.Windows.SwitchToThread” while trying to obtain the lock of a block manager. Calling “SwitchToThread” is not a very efficient way to wait in a spin-lock loop. A better way, recommended by Intel, is to execute the “pause” instruction, e.g. 5000 times, and to call “SwitchToThread” only if that does not help. Usually “pause” is enough: the lock is released before 5000 iterations are reached, so no “SwitchToThread” call is needed.
The following should also be taken into consideration: (1) each call to SwitchToThread() incurs the cost of a context switch, which can be 10000+ cycles; (2) it also incurs the cost of ring 3 to ring 0 transitions, which can be 1000+ cycles; (3) SwitchToThread() may be of no use if no threads are in the ready state.
The FastMM4-AVX branch checks whether the CPU supports SSE2, and thus the “pause” instruction. If it does, the branch spins on “pause” for 5000 iterations before calling “SwitchToThread”. If the CPU doesn’t have the “pause” instruction or Windows doesn’t have the SwitchToThread() API function, it uses EnterCriticalSection/LeaveCriticalSection instead.
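The spin-then-yield strategy described in this comment can be sketched roughly as follows. This is a minimal C++ illustration, not code taken from FastMM5 or FastMM4-AVX: the names `SpinThenYieldLock` and `cpu_relax` are hypothetical, `std::this_thread::yield()` stands in for the Windows SwitchToThread() call, the CPU check is done at compile time rather than at run time, and the critical-section fallback is omitted.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
#include <immintrin.h>
// Emits the x86 "pause" instruction, a hint that we are in a spin-wait loop.
inline void cpu_relax() { _mm_pause(); }
#else
// Assumption: on other CPUs we simply spin without a hint.
inline void cpu_relax() {}
#endif

// Hypothetical lock: spin with "pause" up to kSpinLimit times, and only
// then give up the time slice (stand-in for SwitchToThread()).
class SpinThenYieldLock {
public:
    void lock() {
        for (;;) {
            for (int i = 0; i < kSpinLimit; ++i) {
                // Cheap read first, so waiting threads do not keep
                // writing to the contended cache line.
                if (!locked_.load(std::memory_order_relaxed) && try_lock()) {
                    return;
                }
                cpu_relax();
            }
            // Spinning did not help: let another ready thread run.
            std::this_thread::yield();
        }
    }

    // Returns true if the lock was free and is now held by the caller.
    bool try_lock() {
        return !locked_.exchange(true, std::memory_order_acquire);
    }

    void unlock() { locked_.store(false, std::memory_order_release); }

private:
    static constexpr int kSpinLimit = 5000;  // iteration count from the comment
    std::atomic<bool> locked_{false};
};
```

When the lock is held only briefly, the waiting thread usually acquires it inside the inner loop and never reaches the yield, which is the behaviour the comment argues for.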
jpluimers said
Thanks. Did you contact Pierre about this?
FastMM5 not free said
Not sure why you didn’t publish my comment that FastMM is not free for non-commercial use, only for GPL3 code. There are other open source licences that are incompatible with GPL3. And of course there is software that is free (non-commercial) but not open source.
jpluimers said
During my fight with rectal cancer, I will only occasionally run through blog comments.
Usually I leave comments from anonymous commenters unpublished.
None said
“Free for non-commercial” is misleading, since the licence is GPL3. What you mean is free only for GPL3 code, which is quite restrictive.
jpluimers said
Thanks for the addition.