The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests


Delphi: There is a FastMM4 fork with AVX support and multi-threaded enhancements (faster locking) – how will it impact floating point heavy applications (as Delphi uses SSE instructions for floating point)

Posted by jpluimers on 2018/05/01

An interesting fork of FastMM4; I now think I understand why it has not been merged into the regular FastMM4 repository: [WayBack] GitHub – maximmasiutin/FastMM4-AVX: FastMM4 fork with AVX support and multi-threaded enhancements (faster locking).

The fork does two things:

  • multi-threading enhancements (faster locking)
  • AVX support, which may be problematic for floating-point-heavy applications because Delphi generates SSE instructions for floating-point code

Reminder to self: how big is that impact and could the locking be separately merged into the base repository?

Eric Grange:
Looking at https://github.com/pleriche/FastMM4/issues/36 and given that the compiler generates SSE2 code for floating point, I guess using AVX may be problematic when your code also does a lot of floating point using Delphi code (rather than AVX asm)

It could be that this repository is using only AVX-128, which might not have a penalty as per Advanced Vector Extensions – Wikipedia:

The AVX instructions support both 128-bit and 256-bit SIMD. The 128-bit versions can be useful to improve old code without needing to widen the vectorization, and avoid the penalty of going from SSE to AVX, they are also faster on some early AMD implementations of AVX. This mode is sometimes known as AVX-128.

Via: [WayBack] Do you use Embarcadero version of FastMM or the “official” bleeding edge version of FastMM from the gitHub repository? Any idea what are the difference… – Tommi Prami – Google+

–jeroen

6 Responses to “Delphi: There is a FastMM4 fork with AVX support and multi-threaded enhancements (faster locking) – how will it impact floating point heavy applications (as Delphi uses SSE instructions for floating point)”

  1. The main advantage of the FastMM4-AVX is gained by efficient synchronization, not by the use of the MMX registers for memory copy.
    In FastMM4-AVX, proper synchronization techniques are used depending on context and availability, such as spin-wait loops, SwitchToThread, and critical sections.
    FastMM4-AVX uses the “test, test-and-set” technique for the spin-wait loops. The “pause” instruction is used in these loops. See https://stackoverflow.com/a/44916975/6910868 for more details on the implementation of the “pause”-based spin-wait loops.
    As I wrote before, the speed improvement from using AVX instructions is negligible compared to the effect of efficient synchronization; sometimes AVX instructions can even slow down the program because of AVX-SSE transition penalties and the reduced CPU frequency caused by AVX-512 instructions on some processors. Use DisableAVX to turn AVX off completely, or use DisableAVX1/DisableAVX2/DisableAVX512 to exclude individual AVX-related instruction sets from compilation.

  2. I didn’t say that “AVX1 doesn’t have any penalty if it used together with SSE”. Any AVX-SSE transition (potentially) has a penalty. In other words, any instruction with a VEX prefix (i.e. any AVX instruction, including AVX1) that is interleaved with an instruction without a VEX prefix (SSE) incurs a transition penalty.

  3. “Only AVX1 enhancements” would not change anything, because any instruction with VEX-prefix https://en.wikipedia.org/wiki/VEX_prefix (including an AVX1 instruction) will generate the penalty.

  4. Emil Mustea said

    If you use 32-bit, you have only faster locking; in 64-bit you can undefine EnableAVX and you have only faster locking, or you can have EnableAVX but undefine DisableAVX2 and DisableAVX512 so you have only AVX1 enhancements. As you wrote, AVX1 doesn’t have any penalty if it is used together with SSE.
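
The “test, test-and-set” spin-wait loop mentioned in the first response can be sketched roughly as follows. This is not code from FastMM4-AVX (which is written in Delphi and assembler); it is a minimal illustration in Rust under my own assumptions. The names `TtasLock`, `contended_increment`, and `SPINS_BEFORE_YIELD` are hypothetical, and the spin limit of 1000 is an arbitrary value. `std::hint::spin_loop()` stands in for the “pause” instruction (it is a hint that compiles to `pause` on x86), and `std::thread::yield_now()` plays a role roughly comparable to SwitchToThread.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical "test, test-and-set" spin lock (illustration only,
/// not the FastMM4-AVX implementation).
pub struct TtasLock {
    locked: AtomicBool,
}

impl TtasLock {
    pub const fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }

    pub fn lock(&self) {
        // Arbitrary assumption: how long to spin before yielding the time slice.
        const SPINS_BEFORE_YIELD: u32 = 1000;
        loop {
            // "test-and-set": one atomic swap tries to take the lock.
            if !self.locked.swap(true, Ordering::Acquire) {
                return;
            }
            // "test": wait on plain loads so the cache line is not
            // hammered with atomic writes while the lock is held.
            let mut spins = 0u32;
            while self.locked.load(Ordering::Relaxed) {
                hint::spin_loop(); // spin-wait hint ("pause" on x86)
                spins += 1;
                if spins >= SPINS_BEFORE_YIELD {
                    thread::yield_now(); // give other threads a chance to run
                    spins = 0;
                }
            }
        }
    }

    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

/// Increments a shared counter from several threads under the lock
/// and returns the final value.
pub fn contended_increment(threads: usize, per_thread: u64) -> u64 {
    let lock = Arc::new(TtasLock::new());
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let lock = Arc::clone(&lock);
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    lock.lock();
                    // Deliberately a separate load and store: the total is
                    // only exact if the lock provides mutual exclusion.
                    let value = counter.load(Ordering::Relaxed);
                    counter.store(value + 1, Ordering::Relaxed);
                    lock.unlock();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```

Calling `contended_increment(4, 10_000)` should return 40000 when the lock works. The point of the inner loop is that waiting threads only read the flag until it looks free, and only then retry the more expensive atomic swap.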
