The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests

  • My badges

  • Twitter Updates

  • My Flickr Stream

  • Pages

  • All categories

  • Enter your email address to subscribe to this blog and receive notifications of new posts by email.

    Join 1,421 other followers

Delphi: There is a FastMM4 fork with AVX support and multi-threaded enhancements (faster locking) – how will it impact floating point heavy applications (as Delphi uses SSE instructions for floating point)

Posted by jpluimers on 2018/05/01

Interesting fork of FastMM4 for which I now think I understand why it is not merged into the regular FastMM4 repository: [WayBack] GitHub – maximmasiutin/FastMM4-AVX: FastMM4 fork with AVX support and multi-threaded enhancements (faster locking).

The fork does two things:

  • it has multi-threading enhancements (faster locking)
  • AVX support which seems tough for floating point heavy applications as Delphi generates SSE instructions for them

Reminder to self: how big is that impact and could the locking be separately merged into the base repository?

Eric Grange:
Looking at https://github.com/pleriche/FastMM4/issues/36 and given than the compile generates SSE2 code for floating point, I guess using AVX may be problematic when your code also does a lot of floating point using Delphi code (rather than AVX asm)

It could be that this repository is using only AVX-128, which might not have a penalty as per Advanced Vector Extensions – Wikipedia:

The AVX instructions support both 128-bit and 256-bit SIMD. The 128-bit versions can be useful to improve old code without needing to widen the vectorization, and avoid the penalty of going from SSE to AVX, they are also faster on some early AMD implementations of AVX. This mode is sometimes known as AVX-128.

Via: [WayBack] Do you use Embarcadero version of FastMM or the “official” bleeding edge version of FastMM from the gitHub repository? Any idea what are the difference… – Tommi Prami – Google+

–jeoren

5 Responses to “Delphi: There is a FastMM4 fork with AVX support and multi-threaded enhancements (faster locking) – how will it impact floating point heavy applications (as Delphi uses SSE instructions for floating point)”

  1. I didn’t say that “AVX1 doesn’t have any penalty if it used together with SSE”. Any AVX-SSE transition have a penalty (potentially). In other words, any command with a VEX-prefix (i.e. any AVX command including AVX-1) and intertwined with a command without a VEX-prefix (SSE) incurs a transition penalty.

  2. “Only AVX1 enhancements” would not change anything, because any instruction with VEX-prefix https://en.wikipedia.org/wiki/VEX_prefix (including an AVX1 instruction) will generate the penalty.

  3. Emil Mustea said

    If you use 32bit you have only faster locking; in 64bit you can undefine EnableAVX and you have only faster locking or you can have EnableAVX but undefine DisableAVX2 and DisableAVX512 so you have only AVX1 enhancements. As you wrote AVX1 doesn’t have any penalty if it used together with SSE.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

 
%d bloggers like this: