Delphi: There is a FastMM4 fork with AVX support and multi-threaded enhancements (faster locking) – how will it impact floating point heavy applications (as Delphi uses SSE instructions for floating point) « The Wiert Corner


Posted by jpluimers on 2018/05/01

An interesting fork of FastMM4; I think I now understand why it has not been merged into the regular FastMM4 repository: [WayBack] GitHub – maximmasiutin/FastMM4-AVX: FastMM4 fork with AVX support and multi-threaded enhancements (faster locking).

The fork does two things:

- it adds multi-threading enhancements (faster locking);
- it adds AVX support, which may be problematic for floating-point-heavy applications, because Delphi generates SSE instructions for floating-point code.

Reminder to self: how big is that impact, and could the locking enhancements be merged separately into the base repository?

Eric Grange:
Looking at https://github.com/pleriche/FastMM4/issues/36, and given that the compiler generates SSE2 code for floating point, I guess using AVX may be problematic when your code also does a lot of floating point in Delphi code (rather than AVX asm).

It could be that this repository only uses AVX-128, which might not incur a penalty, as per Advanced Vector Extensions – Wikipedia:

The AVX instructions support both 128-bit and 256-bit SIMD. The 128-bit versions can be useful to improve old code without needing to widen the vectorization, and avoid the penalty of going from SSE to AVX, they are also faster on some early AMD implementations of AVX. This mode is sometimes known as AVX-128.
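To make the AVX-128 idea concrete, here is a minimal sketch in C rather than Delphi, because compiler intrinsics show the encoding choice most directly. The function name copy128 is hypothetical and this is not code from FastMM4 or FastMM4-AVX. It copies memory in 16-byte chunks through XMM registers using SSE2 intrinsics. Built for baseline x86-64, GCC and Clang emit legacy SSE `movdqu` for these intrinsics; built with `-mavx`, they typically emit the VEX-encoded `vmovdqu` instead, which is the 128-bit AVX ("AVX-128") form the quote refers to.

```c
#include <emmintrin.h> /* SSE2 intrinsics: _mm_loadu_si128, _mm_storeu_si128 */
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies n bytes from src to dst using 128-bit loads
   and stores. Without -mavx this compiles to legacy SSE movdqu; with -mavx
   the same source is typically emitted as VEX-encoded vmovdqu (AVX-128). */
static void copy128(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst;
    const unsigned char *s = (const unsigned char *)src;

    /* Move whole 16-byte chunks through an XMM register (unaligned is fine). */
    while (n >= 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, chunk);
        s += 16;
        d += 16;
        n -= 16;
    }

    /* Copy the remaining 0..15 bytes with the C library. */
    memcpy(d, s, n);
}
```

The point of the sketch is that "SSE" versus "AVX-128" is decided by how the instruction is encoded, not by register width; whether mixing the VEX-encoded form with the compiler's own SSE floating-point code is penalty-free is what the comments below discuss.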

Via: [WayBack] Do you use Embarcadero version of FastMM or the “official” bleeding edge version of FastMM from the gitHub repository? Any idea what are the difference… – Tommi Prami – Google+

–jeroen

This entry was posted on 2018/05/01 at 18:00 and is filed under Delphi, Development, FastMM, Software Development.

6 Responses to “Delphi: There is a FastMM4 fork with AVX support and multi-threaded enhancements (faster locking) – how will it impact floating point heavy applications (as Delphi uses SSE instructions for floating point)”

Maxim Masiutin said

2020/10/09 at 11:43
The main advantage of FastMM4-AVX comes from efficient synchronization, not from the use of the MMX registers for memory copy.
In FastMM4-AVX, the appropriate synchronization technique is used depending on context and availability, such as spin-wait loops, SwitchToThread, and critical sections.
FastMM4-AVX uses the “test, test-and-set” technique for the spin-wait loops. The “pause” instruction is used in these loops. See https://stackoverflow.com/a/44916975/6910868 for more details on the implementation of the “pause”-based spin-wait loops.
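A minimal sketch of a "test, test-and-set" spin-wait loop with PAUSE, written in C with C11 atomics rather than in the Delphi/assembly that FastMM4-AVX itself uses. This is not the fork's actual code: the type and function names and the spin count are hypothetical, and POSIX sched_yield stands in for the Windows SwitchToThread call. It assumes an x86/x86-64 target, since `_mm_pause` emits the x86 PAUSE instruction.

```c
#define _POSIX_C_SOURCE 200809L /* make sure sched_yield is declared */
#include <immintrin.h>  /* _mm_pause: emits the x86 PAUSE instruction */
#include <sched.h>      /* sched_yield: stand-in for Windows SwitchToThread */
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration only. */
typedef struct {
    atomic_bool locked;
} spin_lock_t;

#define SPINS_BEFORE_YIELD 5000u /* hypothetical tuning value */

/* One attempt: "test" with a plain load first, and only if the lock looks
   free do the expensive atomic "test-and-set" (exchange). */
static bool spin_try_lock(spin_lock_t *l)
{
    return !atomic_load_explicit(&l->locked, memory_order_relaxed) &&
           !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

static void spin_lock(spin_lock_t *l)
{
    for (;;) {
        unsigned spins = 0;
        /* test: wait with read-only loads while the lock is held, so the
           cache line is not hammered with atomic writes */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed)) {
            _mm_pause(); /* tell the CPU this is a spin-wait loop */
            if (++spins == SPINS_BEFORE_YIELD) {
                sched_yield(); /* spun too long: give up the time slice */
                spins = 0;
            }
        }
        /* test-and-set: the lock looked free, try to grab it atomically */
        if (!atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
            return;
        /* another thread won the race; go back to read-only spinning */
    }
}

static void spin_unlock(spin_lock_t *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

The read-only inner loop is the "test" part: waiting threads only read the shared cache line, and the atomic exchange is attempted only when the lock appears free, which keeps cache-coherency traffic down under contention.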
As I wrote before, the speed improvement from using AVX instructions is negligible compared to the effect of efficient synchronization; AVX instructions can sometimes even slow the program down, because of AVX-SSE transition penalties and the reduced CPU frequency caused by AVX-512 instructions on some processors. Use DisableAVX to turn AVX off completely, or use DisableAVX1/DisableAVX2/DisableAVX512 to keep individual AVX-related instruction sets from being compiled.

Maxim Masiutin said

2018/05/04 at 17:32
I didn’t say that “AVX1 doesn’t have any penalty if it used together with SSE”. Any AVX-SSE transition has a (potential) penalty. In other words, any instruction with a VEX prefix (i.e. any AVX instruction, including AVX1) intertwined with an instruction without a VEX prefix (SSE) incurs a transition penalty.

Maxim Masiutin said

2018/05/04 at 16:59
“Only AVX1 enhancements” would not change anything, because any instruction with a VEX prefix https://en.wikipedia.org/wiki/VEX_prefix (including an AVX1 instruction) will incur the penalty.

- jpluimers said
  
  2018/05/04 at 17:25
  Thanks for confirming that.
  
Emil Mustea said

2018/05/02 at 10:36
If you use 32-bit, you only get the faster locking. In 64-bit, you can undefine EnableAVX to get only the faster locking, or you can keep EnableAVX defined and define DisableAVX2 and DisableAVX512 so that you get only the AVX1 enhancements. As you wrote, AVX1 doesn’t have any penalty if it is used together with SSE.

- jpluimers said
  
  2018/05/02 at 20:29
  Thanks for the update!
  


The Wiert Corner – irregular stream of stuff

Jeroen W. Pluimers on .NET, C#, Delphi, databases, and personal interests
