Delphi: A few notes on tracking down a use-after-free related issue involving interfaces crashing inside System._IntfClear.
Posted by jpluimers on 2020/01/20
A few notes on tracking down a use-after-free related issue involving interfaces.
The crash message is like this:
Project UseAfterFreeWithInterface.exe raised exception class $C0000005 with message 'access violation at 0x004106c0: read of address 0x80808088'.
Two things here:
- The $C0000005 is the hexadecimal value of STATUS_ACCESS_VIOLATION, which is sort of mentioned in [WayBack] Delphi 2009: EExternalException Class, but not documented in any more recent Delphi versions. It is in the code for the units System, System.SysUtils and Winapi.Windows (see below).
The value is not well documented on the Microsoft documentation site either: you sort of have to connect the dots between site:docs.microsoft.com “0xC0000005” – Google Search and [WayBack] site:docs.microsoft.com “STATUS_ACCESS_VIOLATION” – Google Search to find for instance these two:
- The 0x80808088 comes from the [WayBack] FastMM4.pas DebugFillPattern value $80808080, which is used to overwrite freed memory. Not all versions of FastMM4.pas have that value near the DebugFillPattern in the code, so I raised an issue for that: [WayBack] Document default 32-bit values of `DebugFillPattern` and `DebugReservedAddress` · Issue #67 · pleriche/FastMM4 · GitHub
The relation between $80808088 and $80808080 (a difference of just $00000008) becomes clear when you look at the System._IntfClear method disassembly below.
- If you use EurekaLog instead of FastMM4, then your addresses will be patterns based on $DEADBEEF, see these links on why:
An important note first
Basically any memory value in an exception starting with $8080 and sometimes even $80 should raise suspicion: it usually means a use-after-free case.
You see these errors with FastMM and not with the old memory manager, as [WayBack] delphi • View topic • Problem with FastMM and D7 explains:
Because FastMM4 fills freed memory with 808080... where the old memory manager just leaves it as it is. When the memory is left untouched, your program might work until the memory manager starts to use that memory again. So it is actually working on plain luck, and that is not good. One little change anywhere in your code might trigger strange results that will be very difficult for you to find, and now this new memory manager helps you with avoiding these problems.
Stack contents
Often the call to _IntfClear is in the tear-down of the method, right in the middle of the implicit compiler-generated try…finally section.
At that moment, part of the stack is already gone, so usually you cannot see which method is actually in tear-down mode.
Especially for test case code, it is wise to manually clear all interface references before the end statement in the method.
This will still have the stack intact, so when you get an access violation in _IntfClear, it is way easier to see what method called it.
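A minimal sketch of that advice, assuming a hypothetical IWorker interface and TWorker class (neither comes from the code that crashed): the interface reference is cleared on an ordinary statement, so a failing _Release shows up while the stack frame of the method is still complete.

```pascal
program ClearBeforeEnd;

{$APPTYPE CONSOLE}

type
  // Hypothetical interface and class, for illustration only.
  IWorker = interface
    ['{6B0C8D0E-5C0A-4E0B-9A5E-1F2D3C4B5A69}']
    procedure DoWork;
  end;

  TWorker = class(TInterfacedObject, IWorker)
  public
    procedure DoWork;
  end;

procedure TWorker.DoWork;
begin
  Writeln('working');
end;

procedure TestWorker;
var
  Worker: IWorker;
begin
  Worker := TWorker.Create;
  Worker.DoWork;
  // Explicit clear: the reference is released here, while the stack frame
  // of TestWorker is still intact. The implicit finally block at the end
  // statement then finds nil and has nothing left to release.
  Worker := nil;
end;

begin
  TestWorker;
end.
```

With several interface locals, clearing them one per line also tells you which reference is the dangling one, as the debugger stops on the exact assignment.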
Exception location
The exception happens right in the middle of the method System._IntfClear, which has no documentation on docs.embarcadero.com nor on docwiki.embarcadero.com, but a few tiny bits on other embarcadero sites:
- [WayBack] Delphi reference counted interfaces
- [WayBack] Access Violation in _IntfClear – Community Blogs – Embarcadero Community
- [WayBack] Non-Reference-Counted Interfaces – Community Blogs – Embarcadero Community
There is not much on other sites either; usually the entries involve questions on how to track back the cause (with very few hints), or why FastMM is involved (which I explained above).
A few of the most relevant links:
- [WayBack] Race condition inside interface release – embarcadero.delphi.rtl: Just ran into a pretty bad thread race bug, it turns out to be a race condition during interface release operation. I have two threads competing o…
- [Archive.is] 43. Interface type in Object Pascal
- [WayBack] The Delphi Geek: How to Find a Missing _Release
- [WayBack] delphi – Add, Remove Folder from IShellLibrary – Stack Overflow
- [WayBack] Hallvard’s Blog: Getting a list of implemented interfaces
The System._IntfClear method is called a lot, not just during tear-down of the application; in fact any method having an interface local variable will implicitly call System._IntfClear (which is one of the reasons it is implemented in assembler):
[WayBack] DirectMusic -> Forum on Sourcebooks.Ru
The fact is that with pointers to the Delphi interface works in a special way. In particular, when exiting a block (procedures, functions, programs) for all local variables of this type, the system function _IntfClear is called. It checks the pointer given to it to nil and, if necessary, reduces the reference count. In your case, this is what happened. And it happened after DirectX and COM itself were unloaded from memory. This caused an error. When we replace all calls to IInterface._Release by assigning nil (by the way, this also leads to an automatic call to _Release), then when exiting the block, _Release is not called on this pointer anymore, since it is zero. And all in the end it turns type-top.
Back to System._IntfClear (which you sometimes see in traces as System.@IntfClear or just System.IntfClear): the code below is how it looks in most Delphi versions, and it does only a few things:
- it gets a pointer to the memory location referring to the interface
- if non-nil, it:
  - sets that memory location to nil, so a next call to System._IntfClear with it will be very fast
  - calls the _Release method on the interface (see [WayBack] IInterface Interface and [WayBack] IInterface._Release Method)
function _IntfClear(var Dest: IInterface): Pointer;
{$IFDEF PUREPASCAL}
var
  P: Pointer;
begin
  Result := @Dest;
  if Dest <> nil then
  begin
    P := Pointer(Dest);
    Pointer(Dest) := nil;
    IInterface(P)._Release;
  end;
end;
{$ELSE !PUREPASCAL}
{$IFDEF CPUX86}
asm
        MOV     EDX,[EAX]
        TEST    EDX,EDX
        JE      @@1
        MOV     DWORD PTR [EAX],0
{$IFDEF ALIGN_STACK}
        SUB     ESP, 4
{$ENDIF ALIGN_STACK}
        PUSH    EAX
        PUSH    EDX
        MOV     EAX,[EDX]
        CALL    DWORD PTR [EAX] + VMTOFFSET IInterface._Release
        POP     EAX
{$IFDEF ALIGN_STACK}
        ADD     ESP, 4
{$ENDIF ALIGN_STACK}
@@1:
end;
{$ENDIF CPUX86}
{$ENDIF !PUREPASCAL}
for which the disassembled Win32 version looks like this in almost any Delphi version:
004106AE 8BC0             mov eax,eax
System.pas.36501: MOV     EDX,[EAX]
004106B0 8B10             mov edx,[eax]
System.pas.36502: TEST    EDX,EDX
004106B2 85D2             test edx,edx
System.pas.36503: JE      @@1
004106B4 740E             jz $004106c4
System.pas.36504: MOV     DWORD PTR [EAX],0
004106B6 C70000000000     mov [eax],$00000000
System.pas.36508: PUSH    EAX
004106BC 50               push eax
System.pas.36509: PUSH    EDX
004106BD 52               push edx
System.pas.36510: MOV     EAX,[EDX]
004106BE 8B02             mov eax,[edx]
System.pas.36511: CALL    DWORD PTR [EAX] + VMTOFFSET IInterface._Release
004106C0 FF5008           call dword ptr [eax+$08]
System.pas.36512: POP     EAX
004106C3 58               pop eax
System.pas.36517: end;
004106C4 C3               ret
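The faulting address in the crash message follows from this disassembly. As a sketch of the arithmetic, assuming Win32 (4-byte pointers) and the FastMM4 fill pattern mentioned earlier: _Release is the third method of IInterface, after QueryInterface and _AddRef, so its slot sits at offset 2 * 4 = $08, and the CALL reads from the fill pattern plus that offset. The constant names are only illustrative.

```pascal
program ReleaseSlotOffset;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  // Illustrative names; the values come from the notes above (Win32 only).
  FillPattern      = Cardinal($80808080); // what EAX holds after MOV EAX,[EDX]
  PointerSizeWin32 = 4;
  ReleaseSlotIndex = 2; // QueryInterface = 0, _AddRef = 1, _Release = 2

var
  FaultAddress: Cardinal;
begin
  // Address read by CALL DWORD PTR [EAX+$08]
  FaultAddress := FillPattern + ReleaseSlotIndex * PointerSizeWin32;
  Writeln(IntToHex(FaultAddress, 8)); // prints 80808088
end.
```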
Tracking the cause down
Tracking the cause down is hard. For one, memory is already hosed, so it is very hard to get any useful information. Another thing is that – even when the memory was not hosed – you cannot get the GUID from an interface reference:
- [WayBack] delphi – Is it possible to get the value of a GUID on an interface using RTTI? – Stack Overflow.
- [WayBack] How do I get the GUID from an interface reference? – delphi
The only relevant information is the Dest parameter to the System._IntfClear call. At the faulting instruction, the interface pointer it refers to is in the EDX register (EAX contains the offending value):
EAX: 80808080
EDX: 71B26D94
The EDX value will likely change on every offending call, so you need to find a pattern to enable a breakpoint at (in my case) memory location 004106C0, which is for the line
CALL DWORD PTR [EAX] + VMTOFFSET IInterface._Release
You can either make this a conditional breakpoint (breaking on EAX = $80808080, as the read of $80808088 is EAX plus the $08 offset) or a grouped breakpoint that is disabled by default, but gets enabled under certain conditions.
The former is very slow. The latter is much faster, but harder to do and involves finding the context. To get the context, it helps to look up the stack trace and find a pattern or a code path leading to the exception, which in my case looks like this:
System._IntfClear(???)
:004106c0 @IntfClear + $10
System._BeforeDestruction(???,???)
:0040a4c4 @BeforeDestruction + $C
System._IntfClear(???)
:004106c3 @IntfClear + $13
System.TObject.Free
Tools.ObjectCache.{System.Generics.Collections}TObjectList.Notify($72548DD0,cnRemoved)
Tools.ObjectCache.TGeneralCache.TInstanceList.Notify($72548DD0,cnRemoved)
System.Classes.{System.Generics.Collections}TList.InternalNotify(???,???)
System.Generics.Collections.TListHelper.InternalDeleteRange4(???,???)
System.Generics.Collections.TListHelper.InternalSetCount4(0)
System.Classes.{System.Generics.Collections}TList.SetCount(???)
System.Classes.{System.Generics.Collections}TList.Destroy
You can enrich the context by setting logging breakpoints with the “Eval expression” IntToHex(Integer(Pointer(Self)), 8) + ' in Method with class ' + Self.QualifiedClassName at these methods (replace Method with the actual method name):
- function TInterfacedObject._AddRef: Integer; at the line Result := AtomicIncrement(FRefCount);
- function TInterfacedObject._Release: Integer; at the line __MarkDestroying(Self);
- procedure TInterfacedObject.BeforeDestruction; at the line if RefCount <> 0 then
- procedure TObject.Free; at the line if Self <> nil then
- procedure _ClassDestroy(const Instance: TObject); at the line Instance.FreeInstance; (and use Instance instead of Self)
- destructor TObject.Destroy; at the line end;
In my case, I also added them in the overriding (descendant) methods of:
- System.Generics.Collections.TObjectList<T: class> method procedure TObjectList<T>.Notify(const Value: T; Action: TCollectionNotification); at any line
- System.Generics.Collections.TObjectDictionary<TKey,TValue> method procedure TObjectDictionary<TKey,TValue>.ValueNotify(const Value: TValue; Action: TCollectionNotification); at any line
For both methods, I also set the condition Action = cnRemoved so it would only fire when removing objects.
Put all of these in a breakpoint group that gets enabled close to where you think the problem starts.
Keep zooming in, and fingers crossed it is not a heisenbug.
In my case, the dictionary was of type <TClass, TInstanceList> where the hash code of the class influenced the bucket order.
Sometimes the order caused the problems to appear, at other times everything was dandy.
_AddRef and _Release in TInterfacedObject
Supports and _SafeIntfAsClass
Sometimes similar access violations happen inside the System.SysUtils.Supports method (see [WayBack] Object Interfaces).
Usually setting breakpoints with these conditions reveals them before they actually happen:
PPointer(Instance)^ = Pointer($80808080)
PCardinal(Instance)^ = $80808080
PCardinal(Instance)^ and $FFFFFF80 = $80808080
The first two have identical effect, but only break when the memory the reference points to contains exactly $80808080. This is not always the case, but usually the value is like $808080XX, so the and expression masks for that pattern: it matches any value from $80808080 through $808080FF.
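To see which values the masked condition accepts, here is a small sketch that wraps the same expression in a hypothetical helper, LooksLikeFreedFill (the name is made up for this example), and feeds it a few values from these notes.

```pascal
program FillPatternMask;

{$APPTYPE CONSOLE}

// Hypothetical helper mirroring the third breakpoint condition.
function LooksLikeFreedFill(Value: Cardinal): Boolean;
begin
  // "and" binds tighter than "=" in Delphi, so the parentheses are
  // optional; they are spelled out here for readability.
  Result := (Value and $FFFFFF80) = $80808080;
end;

begin
  Writeln(LooksLikeFreedFill($80808080)); // TRUE: the plain fill pattern
  Writeln(LooksLikeFreedFill($80808088)); // TRUE: fill pattern plus $08
  Writeln(LooksLikeFreedFill($808080FF)); // TRUE: upper end of the range
  Writeln(LooksLikeFreedFill($80808000)); // FALSE: low byte below $80
  Writeln(LooksLikeFreedFill($71B26D94)); // FALSE: a regular heap address
end.
```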
Target methods for the above breakpoints:
function Supports(const Instance: IInterface; const IID: TGUID; out Intf): Boolean;
function Supports(const Instance: TObject; const IID: TGUID; out Intf): Boolean;
Similarly, you can do this in the _SafeIntfAsClass method that gets called when casting an interface to TObject (like if MyInterface is TObject) through _IntfIsClass.
Inside the breakpoint condition, change Instance to Intf as the method has a different signature:
function _SafeIntfAsClass(const Intf: IInterface; Parent: TClass): TObject;
begin
  if (Intf <> nil) and (Intf.QueryInterface(ObjCastGUID, Pointer(Result)) = S_OK)
     and (Result is Parent) then
    Exit;
  Result := nil;
end;
Related
Delphi: when calling TThread.Synchronize, ensure the synchronised method handles exceptions
–jeroen
Stefan Glienke said
TL;DR – use CatchUseOfFreedInterfaces with FastMM4.
jpluimers said
That did not work because those interfaces still had references to them.
The actual cause here was mixing interface and class references, combined with implicit “friends” where code in the same unit can access non-strict marked storage.
Stefan Glienke said
Err no? Your entire article is about tracking a reference to an interface whose object instance has been destroyed. That is what CatchUseOfFreedInterfaces takes care of – it replaces the IMT in a destroyed object that implements interfaces with a special table that raises an error indicating this issue. No weird AV in _Release but “FastMM has detected an attempt to use an interface to a freed object … etc” exception
jpluimers said
I remember trying that and not catching these.
Hopefully after surgery I recover well enough to revisit this one day.
Olaf said
All the best wishes for the operation.
jpluimers said
Thank you!
sglienke said
See https://github.com/pleriche/FastMM4/blob/master/FastMM4Options.inc#L134
jpluimers said
I know. That did not help back then, hence the notes.
One day, I will try to make this reproducible with a nice small set of code. Hence the “some notes” as this was to jot down at least some bits of knowledge so I can later hopefully revisit.