Conversation

@Ivorforce (Member) commented Nov 28, 2025

This PR allows users to profile their games' GDScript functions using the tracy profiler.
It is a collaborative work between @enetheru and me.

(Screenshot: SCR-20251128-nknw)

Users can follow the tracy instructions in the docs (currently in PR form, soon available here) to start profiling their games. Note that profiling with tracy requires recompiling the engine.

I've profiled tps-demo for about 10 minutes straight, and both Godot and tracy stay responsive, without any apparent slowdowns or leaks.

Approach

Tracy profile zones generally make use of constexpr tracy::SourceLocationData pointers, which are passed to the profiling macros. With this trick, tracy can record millions of events, because each zone only needs to store a start timestamp, an end timestamp, and a pointer to the SourceLocationData.
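To illustrate, here is a minimal sketch of tracy's usual static-zone pattern, using tracy's stock macros rather than the wrapper macros this PR adds; the function in the example is purely illustrative:

```cpp
#include "tracy/Tracy.hpp"

void physics_step() {
	// ZoneScopedN expands to a static constexpr tracy::SourceLocationData plus a
	// tracy::ScopedZone; the zone records begin/end timestamps and a pointer to
	// that static data, and closes automatically at scope exit.
	ZoneScopedN("physics_step");
	// ... simulation work ...
}
```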

While tracy supports dynamic source locations / strings, it effectively just copies the contents and leaks them. This works, but is wasteful, because a lot of data is leaked. A previous implementation (by the developers of Halls of Torment) ran into limitations with this approach: it accumulated gigabytes of data after a few seconds of profiling (I don't know for sure that they used tracy's leaky approach, but it would match the observations).

Instead of using tracy's dynamic source location API, this PR interns source locations for GDScriptFunction objects and leaks those. This means every function's location is leaked only once, instead of once per call, which lets us profile for much, much longer. This takes inspiration from StringName's interning approach.
The implementation is fully contained in the tracy glue internals and doesn't affect games that aren't compiled with tracy. It is further contained to .cpp files that include profiling.h, so recompiling with a profiler attached stays trivial.

(It would be possible to attach a void * to GDScriptFunction instances instead of interning SourceLocationData. This would be faster, but it wouldn't be self-contained anymore. In the future, if people start running into performance overhead from the profiler, we can amend the implementation to do this instead.)
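For reference, here is a rough sketch of the interning idea. This is not the PR's actual code: the helper names and the keying scheme are hypothetical, and only the tracy struct layout and Godot container APIs used are real.

```cpp
#include <cstdint>
#include <cstring>

#include "core/string/ustring.h"
#include "core/templates/hash_map.h"
#include "tracy/Tracy.hpp"

// Copies a Godot String into a buffer that is intentionally never freed:
// tracy keeps referring to it for the rest of the session.
static const char *leak_utf8(const String &p_string) {
	CharString utf8 = p_string.utf8();
	char *copy = new char[utf8.length() + 1];
	memcpy(copy, utf8.get_data(), utf8.length() + 1);
	return copy;
}

// Returns one leaked tracy::SourceLocationData per unique script function,
// reused for every subsequent call of that function.
static const tracy::SourceLocationData *intern_source_location(const String &p_script, const String &p_function, uint32_t p_line) {
	static HashMap<String, tracy::SourceLocationData *> interned;
	const String key = p_script + "::" + p_function;
	if (tracy::SourceLocationData **found = interned.getptr(key)) {
		return *found;
	}
	tracy::SourceLocationData *loc = new tracy::SourceLocationData{
		leak_utf8(p_function), // zone name
		leak_utf8(p_function), // function
		leak_utf8(p_script),   // file
		p_line,
		0, // color (0 = default)
	};
	interned.insert(key, loc);
	return loc;
}
```

A per-call zone can then be opened against the interned location (for example via tracy::ScopedZone), so each invocation only costs the zone event itself rather than a fresh copy of the source location data.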

@Ivorforce (Member, author) commented Nov 28, 2025

Note that Godot currently crashes on exit (when tracy is used). This doesn't affect the PR's ability to trace events.
I'll see if this can be fixed, but I think it would be acceptable to merge the PR even in this state, because it's more of a cosmetic issue than a real problem.
Edit: Fixed

@Ivorforce force-pushed the tracy-gdscript-codeloc branch from 60cfe84 to 65af0fc on November 28, 2025 at 14:48
This adds macro `GodotProfileZoneGroupedFirstScript`, and uses interning for speedy lookups.

Co-authored-by: Samuel Nicholas <nicholas.samuel@gmail.com>
@vnen (Member) commented Nov 28, 2025

Tested this for a bit. I think it's pretty good but it fails to capture engine calls from GDScript. So if a script function calls something else and that something is slow, it's not really possible to tell.

I wonder if you can hook some of the API into the CALL* opcodes in the VM so the profiler is aware that the control is leaving the function (with some information of what's being called). This would give a better view of the execution time vs. the self time.

@Ivorforce (Member, author) commented Nov 28, 2025

Tested this for a bit. I think it's pretty good but it fails to capture engine calls from GDScript. So if a script function calls something else and that something is slow, it's not really possible to tell.

Right, since tracy is a tracing profiler, by nature we only trace what we explicitly annotate. (Most) engine functions aren't annotated, so they aren't traced.
However, I would argue that this PR solves the more important problem already (figuring out where your frametime goes). Benchmarking the function parts from there is a lot easier than locating the source of your problems in the first place.

I wonder if you can hook some of the API into the CALL* opcodes in the VM so the profiler is aware that the control is leaving the function (with some information of what's being called). This would give a better view of the execution time vs. the self time.

This should be possible — and indeed, #112707 had some support for it.
However, adding support for this is not required for GDScript tracing to work, so I removed it from this PR. I think it is better added and reviewed in isolation, after this PR is merged.
(An alternative to instrumenting engine calls is using tracy's sampling functionality. This should fill in the gaps in a more complete way.)

@enetheru (Contributor) commented

and indeed, #112707 had some support for it.

I put a lot of effort into capturing all calls from GDScript, specifically tailoring it to capture engine calls as well. I was originally capturing all opcodes separately, and figured out which ones could be captured at the start, and which ones I needed to keep.

I haven't had time to look over and test the rest yet, but this makes me sad.

@Ivorforce (Member, author) commented Nov 28, 2025

and indeed, #112707 had some support for it.

I put a lot of effort into capturing all calls from GDScript, specifically tailoring it to capture engine calls as well. I was originally capturing all opcodes separately, and figured out which ones could be captured at the start, and which ones I needed to keep.

I haven't had time to look over and test the rest yet, but this makes me sad.

It's not a problem, we can add those back in a follow-up PR. But I much prefer separation of concerns in PRs, to give each added feature the attention it needs during review.

@Ivorforce (Member, author) commented Nov 28, 2025

To go a little bit more in-depth with my response:

First off, thank you for putting in the time to add support for tracing system calls. Judging by @vnen's and @AdriaandeJongh's reaction, this is a feature that is anticipated, and will be useful for profiling GDScript.

Still, I'll try to explain why I decided to remove it from this PR. The reason for that is simply that I want to keep the complexity of the PR as low as possible. I strongly believe in the power of code reviews, and code reviews are most effective when the code is easy to understand, and focuses on one change at a time. In short, I'm not removing system call tracing because I believe it should not be added — I'm just removing it temporarily to make this PR easier to review and merge. I'm looking forward to your follow-up PR to re-add those calls!

You've also asked why I added interning for SourceLocationData at the same time as removing the system call tracing. The reason is that it's needed for the feature to run smoothly. Let's review the alternatives to my approach:

Use Tracy's dynamic source location API: This is what you did in your PR #112707. However, using this API makes Tracy copy and leak the strings. In bigger projects, this can lead to large amounts of data being leaked, effectively making us unable to trace for longer than a few seconds (as described in the Halls of Torment retrospective).

Use and leak SourceLocationData: I (inadvertently) tested this. Tracy supports a maximum of 32k SourceLocationData objects. With a few calls each frame, that limit is reached after a few seconds, preventing longer tracing, just like the previous approach.

Considering this, we need a way to keep track of SourceLocationData objects and reuse them. To keep profiler logic out of non-profiler builds, I decided to intern them. This approach works well: as described in the OP, we should be able to profile (almost) indefinitely with this design. Since this design is vital for sustainable profiling, I believe it should inform the API design, and is therefore needed in the initial PR.

@clayjohn (Member) left a comment

Let's goooooo

@akien-mga (Member) left a comment

LGTM. Great work @enetheru and @Ivorforce!

@akien-mga akien-mga merged commit 71d4ded into godotengine:master Dec 2, 2025
20 checks passed
@akien-mga (Member) commented

Thanks!

@Ivorforce deleted the tracy-gdscript-codeloc branch on December 2, 2025 at 13:38
@mihe (Contributor) commented Dec 5, 2025

I don't have time to look into this until next week, but as mentioned here earlier, I've cherry-picked this PR and all the preceding profiling PRs onto a fairly large-scale project running Godot 4.5. It seems this (or possibly one of the previous profiling PRs) more or less broke resource importing when building the editor with profiling enabled, because memory usage skyrockets during it.

When deleting .godot and reimporting everything, memory usage normally hovers at around 2 GB, but with the profiling stuff enabled it quickly climbs to 30+ GB instead.

It's possible that it's something unique to this project (which does have custom importers written in GDScript), or that something went wrong in the cherry-picking to 4.5, but I figured I'd at least give a heads-up before I do a deeper dive next week.

@cridenour (Contributor) commented

I've backported all the profiling work (including #113632) onto 4.3, and others should be able to cherry-pick the commit here: cridenour@9f746bf

One thing to note: on 4.3, the build system glob was unwilling to add TracyClient.hpp to our source files. Not sure when that was fixed, but changing one line in core/profiling/SCsub was able to bypass it. You can see the line here: cridenour@9f746bf#diff-9c24e4731c36f13b1b630b3166e98b34fdeba56db12b311c6a416ad054f0a5d7R66

@mihe I wonder if the memory leak is because the current profiling solution doesn't use on-demand profiling. Adding

`env_tracy.Append(CPPDEFINES=["TRACY_ON_DEMAND"])`

in core/profiling/SCsub next to TRACY_ENABLE may help here, and may be the desired behavior, especially if you have profiling enabled in the editor and not just in a special profiling release export.

@mihe (Contributor) commented Dec 8, 2025

it seems this (or possibly one of the previous profiling PRs) more or less broke resource importing when building the editor with profiling enabled, because memory usage skyrockets during it.

A couple of observations on the memory build-up issue so far:

  1. I've been able to reproduce this on macOS Tahoe, Linux (Fedora 42) and Windows 11.
  2. Removing the added line to gdscript_vm.cpp makes no difference, so I guess that rules out this particular PR.
  3. Building with TRACY_ON_DEMAND did not help. It did, however, cause Tracy to sometimes immediately terminate the profiling session when connecting, due to the same profiling zone having been exited multiple times, or something along those lines.
  4. Having Tracy connected or not connected makes no difference, with or without TRACY_ON_DEMAND.
  5. On Windows (even with 15-ish GB of memory to spare) the editor crashes at a memcpy in the Tracy code halfway through the resource importing, due to a tracy_malloc seemingly having returned a nullptr, while Tracy is trying to allocate more capacity for its m_serialQueue as part of GodotProfileFree. I assume this crashes because tracy::FastVector always doubles the memory on every grow, and the next doubling would exceed my 64 GB of total memory.
  6. Looking at the calls to tracy::FastVector<T>::AllocMore(), I can confirm that it is indeed tracy::Profiler::m_serialQueue that grows so very big.
  7. I figured it might be that Tracy only flushes this m_serialQueue every frame or something, and ProgressDialog (which is a potentially long-running modal dialog) does not have a GodotProfileFrameMark in its _update_ui method, but adding one made no difference, unfortunately.
  8. Setting GodotProfileAlloc and GodotProfileFree to be no-ops does resolve the issue.

So it would seem that this is caused by Tracy's memory profiling somehow, either because we're failing to do something to flush this queue, or we're just doing so many allocations during resource importing that this sort of tracking scales poorly to bigger projects.

Any ideas?

(I'll try to create a dedicated issue for this, but making an MRP for this might be tough. I also still need to try this in latest master, as opposed to 4.5.)

@Ivorforce (Member, author) commented

Any ideas?

Worst case, we could add an option to enable/disable allocation tracing. Disabling it by default would probably make the most sense if it unexpectedly fails sometimes.
We could also ask the tracy dev why it might bloat RAM like this, and how we could address it. 30 GB does seem unexpected either way.
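To sketch what such an option could look like, the profiling glue header could gate the allocation hooks behind an opt-in define. This is a hedged example: the GODOT_TRACY_MEMORY define and the exact macro bodies are illustrative assumptions, not what #113807 actually does; TracyAlloc/TracyFree are tracy's stock memory hooks.

```cpp
#if defined(TRACY_ENABLE) && defined(GODOT_TRACY_MEMORY)
// Allocation tracing opted in: forward to tracy's memory hooks.
#define GodotProfileAlloc(m_ptr, m_size) TracyAlloc(m_ptr, m_size)
#define GodotProfileFree(m_ptr) TracyFree(m_ptr)
#else
// Default: allocations are not reported to tracy at all.
#define GodotProfileAlloc(m_ptr, m_size)
#define GodotProfileFree(m_ptr)
#endif
```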

@mihe (Contributor) commented Dec 9, 2025

Worst case, we could add an option to enable/disable allocation tracing. Disabling it by default would probably make the most sense if it unexpectedly fails sometimes.

I've opened a formal issue (#113805) as well as a pull request to add the suggested build option (#113807). I haven't had time to try to make an MRP yet, but I have now run the project in latest master and can confirm that it happens there as well.
