Conversation

@Ivorforce (Member) commented Nov 28, 2025

This PR allows users to profile their games' GDScript functions using the tracy profiler.
It is a collaborative work between @enetheru and me.

(Screenshot: SCR-20251128-nknw)

Users can follow the tracy instructions in the docs (currently in PR form, soon available here) to start profiling their games. Note that profiling with tracy requires recompiling the engine.

I've profiled tps-demo for about 10 minutes straight, and both Godot and tracy stay responsive, without any apparent slowdowns or leaks.

Approach

Tracy profile zones generally make use of constexpr tracy::SourceLocationData pointers, which are passed to the profiling macros. With this trick, tracy can record millions of events, because each zone only needs to store a start timestamp, an end timestamp, and a pointer to the SourceLocationData.
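To illustrate, here is a minimal sketch of tracy's usual static-zone pattern, using tracy's stock macros rather than the wrapper macros this PR adds; the function in the example is purely illustrative:

```cpp
#include "tracy/Tracy.hpp"

void physics_step() {
	// ZoneScopedN expands to a static constexpr tracy::SourceLocationData plus a
	// tracy::ScopedZone; the zone records begin/end timestamps and a pointer to
	// that static data, and closes automatically at scope exit.
	ZoneScopedN("physics_step");
	// ... simulation work ...
}
```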

While tracy supports dynamic source locations / strings, it effectively just copies the contents and leaks them. This works, but is wasteful, because a lot of data is leaked. A previous implementation (by the developers of Halls of Torment) ran into limitations with this approach: it accumulated gigabytes of data after a few seconds of profiling (I don't know for sure that they used tracy's leaky approach, but it would match the observations).

Instead of using tracy's dynamic source location API, this PR interns source locations for GDScriptFunction objects and leaks those. This means every function's location is leaked only once, instead of once per call, which lets us profile for much, much longer. This takes inspiration from StringName's interning approach.
The implementation is fully contained in the tracy glue internals and doesn't affect games that aren't compiled with tracy. It is further contained to .cpp files that include profiling.h, so recompiling with a profiler attached stays trivial.

(It would be possible to attach a void * to GDScriptFunction instances instead of interning SourceLocationData. This would be faster, but it wouldn't be self-contained anymore. In the future, if people start running into performance overhead from the profiler, we can amend the implementation to do this instead.)
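For reference, here is a rough sketch of the interning idea. This is not the PR's actual code: the helper names and the keying scheme are hypothetical, and only the tracy struct layout and Godot container APIs used are real.

```cpp
#include <cstdint>
#include <cstring>

#include "core/string/ustring.h"
#include "core/templates/hash_map.h"
#include "tracy/Tracy.hpp"

// Copies a Godot String into a buffer that is intentionally never freed:
// tracy keeps referring to it for the rest of the session.
static const char *leak_utf8(const String &p_string) {
	CharString utf8 = p_string.utf8();
	char *copy = new char[utf8.length() + 1];
	memcpy(copy, utf8.get_data(), utf8.length() + 1);
	return copy;
}

// Returns one leaked tracy::SourceLocationData per unique script function,
// reused for every subsequent call of that function.
static const tracy::SourceLocationData *intern_source_location(const String &p_script, const String &p_function, uint32_t p_line) {
	static HashMap<String, tracy::SourceLocationData *> interned;
	const String key = p_script + "::" + p_function;
	if (tracy::SourceLocationData **found = interned.getptr(key)) {
		return *found;
	}
	tracy::SourceLocationData *loc = new tracy::SourceLocationData{
		leak_utf8(p_function), // zone name
		leak_utf8(p_function), // function
		leak_utf8(p_script),   // file
		p_line,
		0, // color (0 = default)
	};
	interned.insert(key, loc);
	return loc;
}
```

A per-call zone can then be opened against the interned location (for example via tracy::ScopedZone), so each invocation only costs the zone event itself rather than a fresh copy of the source location data.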

@Ivorforce (Member, author) commented Nov 28, 2025

Note that Godot currently crashes on exit (when tracy is used). This doesn't affect the PR's ability to trace events.
I'll see if this can be fixed, but I think it would be acceptable to merge the PR even in this state, because it's more of a cosmetic issue than a real problem.
Edit: Fixed

@Ivorforce force-pushed the tracy-gdscript-codeloc branch from 60cfe84 to 65af0fc on November 28, 2025 at 14:48
This adds macro `GodotProfileZoneGroupedFirstScript`, and uses interning for speedy lookups.

Co-authored-by: Samuel Nicholas <nicholas.samuel@gmail.com>
@vnen (Member) commented Nov 28, 2025

Tested this for a bit. I think it's pretty good but it fails to capture engine calls from GDScript. So if a script function calls something else and that something is slow, it's not really possible to tell.

I wonder if you can hook some of the API into the CALL* opcodes in the VM so the profiler is aware that the control is leaving the function (with some information of what's being called). This would give a better view of the execution time vs. the self time.

@Ivorforce (Member, author) commented Nov 28, 2025

Tested this for a bit. I think it's pretty good but it fails to capture engine calls from GDScript. So if a script function calls something else and that something is slow, it's not really possible to tell.

Right, since tracy is a tracing profiler, by nature we only trace what we explicitly annotate. (Most) engine functions aren't annotated, so they aren't traced.
However, I would argue that this PR solves the more important problem already (figuring out where your frametime goes). Benchmarking the function parts from there is a lot easier than locating the source of your problems in the first place.

I wonder if you can hook some of the API into the CALL* opcodes in the VM so the profiler is aware that the control is leaving the function (with some information of what's being called). This would give a better view of the execution time vs. the self time.

This should be possible — and indeed, #112707 had some support for it.
However, adding support for this is not required for GDScript tracing to work, so I removed it from this PR. I think it is better added and reviewed in isolation, after this PR is merged.
(An alternative to instrumenting engine calls is using tracy's sampling functionality. This should fill in the gaps in a more complete way.)

@enetheru (Contributor) commented

and indeed, #112707 had some support for it.

I put a lot of effort into capturing all calls from GDScript, specifically tailoring it to capture engine calls as well. I was originally capturing all opcodes separately, and figured out which ones could be captured at the start, and which ones I needed to keep.

I haven't had time to look over and test the rest yet, but this makes me sad.

@Ivorforce (Member, author) commented Nov 28, 2025

and indeed, #112707 had some support for it.

I put a lot of effort into capturing all calls from GDScript, specifically tailoring it to capture engine calls as well. I was originally capturing all opcodes separately, and figured out which ones could be captured at the start, and which ones I needed to keep.

I haven't had time to look over and test the rest yet, but this makes me sad.

It's not a problem, we can add those back in a follow-up PR. But I much prefer separation of concerns in PRs, to give each added feature the attention it needs during review.

@Ivorforce (Member, author) commented Nov 28, 2025

To go a little bit more in-depth with my response:

First off, thank you for putting in the time to add support for tracing system calls. Judging by @vnen's and @AdriaandeJongh's reaction, this is a feature that is anticipated, and will be useful for profiling GDScript.

Still, I'll try to explain why I decided to remove it from this PR. The reason for that is simply that I want to keep the complexity of the PR as low as possible. I strongly believe in the power of code reviews, and code reviews are most effective when the code is easy to understand, and focuses on one change at a time. In short, I'm not removing system call tracing because I believe it should not be added — I'm just removing it temporarily to make this PR easier to review and merge. I'm looking forward to your follow-up PR to re-add those calls!

You've also asked why I added interning for SourceLocationData at the same time as removing the system call tracing. The reason is that it's needed for the feature to run smoothly. Let's review the alternatives to my approach:

Use Tracy's dynamic source location API: This is what you did in your PR #112707. However, using this API makes Tracy copy and leak the strings. In bigger projects, this can lead to large amounts of data being leaked, effectively making us unable to trace for longer than a few seconds (as described in the Halls of Torment retrospective).

Use and leak SourceLocationData: I (inadvertently) tested this. Tracy supports a maximum of 32k SourceLocationData objects. With a few calls each frame, that limit is reached after a few seconds, preventing longer tracing, just like the previous approach.

Considering this, we need a way to keep track of SourceLocationData objects and reuse them. To keep profiler logic out of non-profiler builds, I decided to intern them. This approach works well: as described in the OP, we should be able to profile (almost) indefinitely with this design. Since this design is vital for sustainable profiling, I believe it should inform the API design, and is therefore needed in the initial PR.

@clayjohn (Member) left a comment

Let's goooooo

@akien-mga (Member) left a comment

LGTM. Great work @enetheru and @Ivorforce!

@akien-mga akien-mga merged commit 71d4ded into godotengine:master Dec 2, 2025
20 checks passed
@akien-mga (Member) commented

Thanks!

@Ivorforce deleted the tracy-gdscript-codeloc branch on December 2, 2025 at 13:38
@mihe (Contributor) commented Dec 5, 2025

I don't have time to look into this until next week, but as mentioned here earlier, I've cherry-picked this PR and all the preceding profiling PRs onto a fairly large-scale project running Godot 4.5. It seems this (or possibly one of the previous profiling PRs) more or less broke resource importing when building the editor with profiling enabled, because memory usage skyrockets during it.

When deleting .godot and reimporting everything, memory usage normally hovers at around 2 GB, but with the profiling stuff enabled it quickly climbs to 30+ GB instead.

It's possible that it's something unique to this project (which does have custom importers written in GDScript), or that something went wrong in the cherry-picking to 4.5, but I figured I'd at least give a heads-up before I do a deeper dive next week.

@cridenour (Contributor) commented

I've backported all the profiling work (including #113632) onto 4.3, and others should be able to cherry-pick the commit here: cridenour@9f746bf

One thing to note: on 4.3, the build system glob was unwilling to add TracyClient.hpp to our source files. Not sure when that was fixed, but changing one line in core/profiling/SCsub was able to bypass it. You can see the line here: cridenour@9f746bf#diff-9c24e4731c36f13b1b630b3166e98b34fdeba56db12b311c6a416ad054f0a5d7R66

@mihe I wonder if the memory leak is because the current profiling solution doesn't use on-demand profiling. Adding

`env_tracy.Append(CPPDEFINES=["TRACY_ON_DEMAND"])`

in core/profiling/SCsub next to TRACY_ENABLE may help here, and may be the desired behavior, especially if you have profiling enabled in the editor and not just in a special profiling release export.

@mihe (Contributor) commented Dec 8, 2025

it seems this (or possibly one of the previous profiling PRs) more or less broke resource importing when building the editor with profiling enabled, because memory usage skyrockets during it.

A couple of observations on the memory build-up issue so far:

  1. I've been able to reproduce this on macOS Tahoe, Linux (Fedora 42) and Windows 11.
  2. Removing the added line to gdscript_vm.cpp makes no difference, so I guess that rules out this particular PR.
  3. Building with TRACY_ON_DEMAND did not help. It did, however, cause Tracy to sometimes immediately terminate the profiling session when connecting, due to the same profiling zone having been exited multiple times, or something along those lines.
  4. Having Tracy connected or not connected makes no difference, with or without TRACY_ON_DEMAND.
  5. On Windows (even with 15-ish GB of memory to spare) the editor crashes at a memcpy in the Tracy code halfway through the resource importing, due to a tracy_malloc seemingly having returned a nullptr, while Tracy is trying to allocate more capacity for its m_serialQueue as part of GodotProfileFree. I assume this crashes because tracy::FastVector always doubles the memory on every grow, and the next doubling would exceed my 64 GB of total memory.
  6. Looking at the calls to tracy::FastVector<T>::AllocMore(), I can confirm that it is indeed tracy::Profiler::m_serialQueue that grows so very big.
  7. I figured it might be that Tracy only flushes this m_serialQueue every frame or something, and ProgressDialog (which is a potentially long-running modal dialog) does not have a GodotProfileFrameMark in its _update_ui method, but adding one made no difference, unfortunately.
  8. Setting GodotProfileAlloc and GodotProfileFree to be no-ops does resolve the issue.

So it would seem that this is caused by Tracy's memory profiling somehow, either because we're failing to do something to flush this queue, or we're just doing so many allocations during resource importing that this sort of tracking scales poorly to bigger projects.

Any ideas?

(I'll try to create a dedicated issue for this, but making an MRP for this might be tough. I also still need to try this in latest master, as opposed to 4.5.)

@Ivorforce (Member, author) commented

Any ideas?

Worst case, we could add an option to enable/disable allocation tracing. Disabling it by default would probably make the most sense if it unexpectedly fails sometimes.
We could also ask the tracy dev why it might bloat RAM like this, and how we could address it. 30 GB does seem unexpected either way.
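To sketch what such an option could look like, the profiling glue header could gate the allocation hooks behind an opt-in define. This is a hedged example: the GODOT_TRACY_MEMORY define and the exact macro bodies are illustrative assumptions, not what #113807 actually does; TracyAlloc/TracyFree are tracy's stock memory hooks.

```cpp
#if defined(TRACY_ENABLE) && defined(GODOT_TRACY_MEMORY)
// Allocation tracing opted in: forward to tracy's memory hooks.
#define GodotProfileAlloc(m_ptr, m_size) TracyAlloc(m_ptr, m_size)
#define GodotProfileFree(m_ptr) TracyFree(m_ptr)
#else
// Default: allocations are not reported to tracy at all.
#define GodotProfileAlloc(m_ptr, m_size)
#define GodotProfileFree(m_ptr)
#endif
```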

@mihe (Contributor) commented Dec 9, 2025

Worst case, we could add an option to enable/disable allocation tracing. Disabling it by default would probably make the most sense if it unexpectedly fails sometimes.

I've opened a formal issue (#113805) as well as a pull request to add the suggested build option (#113807). I haven't had time to try to make an MRP yet, but I have now run the project in latest master and can confirm that it happens there as well.
