Thanks for the great work!
We are very interested in reproducing the results from the paper. By default, we are using GPT-5 through the Azure API and UI-TARS 1.5 hosted with vLLM. However, we still see a performance gap on OSWorld relative to the reported numbers in a single run.
Could you share any additional details about the setup? This would be very helpful for reproducing the paper's results.