Hello!
Awesome X announcement that you folks put out for Simular, and kudos on the Agent S & Agent S2 papers.
I was curious about the performance of the Agent S2 system on some notable static, offline datasets for Android device manipulation. While AndroidWorld provides one useful signal for a system's capability to operate mobile phones, there is a wide variety of tasks that are not captured in its distribution, which I believe are present in other open-source device manipulation datasets, such as:
- AndroidControl
- AMEX
- GUIOdyssey
Would it be possible to evaluate the Agent S2 system on these data sources? This question is complementary to this issue about the evaluation setup for AndroidWorld.