
Mobile Device Evaluations -- AndroidControl, GuiOdyssey, et al. #112

@ckgresla


Hello!

Awesome announcement on X that you folks put out for Simular, and kudos on the Agent S and Agent S2 papers!

I was curious how the Agent S2 system performs on some notable static, offline datasets for Android device manipulation. AndroidWorld provides one useful signal of a system's ability to operate mobile phones, but many kinds of tasks fall outside its distribution. I believe several of these are covered by other open-source device-manipulation datasets, such as:

  1. AndroidControl
  2. AMEX
  3. GUIOdyssey

Would it be possible to evaluate Agent S2 on these datasets? This question complements the related issue about the evaluation setup for AndroidWorld.
