
Performance decreasing when going from 0 to a few shots? #3452


Description

@MikeCorv

Hello! In order to evaluate in-context learning in some LLMs, I used the following code to check the performance of google/gemma-2-9b on mmlu_elementary_mathematics.

```python
import gc
import json

import torch
import lm_eval

# Setup used in the runs described below
model_id = "google/gemma-2-9b"
task_list = ["mmlu_elementary_mathematics"]
shots_list = [0, 1, 2]
results_data = []

for n_shots in shots_list:
    print(f"\nTesting with {n_shots} shot(s)...")
    gc.collect()
    torch.cuda.empty_cache()
    print("Memory cleaned!")

    eval_output = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={model_id},load_in_4bit=True",
        tasks=task_list,
        num_fewshot=n_shots,
        limit=10,
        batch_size=8,
        log_samples=True,
        # correct settings for base (non-chat) models
        apply_chat_template=False,
        fewshot_as_multiturn=False,
    )

    accuracy = eval_output["results"]["mmlu_elementary_mathematics"]["acc,none"]
    results_data.append({
        "shots": n_shots,
        "accuracy": accuracy,
        "samples": json.dumps(eval_output["samples"], indent=4),
    })
```
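
Inside the same loop I also print the prompt that the harness actually builds for the first logged sample, just to check that the few-shot examples really end up in the context. This is only a rough sketch: I assume each entry in eval_output['samples'] exposes the request context under 'arguments', which may differ between lm-eval versions.

```python
# Rough sketch (inside the loop above): print the context of the first logged
# sample to confirm the few-shot examples are actually in the prompt.
# Assumption: each logged sample stores its request arguments under
# "arguments" as (context, continuation) pairs; key names may vary by version.
task_samples = eval_output["samples"]["mmlu_elementary_mathematics"]
first_context = task_samples[0]["arguments"][0][0]
print(f"--- Prompt with {n_shots} shot(s) ---")
print(first_context)
```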

Results are as follows:

0-Shots: Accuracy = 0.7
1-Shot: Accuracy = 0.5
2-Shots: Accuracy = 0.6
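
With limit=10 each of these accuracies is computed over only 10 questions, so inside the loop I also read the reported standard error. This is a minimal sketch; I assume the results dict exposes it under the usual 'acc_stderr,none' key.

```python
# Sketch (inside the loop above): also record the reported standard error.
# Assumption: recent lm-eval versions expose it as "acc_stderr,none";
# with limit=10 each accuracy is estimated from just 10 questions.
task_results = eval_output["results"]["mmlu_elementary_mathematics"]
acc = task_results["acc,none"]
stderr = task_results.get("acc_stderr,none", float("nan"))
print(f"{n_shots} shot(s): acc = {acc:.2f}, stderr = {stderr}")
```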

Is it possible that accuracy decreases when going from 0 to 1 and then to 2 shots? Am I doing something wrong? Do you have any suggestions for demonstrating the increase in accuracy with in-context learning?
