Hello, I am trying to reproduce the results of the base model "declare-lab/flan-alpaca-base" from Hugging Face. I followed the training commands in the README, but the loss does not decrease, and inference fails to produce any meaningful output. Below is a partial excerpt from my trainer_state for reference:
```json
{
    "epoch": 0.02,
    "learning_rate": 3.135779241141424e-06,
    "loss": 17.987,
    "step": 500
},
{
    "epoch": 0.03,
    "learning_rate": 6.271558482282848e-06,
    "loss": 17.9571,
    "step": 1000
},
……
{
    "epoch": 9.99,
    "learning_rate": 1.320328101533231e-07,
    "loss": 16.2255,
    "step": 318500
},
{
    "epoch": 10.0,
    "eval_gen_len": 1.0,
    "eval_loss": 17.40145492553711,
    "eval_rouge1": 0.007,
    "eval_rouge2": 0.0,
    "eval_rougeL": 0.0069,
    "eval_rougeLsum": 0.0,
    "eval_rougeLsum": 0.007,
    "eval_runtime": 411.3956,
    "eval_samples_per_second": 21.349,
    "eval_steps_per_second": 0.168,
    "step": 318900
}
```
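For scale: a model that predicts uniformly at random over the vocabulary would score a cross-entropy of log(V), which for a 32,128-token vocabulary (my assumption for the flan-t5-base tokenizer) is only about 10.4, well below the ~17.9 reported above. That makes me suspect something in the loss computation (e.g. label padding) rather than just slow convergence. A quick sanity check:

```python
import math

# Assumption: vocabulary size of the flan-t5-base tokenizer.
vocab_size = 32128

# Cross-entropy of a uniform random predictor: -log(1/V) = log(V).
uniform_loss = math.log(vocab_size)
print(f"uniform baseline loss: {uniform_loss:.2f}")  # ~10.38
```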
When I run inference with the resulting model, the generated content is entirely unusable:
'- nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> - nooutput> '
What could be causing these problems? Looking forward to your answer, thank you!