Does the confidence score used in the MSE refer to the confidence word generated by GPT-3, or to the probability of the answer?