Human Expert Evaluation Invitation
Dear Expert:
We are currently conducting an evaluation experiment in the field of neuroscience and cordially invite you to participate. This study aims to systematically assess model-generated responses across five key dimensions: Accuracy, Reasoning, Hallucination, Coverage, and Terminology.
During the experiment, you will evaluate a set of randomly assigned records. Each record consists of a Query, a Summary (the model’s response), and the Evidence_data. Please assign an integer score from 1 to 10 for each dimension according to the following criteria:
- 10: Flawless; exceptional performance with no omissions.
- 8-9: Strong performance, with only minor imperfections or slight omissions.
- 6-7: Generally correct, but with notable weaknesses or minor errors.
- 4-5: Insufficient logical or factual reliability; contains significant errors or hallucinations.
- 1-3: Fundamentally incorrect or logically incoherent; unacceptable.
Evaluation Dimensions and Focus Areas:
- Accuracy: Assess factual correctness and consistency with neuroscience facts or common sense.
- Reasoning: Assess logical coherence and whether conclusions follow from evidence.
- Hallucination: Check whether the response contains fabricated or unsupported claims.
- Coverage: Check whether key evidence and data types are comprehensively covered.
- Terminology: Check correctness and appropriateness of domain-specific terminology.
Please adhere strictly to these definitions to ensure objective evaluation. Your scores will serve as a gold standard for verifying model reliability and guiding future improvements. We sincerely appreciate your participation and professional contribution.
Note: If you select "Anonymous", your name will not be recorded in the submission.
Select Question Group
Please select the question group you want to evaluate: the information group focuses on mouse brain-region information, the modalities group focuses on multimodal brain-region data, and the type group involves neuron types in mouse brain regions.