Human Expert Evaluation Invitation
Dear Expert:
We are currently conducting an evaluation experiment in the field of neuroscience and cordially invite you to participate. This study aims to systematically assess model-generated responses across five key dimensions: Accuracy, Reasoning, Hallucination, Coverage, and Terminology.
During the experiment, you will evaluate a set of randomly assigned records. Each record consists of a Query, a Summary (the model’s response), and the Evidence_data. Please assign an integer score from 1 to 10 for each dimension according to the following criteria:
- 10: Flawless; exceptional performance with no omissions.
- 8-9: Strong performance, with only minor imperfections or slight omissions.
- 6-7: Generally correct, but with notable weaknesses or minor errors.
- 4-5: Insufficient logical or factual reliability; contains significant errors or hallucinations.
- 1-3: Fundamentally incorrect or logically incoherent; unacceptable.
Evaluation Dimensions and Focus Areas:
- Accuracy: Assess factual correctness and consistency with neuroscience facts or common sense.
- Reasoning: Assess logical coherence and whether conclusions follow from evidence.
- Hallucination: Check whether the response contains fabricated or unsupported claims.
- Coverage: Check whether key evidence and data types are comprehensively covered.
- Terminology: Check correctness and appropriateness of domain-specific terminology.
Please adhere strictly to these definitions to ensure objective evaluation. Your scores will serve as a gold standard for verifying model reliability and guiding future improvements. We sincerely appreciate your participation and professional contribution.
Note: If you select "Anonymous", your name will not be recorded in the submission.
Select Question Group
Please select the question group you want to evaluate: the information group focuses on mouse brain-region information, the modalities group focuses on multimodal brain-region data, and the type group involves neuron types in mouse brain regions.