Update README.md
README.md
CHANGED
|
@@ -23,7 +23,9 @@ sdk: static
|
|
| 23 |
The criteria for evaluating questioning ability are as follows. An interviewer who satisfies all five items below is the one who asked the best questions.
|
| 24 |
|
| 25 |
### Criteria for Good Questions
|
| 26 |
-
* Did the interviewer keep asking about one topic until the answers were sufficiently specific?
|
|
|
|
|
|
|
| 27 |
* Did the questions focus on drawing out verifiable information? (= Can the question reveal a contradiction, or can the answer be verified through an external search?)
|
| 28 |
|
| 29 |
(e.g., questions about "dates, addresses, affiliation IDs, organization names, emails, names of related people such as a supervisor at a former company")
|
|
@@ -34,6 +36,7 @@ sdk: static
|
|
| 34 |
### Cases of Poor Questioning
|
| 35 |
Conversely, cases of poor questioning are as follows.
|
| 36 |
* Moving straight on to a completely different topic before the current topic has been sufficiently detailed
|
|
|
|
| 37 |
* Asking abstract questions that make it hard to check for contradictions or verify facts
|
| 38 |
|
| 39 |
(e.g., "What is your hobby?", "What is the most important value in your life?")
|
|
@@ -41,6 +44,7 @@ sdk: static
|
|
| 41 |
|
| 42 |
(e.g., "I work at Google." → "What year was Google founded?")
|
| 43 |
* Exception) Questions closely tied to an event/incident/experience the interviewee directly took part in are allowed, e.g., "I am a founder of Google." → "What year was Google founded?" The evaluation must therefore take the preceding questions and answers into account.
|
|
|
|
| 44 |
* Questions are so loosely related to one another that mutual contradictions are hard to judge
|
| 45 |
* Moving on to an unrelated question even though a contradiction was found in the earlier dialogue
|
| 46 |
|
|
@@ -55,60 +59,66 @@ sdk: static
|
|
| 55 |
|
| 56 |
---
|
| 57 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 58 |
# Labeling Guideline
|
| 59 |
|
| 60 |
Thank you for participating in this labeling project!
|
| 61 |
|
| 62 |
-
You will
|
| 63 |
|
| 64 |
There are two main tasks to complete:
|
| 65 |
|
| 66 |
-
* **Comparison:** Determine which of the two interviewers (A or B)
|
| 67 |
-
* **Rating:**
|
| 68 |
-
|
| 69 |
-
---
|
| 70 |
|
| 71 |
# Evaluation Criteria
|
| 72 |
|
| 73 |
-
The
|
| 74 |
|
| 75 |
### Criteria for Good Questions
|
| 76 |
|
| 77 |
-
* **Depth:**
|
| 78 |
-
*
|
| 79 |
-
* *
|
| 80 |
|
| 81 |
|
| 82 |
-
* **
|
| 83 |
-
*
|
| 84 |
-
|
| 85 |
|
| 86 |
-
|
|
|
|
|
|
|
| 87 |
|
| 88 |
-
|
| 89 |
|
| 90 |
-
|
| 91 |
-
* **Abstract Questions:** Asking questions that make it difficult to verify facts or detect contradictions.
|
| 92 |
-
* *e.g., "What are your hobbies?", "What is the most important value in your life?"*
|
| 93 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 94 |
|
| 95 |
-
* **External Knowledge Dependency:** Asking questions that rely on general external knowledge rather than the interviewee's personal information or experiences.
|
| 96 |
-
* *e.g., "I work at Google." → "In what year was Google founded?"*
|
| 97 |
-
* **Exception:** If the question relates to an event the interviewee was directly involved in, it is acceptable. (e.g., "I am the founder of Google." → "In what year was Google founded?") Please evaluate by considering the context of the previous dialogue.
|
| 98 |
|
|
|
|
|
|
|
|
|
|
| 99 |
|
| 100 |
-
* **Low Relevancy:** Questions lack connection to one another, making it difficult to judge internal consistency or logic.
|
| 101 |
-
* **Ignoring Contradictions:** Moving on to unrelated questions even after a contradiction was clearly detected in the previous dialogue.
|
| 102 |
|
| 103 |
-
|
|
|
|
|
|
|
| 104 |
|
| 105 |
# Important Notes
|
| 106 |
|
| 107 |
-
*
|
| 108 |
-
*
|
| 109 |
|
| 110 |
-
#
|
| 111 |
|
| 112 |
-
* You may use the Chrome translation feature to translate the
|
| 113 |
|
| 114 |
|
|
|
| 23 |
The criteria for evaluating questioning ability are as follows. An interviewer who satisfies all five items below is the one who asked the best questions.
|
| 24 |
|
| 25 |
### Criteria for Good Questions
|
| 26 |
+
* Did the interviewer keep asking about one topic until the answers were sufficiently specific?
|
| 27 |
+
* If the interviewer wants to ask again because no answer was obtained, the question must be rephrased (paraphrased).
|
| 28 |
+
* However, if the interviewee still keeps refusing to answer related questions, the interviewer may move on to a different topic.
|
| 29 |
* Did the questions focus on drawing out verifiable information? (= Can the question reveal a contradiction, or can the answer be verified through an external search?)
|
| 30 |
|
| 31 |
(e.g., questions about "dates, addresses, affiliation IDs, organization names, emails, names of related people such as a supervisor at a former company")
|
|
|
|
| 36 |
### Cases of Poor Questioning
|
| 37 |
Conversely, cases of poor questioning are as follows.
|
| 38 |
* Moving straight on to a completely different topic before the current topic has been sufficiently detailed
|
| 39 |
+
* Repeating the same question verbatim without rephrasing (paraphrasing) it
|
| 40 |
* Asking abstract questions that make it hard to check for contradictions or verify facts
|
| 41 |
|
| 42 |
(e.g., "What is your hobby?", "What is the most important value in your life?")
|
|
|
|
| 44 |
|
| 45 |
(e.g., "I work at Google." → "What year was Google founded?")
|
| 46 |
* Exception) Questions closely tied to an event/incident/experience the interviewee directly took part in are allowed, e.g., "I am a founder of Google." → "What year was Google founded?" The evaluation must therefore take the preceding questions and answers into account.
|
| 47 |
+
* Asking questions that bring in external knowledge to check it (a separate process exists for verifying external knowledge, so the main questioning phase should contain only questions related to the interviewee)
|
| 48 |
* Questions are so loosely related to one another that mutual contradictions are hard to judge
|
| 49 |
* Moving on to an unrelated question even though a contradiction was found in the earlier dialogue
|
| 50 |
|
|
|
|
| 59 |
|
| 60 |
---
|
| 61 |
|
| 62 |
+
Here is the English translation of your labeling guideline, formatted for clarity and professional use.
|
| 63 |
+
|
| 64 |
+
---
|
| 65 |
+
|
| 66 |
# Labeling Guideline
|
| 67 |
|
| 68 |
Thank you for participating in this labeling project!
|
| 69 |
|
| 70 |
+
You will review interview transcripts of **two different AI interviewers (A and B)** interacting with the same interviewee. Your task is to evaluate the questioning capabilities of these AI interviewers.
|
| 71 |
|
| 72 |
There are two main tasks to complete:
|
| 73 |
|
| 74 |
+
* **Comparison:** Determine which of the two interviewers (A or B) asks better questions.
|
| 75 |
+
* **Rating:** Evaluate the quality of each interviewer on a 5-point scale.
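
For anyone collecting these labels programmatically, the two tasks above could be captured in a record like the one below. This is only a sketch under stated assumptions: the `Label` class, its field names, the numbering of the 5-point scale as 1 to 5, and the `is_consistent` check are all hypothetical and are not part of the guideline.

```python
from dataclasses import dataclass

VALID_CHOICES = {"A", "B"}
# Assumption: the 5-point scale is numbered 1 (worst) to 5 (best).
RATING_MIN, RATING_MAX = 1, 5


@dataclass(frozen=True)
class Label:
    """One annotator's judgment for a pair of transcripts (hypothetical schema)."""

    better_interviewer: str  # Comparison task: "A" or "B"
    rating_a: int            # Rating task for interviewer A
    rating_b: int            # Rating task for interviewer B

    def __post_init__(self) -> None:
        # Reject values outside the two tasks' allowed answers.
        if self.better_interviewer not in VALID_CHOICES:
            raise ValueError(
                f"comparison must be 'A' or 'B', got {self.better_interviewer!r}"
            )
        for name in ("rating_a", "rating_b"):
            value = getattr(self, name)
            if not isinstance(value, int) or not RATING_MIN <= value <= RATING_MAX:
                raise ValueError(
                    f"{name} must be an integer from {RATING_MIN} to {RATING_MAX}, "
                    f"got {value!r}"
                )


def is_consistent(label: Label) -> bool:
    """Return True when the comparison does not contradict the two ratings."""
    if label.rating_a == label.rating_b:
        return True  # equal ratings are compatible with either choice
    preferred_by_rating = "A" if label.rating_a > label.rating_b else "B"
    return preferred_by_rating == label.better_interviewer
```

A check like `is_consistent` is an optional sanity pass, for example to flag a label that prefers B while rating A higher. The guideline itself does not require the two tasks to agree.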
|
|
|
|
|
|
|
| 76 |
|
| 77 |
# Evaluation Criteria
|
| 78 |
|
| 79 |
+
The criteria for evaluating questioning ability are as follows. An interviewer who satisfies all five items below is considered to have performed the best.
|
| 80 |
|
| 81 |
### Criteria for Good Questions
|
| 82 |
|
| 83 |
+
* **Depth & Persistence:** Did the interviewer ask follow-up questions until the topic was sufficiently detailed?
|
| 84 |
+
* If the interviewer needs to ask again because they didn't get a clear answer, they should **paraphrase** the question.
|
| 85 |
+
* *Exception:* If the interviewee repeatedly refuses to answer despite paraphrasing, the interviewer may move to a different topic.
|
| 86 |
|
| 87 |
|
| 88 |
+
* **Verifiability:** Did the questions focus on extracting verifiable information? (i.e., information that can reveal contradictions or be verified through external search).
|
| 89 |
+
* Examples: Questions regarding dates, addresses, affiliation IDs, organization names, emails, or names of relevant parties like supervisors.
|
| 90 |
+
|
| 91 |
|
| 92 |
+
* **Personalization:** Are the questions tailored to the interviewee? (i.e., highly relevant to the interviewee's specific experiences and previous answers).
|
| 93 |
+
* **Cohesion:** Is there a high degree of interconnection between the questions?
|
| 94 |
+
* **Addressing Contradictions:** If a contradiction or point of doubt was found in previous dialogue, did the interviewer focus on questions related to that contradiction?
|
| 95 |
|
| 96 |
+
### Criteria for Poor Questions
|
| 97 |
|
| 98 |
+
Conversely, the following cases indicate poor questioning performance:
|
|
|
|
|
|
|
| 99 |
|
| 100 |
+
* **Premature Topic Shifts:** Moving to a completely different topic before the current subject has been sufficiently detailed.
|
| 101 |
+
* **Repetition without Paraphrasing:** Repeating the exact same question without changing the phrasing.
|
| 102 |
+
* **Abstract/Unverifiable Questions:** Asking abstract questions where it is difficult to judge contradictions or verify facts.
|
| 103 |
+
* Examples: "What are your hobbies?", "What is the most important value in your life?"
|
| 104 |
|
|
|
|
|
|
|
|
|
|
| 105 |
|
| 106 |
+
* **External Knowledge Over Personal Experience:** Asking questions that require external knowledge rather than the interviewee's own information/experience.
|
| 107 |
+
* Example: "I work at Google." → "What year was Google founded?"
|
| 108 |
+
* *Exception:* Questions closely related to events/experiences the interviewee directly participated in are allowed. (e.g., "I am the founder of Google." → "What year was Google founded?") You must evaluate this based on the context of the previous dialogue.
|
| 109 |
|
|
|
|
|
|
|
| 110 |
|
| 111 |
+
* **Fact-Checking External Knowledge:** Using the main questioning phase to verify external facts rather than focusing on the interviewee. (There is a separate process for external fact-checking).
|
| 112 |
+
* **Low Correlation:** Questions that lack relevance to each other, making it difficult to identify mutual contradictions.
|
| 113 |
+
* **Ignoring Inconsistencies:** Moving to an unrelated question even though a contradiction was detected in the previous conversation.
|
| 114 |
|
| 115 |
# Important Notes
|
| 116 |
|
| 117 |
+
* When evaluating the interviewer, **do not judge the interviewee's answers.** Focus solely on the pattern and quality of the interviewer's questions.
|
| 118 |
+
* Consider the **overall questioning strategy** as a whole, rather than just looking at individual questions in isolation.
|
| 119 |
|
| 120 |
+
# Reference
|
| 121 |
|
| 122 |
+
* You may use the Chrome translation feature to translate the content into Korean for your evaluation!
|
| 123 |
|
| 124 |