tnwjddla2190 committed on
Commit 259655e · verified · 1 Parent(s): 88bafe2

Update README.md

Files changed (1)
  1. README.md +38 -28
README.md CHANGED
@@ -23,7 +23,9 @@ sdk: static
23
  The criteria for evaluating questioning ability are as follows. An interviewer who satisfies all five items below is the one who asked the best questions.
24
 
25
  ### Criteria for Good Questions
26
- * Did the interviewer keep asking about one topic until the answers became sufficiently specific?
 
 
27
  * Did the interviewer focus on questions that extract verifiable information? (= Can the question reveal a contradiction, or can its answer be verified through external search?)
28
 
29
  (e.g., questions about "dates, addresses, affiliation IDs, organization names, emails, names of related people such as a supervisor at the interviewee's company")
@@ -34,6 +36,7 @@ sdk: static
34
  ### Cases of Poor Questioning
35
  Conversely, the following are cases of poor questioning.
36
  * Moving straight to a completely different topic before the current topic has been made sufficiently specific
 
37
  * Asking abstract questions that make it hard to check for contradictions or verify facts
38
 
39
  (e.g., "What are your hobbies?", "What is the most important value in your life?")
@@ -41,6 +44,7 @@ sdk: static
41
 
42
  (e.g., "I work at Google." → "What year was Google founded?")
43
  * Exception) Questions closely tied to an event/incident/experience the interviewee directly took part in are allowed, as in "I am the founder of Google." → "What year was Google founded?". Evaluate with the preceding questions and answers in mind.
 
44
  * The questions are so weakly related to one another that mutual contradictions are hard to judge
45
  * Moving on to an unrelated question even though a contradiction was found in the earlier dialogue
46
 
@@ -55,60 +59,66 @@ sdk: static
55
 
56
  ---
57
 
 
 
 
 
58
  # Labeling Guideline
59
 
60
  Thank you for participating in this labeling project!
61
 
62
- You will be reviewing interview transcripts of **two different AI interviewers (A and B)** conducting sessions with the same interviewee. Your task is to evaluate the questioning capabilities of these AI interviewers.
63
 
64
  There are two main tasks to complete:
65
 
66
- * **Comparison:** Determine which of the two interviewers (A or B) demonstrates superior questioning skills.
67
- * **Rating:** Rate the quality of each interviewer on a 5-point scale.
68
-
69
- ---
70
 
71
  # Evaluation Criteria
72
 
73
- The quality of an interviewer is judged by the following criteria. An ideal interviewer satisfies all five of the points listed below.
74
 
75
  ### Criteria for Good Questions
76
 
77
- * **Depth:** Does the interviewer continue questioning a single topic until the responses are sufficiently detailed and specific?
78
- * **Verifiability:** Do the questions focus on extracting verifiable information? (i.e., information that can reveal contradictions or be verified via external search).
79
- * *e.g., Questions regarding dates, addresses, affiliation IDs, organization names, emails, or names of supervisors/colleagues.*
80
 
81
 
82
- * **Personalization:** Are the questions tailored to the interviewee? (i.e., highly relevant to the interviewee’s specific experiences and previous answers).
83
- * **Cohesion:** Is there a high degree of logical interconnection between the questions?
84
- * **Critical Follow-up:** If contradictions or questionable points arose in previous dialogue, did the interviewer ask follow-up questions specifically addressing those inconsistencies?
85
 
86
- ### Indicators of Poor Questioning
 
 
87
 
88
- An interviewer is considered less effective if they exhibit the following:
89
 
90
- * **Abrupt Topic Switching:** Moving to a completely different topic before the current subject has been sufficiently explored.
91
- * **Abstract Questions:** Asking questions that make it difficult to verify facts or detect contradictions.
92
- * *e.g., "What are your hobbies?", "What is the most important value in your life?"*
93
 
 
 
 
 
94
 
95
- * **External Knowledge Dependency:** Asking questions that rely on general external knowledge rather than the interviewee's personal information or experiences.
96
- * *e.g., "I work at Google." → "In what year was Google founded?"*
97
- * **Exception:** If the question relates to an event the interviewee was directly involved in, it is acceptable. (e.g., "I am the founder of Google." → "In what year was Google founded?") Please evaluate by considering the context of the previous dialogue.
98
 
 
 
 
99
 
100
- * **Low Relevancy:** Questions lack connection to one another, making it difficult to judge internal consistency or logic.
101
- * **Ignoring Contradictions:** Moving on to unrelated questions even after a contradiction was clearly detected in the previous dialogue.
102
 
103
- ---
 
 
104
 
105
  # Important Notes
106
 
107
- * **Evaluate the Interviewer Only:** When evaluating, do not judge the interviewee’s answers. Focus solely on the pattern and quality of the **interviewer’s questions**.
108
- * **Strategy over Items:** Consider the overall questioning strategy and flow, not just individual questions in isolation.
109
 
110
- # Note
111
 
112
- * You may use the Chrome translation feature to translate the text into Korean while performing your evaluation!
113
 
114
 
23
  The criteria for evaluating questioning ability are as follows. An interviewer who satisfies all five items below is the one who asked the best questions.
24
 
25
  ### Criteria for Good Questions
26
+ * Did the interviewer keep asking about one topic until the answers became sufficiently specific?
27
+ * If the interviewer asks again because the question went unanswered, the question must be rephrased (paraphrased).
28
+ * However, if the interviewee still keeps refusing to answer the related questions, the interviewer may move on to another topic.
29
  * Did the interviewer focus on questions that extract verifiable information? (= Can the question reveal a contradiction, or can its answer be verified through external search?)
30
 
31
  (e.g., questions about "dates, addresses, affiliation IDs, organization names, emails, names of related people such as a supervisor at the interviewee's company")
 
36
  ### Cases of Poor Questioning
37
  Conversely, the following are cases of poor questioning.
38
  * Moving straight to a completely different topic before the current topic has been made sufficiently specific
39
+ * ๋™์ผํ•œ ์งˆ๋ฌธ์„ ๋‹ค๋ฅธ ํ‘œํ˜„์œผ๋กœ ๋ฐ”๊พธ์ง€ ์•Š๊ณ (paraphrase ํ•˜์ง€ ์•Š๊ณ ) ๊ทธ๋Œ€๋กœ ๋ฐ˜๋ณตํ•  ๊ฒฝ์šฐ
40
  * Asking abstract questions that make it hard to check for contradictions or verify facts
41
 
42
  (e.g., "What are your hobbies?", "What is the most important value in your life?")
 
44
 
45
  (e.g., "I work at Google." → "What year was Google founded?")
46
  * Exception) Questions closely tied to an event/incident/experience the interviewee directly took part in are allowed, as in "I am the founder of Google." → "What year was Google founded?". Evaluate with the preceding questions and answers in mind.
47
+ * Asking questions that bring in external knowledge to check it (a separate process exists for checking external knowledge, so the main questioning phase should contain only questions related to the interviewee)
48
  * The questions are so weakly related to one another that mutual contradictions are hard to judge
49
  * Moving on to an unrelated question even though a contradiction was found in the earlier dialogue
50
 
 
59
 
60
  ---
61
 
62
+ Here is the English translation of your labeling guideline, formatted for clarity and professional use.
63
+
64
+ ---
65
+
66
  # Labeling Guideline
67
 
68
  Thank you for participating in this labeling project!
69
 
70
+ You will review interview transcripts of **two different AI interviewers (A and B)** interacting with the same interviewee. Your task is to evaluate the questioning capabilities of these AI interviewers.
71
 
72
  There are two main tasks to complete:
73
 
74
+ * **Comparison:** Determine which of the two interviewers (A or B) asks better questions.
75
+ * **Rating:** Evaluate the quality of each interviewer on a 5-point scale.
 
 
76
 
77
  # Evaluation Criteria
78
 
79
+ The criteria for evaluating questioning ability are as follows. An interviewer who satisfies all five items below is considered to have performed the best.
80
 
81
  ### Criteria for Good Questions
82
 
83
+ * **Depth & Persistence:** Did the interviewer ask follow-up questions until the topic was sufficiently detailed?
84
+ * If the interviewer needs to ask again because they didn’t get a clear answer, they should **paraphrase** the question.
85
+ * *Exception:* If the interviewee repeatedly refuses to answer despite paraphrasing, the interviewer may move to a different topic.
86
 
87
 
88
+ * **Verifiability:** Did the questions focus on extracting verifiable information? (i.e., information that can reveal contradictions or be verified through external search).
89
+ * Examples: Questions regarding dates, addresses, affiliation IDs, organization names, emails, or names of relevant parties like supervisors.
90
+
91
 
92
+ * **Personalization:** Are the questions tailored to the interviewee? (i.e., highly relevant to the interviewee’s specific experiences and previous answers).
93
+ * **Cohesion:** Is there a high degree of interconnection between the questions?
94
+ * **Addressing Contradictions:** If a contradiction or point of doubt was found in previous dialogue, did the interviewer focus on questions related to that contradiction?
95
 
96
+ ### Criteria for Poor Questions
97
 
98
+ Conversely, the following cases indicate poor questioning performance:
 
 
99
 
100
+ * **Premature Topic Shifts:** Moving to a completely different topic before the current subject has been sufficiently detailed.
101
+ * **Repetition without Paraphrasing:** Repeating the exact same question without changing the phrasing.
102
+ * **Abstract/Unverifiable Questions:** Asking abstract questions where it is difficult to judge contradictions or verify facts.
103
+ * Examples: "What are your hobbies?", "What is the most important value in your life?"
104
 
 
 
 
105
 
106
+ * **External Knowledge Over Personal Experience:** Asking questions that require external knowledge rather than the interviewee’s own information/experience.
107
+ * Example: "I work at Google." → "What year was Google founded?"
108
+ * *Exception:* Questions closely related to events/experiences the interviewee directly participated in are allowed. (e.g., "I am the founder of Google." → "What year was Google founded?") You must evaluate this based on the context of the previous dialogue.
109
 
 
 
110
 
111
+ * **Fact-Checking External Knowledge:** Using the main questioning phase to verify external facts rather than focusing on the interviewee. (There is a separate process for external fact-checking).
112
+ * **Low Correlation:** Questions that lack relevance to each other, making it difficult to identify mutual contradictions.
113
+ * **Ignoring Inconsistencies:** Moving to an unrelated question even though a contradiction was detected in the previous conversation.
114
 
115
  # Important Notes
116
 
117
+ * When evaluating the interviewer, **do not judge the interviewee's answers.** Focus solely on the pattern and quality of the interviewer's questions.
118
+ * Consider the **overall questioning strategy** as a whole, rather than just looking at individual questions in isolation.
119
 
120
+ # Reference
121
 
122
+ * You may use the Chrome translation feature to translate the content into Korean for your evaluation!
123
 
124