# Korean Political News Stance Classifier v2

A KoBERT-based stance (position) classification model for Korean political news.

## Model Description
- Base Model: monologg/kobert
- Task: 3-class stance classification (옹호/중립/비판)
- Language: Korean
- Training Data: ~12,000 labeled political news articles
## Performance

| Metric          | Score  |
|-----------------|--------|
| Test Accuracy   | 73.93% |
| Test F1 (macro) | 0.7395 |
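Macro F1 is the unweighted mean of the per-class F1 scores, so each of the three stance classes counts equally regardless of how often it appears. A minimal sketch of how both metrics are computed with scikit-learn (the label arrays below are placeholders, not the actual test set):

```python
from sklearn.metrics import accuracy_score, f1_score

# Placeholder labels for illustration only -- not the model's actual test set
y_true = [0, 1, 2, 1, 0, 2]
y_pred = [0, 1, 2, 2, 0, 2]

print(f"Accuracy: {accuracy_score(y_true, y_pred):.4f}")
print(f"Macro F1: {f1_score(y_true, y_pred, average='macro'):.4f}")
```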
## Labels

| Label ID | Korean | English | Description |
|----------|--------|---------|-------------|
| 0 | 옹호 | support | Favorable toward the government/ruling party |
| 1 | 중립 | neutral | Objective, factual reporting |
| 2 | 비판 | oppose | Critical of the government/ruling party |
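If you index predictions by ID elsewhere in your code, the table above maps directly to a pair of dictionaries. The constant names here are illustrative, not part of the released checkpoint:

```python
# Label mapping taken from the table above; names are illustrative
ID2LABEL = {0: "옹호", 1: "중립", 2: "비판"}  # support / neutral / oppose
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}
```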
## Usage

```python
import torch
import torch.nn as nn
from transformers import BertModel, AutoTokenizer
from huggingface_hub import hf_hub_download

class StanceClassifier(nn.Module):
    """KoBERT encoder with a dropout + linear classification head."""

    def __init__(self, bert_model, num_classes=3, dropout_rate=0.3):
        super().__init__()
        self.bert = bert_model
        self.dropout = nn.Dropout(dropout_rate)
        self.classifier = nn.Linear(768, num_classes)

    def forward(self, input_ids, attention_mask, token_type_ids=None):
        outputs = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask,
                            token_type_ids=token_type_ids)
        pooled_output = self.dropout(outputs.pooler_output)  # pooled [CLS] representation
        return self.classifier(pooled_output)

# Download the checkpoint and rebuild the model around the KoBERT backbone
model_path = hf_hub_download(repo_id="gaaahee/stance-classifier-v2",
                             filename="pytorch_model.pt")
checkpoint = torch.load(model_path, map_location="cpu")

bert_model = BertModel.from_pretrained("monologg/kobert")
model = StanceClassifier(bert_model)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

# KoBERT ships a custom tokenizer, so trust_remote_code is required
tokenizer = AutoTokenizer.from_pretrained("monologg/kobert", trust_remote_code=True)

text = "정부의 새 정책이 경제 성장에 크게 기여할 것으로 기대된다"
encoding = tokenizer(text, truncation=True, max_length=512,
                     padding="max_length", return_tensors="pt")

with torch.no_grad():
    logits = model(encoding["input_ids"], encoding["attention_mask"])
    probs = torch.softmax(logits, dim=1)
    pred = torch.argmax(probs, dim=1).item()

labels = ["옹호", "중립", "비판"]  # support / neutral / oppose
print(f"Prediction: {labels[pred]} ({probs[0][pred].item() * 100:.1f}%)")
```
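To score many articles at once, the snippet above can be wrapped in a batched helper. This is a convenience sketch rather than part of the released code; `predict_batch` and its defaults are illustrative:

```python
def predict_batch(texts, model, tokenizer, batch_size=32, device="cpu"):
    """Classify a list of Korean texts; returns (label, confidence) pairs. Illustrative helper."""
    labels = ["옹호", "중립", "비판"]
    model.to(device).eval()
    results = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        enc = tokenizer(batch, truncation=True, max_length=512,
                        padding=True, return_tensors="pt").to(device)
        with torch.no_grad():
            logits = model(enc["input_ids"], enc["attention_mask"])
        probs = torch.softmax(logits, dim=1)
        for p in probs:
            pred = int(torch.argmax(p))
            results.append((labels[pred], p[pred].item()))
    return results
```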
## Training Details

| Parameter      | Value                  |
|----------------|------------------------|
| Base Model     | monologg/kobert        |
| Max Length     | 512                    |
| Batch Size     | 64                     |
| Learning Rate  | 2e-5                   |
| Dropout        | 0.3                    |
| Loss Function  | Focal Loss (gamma=2.0) |
| Early Stopping | patience=3             |
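The training code is not included in this card, so the loss below is the standard multi-class focal loss formulation, FL(p_t) = -(1 - p_t)^gamma * log(p_t), with gamma = 2.0 as listed in the table; treat it as a sketch of the idea rather than the exact implementation used:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """Standard multi-class focal loss; down-weights already-well-classified examples.
    Sketch based on the hyperparameters above, not the exact training code."""

    def __init__(self, gamma=2.0):
        super().__init__()
        self.gamma = gamma

    def forward(self, logits, targets):
        ce = F.cross_entropy(logits, targets, reduction="none")  # per-example CE
        p_t = torch.exp(-ce)  # model's probability for the true class
        return ((1.0 - p_t) ** self.gamma * ce).mean()
```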
## Citation

```bibtex
@misc{korean-stance-classifier-v2,
  title={Korean Political News Stance Classifier v2},
  year={2024},
  publisher={HuggingFace}
}
```