---
language: ko
license: mit
tags:
- pytorch
- bert
- kobert
- text-classification
- stance-detection
- korean
- news
- political
datasets:
- custom
metrics:
- accuracy
- f1
model-index:
- name: stance-classifier-v2
  results:
  - task:
      type: text-classification
      name: Stance Classification
    metrics:
    - type: accuracy
      value: 73.93
      name: Test Accuracy
    - type: f1
      value: 0.7395
      name: Test F1
---

# Korean Political News Stance Classifier v2

A KoBERT-based stance (position) classification model for Korean political news.

## Model Description

- **Base Model**: monologg/kobert
- **Task**: 3-class stance classification (옹호 / 중립 / 비판: support / neutral / oppose)
- **Language**: Korean
- **Training Data**: ~12,000 labeled political news articles

## Performance

| Metric | Score |
|--------|-------|
| Test Accuracy | 73.93% |
| Test F1 (macro) | 0.7395 |

## Labels

| Label ID | Korean | English | Description |
|----------|--------|---------|-------------|
| 0 | 옹호 | support | Favorable toward the government/ruling party |
| 1 | 중립 | neutral | Objective, factual reporting |
| 2 | 비판 | oppose | Critical of the government/ruling party |
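
For programmatic use, the same mapping can be kept as a pair of dictionaries (a minimal sketch; these variable names are illustrative and not part of the released checkpoint):

```python
# Hypothetical helper mapping mirroring the table above.
id2label = {0: "옹호", 1: "중립", 2: "비판"}  # support, neutral, oppose
label2id = {v: k for k, v in id2label.items()}
```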

## Usage

```python
import torch
import torch.nn as nn
from transformers import BertModel, AutoTokenizer
from huggingface_hub import hf_hub_download

# Model definition
class StanceClassifier(nn.Module):
    def __init__(self, bert_model, num_classes=3, dropout_rate=0.3):
        super().__init__()
        self.bert = bert_model
        self.dropout = nn.Dropout(dropout_rate)
        self.classifier = nn.Linear(768, num_classes)

    def forward(self, input_ids, attention_mask, token_type_ids=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
        pooled_output = outputs.pooler_output
        pooled_output = self.dropout(pooled_output)
        return self.classifier(pooled_output)

# Load the fine-tuned weights
model_path = hf_hub_download(repo_id="gaaahee/stance-classifier-v2", filename="pytorch_model.pt")
checkpoint = torch.load(model_path, map_location='cpu')

bert_model = BertModel.from_pretrained('monologg/kobert')
model = StanceClassifier(bert_model)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('monologg/kobert', trust_remote_code=True)

# Predict
# "The government's new policy is expected to contribute greatly to economic growth."
text = "정부의 새 정책이 경제 성장에 크게 기여할 것으로 기대된다"
encoding = tokenizer(text, truncation=True, max_length=512, padding='max_length', return_tensors='pt')

with torch.no_grad():
    logits = model(encoding['input_ids'], encoding['attention_mask'])
    probs = torch.softmax(logits, dim=1)
    pred = torch.argmax(probs, dim=1).item()

labels = ['옹호', '중립', '비판']  # support, neutral, oppose
print(f"Prediction: {labels[pred]} ({probs[0][pred].item()*100:.1f}%)")
```
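
To score many articles at once, the model and tokenizer loaded above can be wrapped in a small batched helper. This is a sketch continuing from the snippet above; `predict_batch` is an illustrative name, not something shipped with the repository:

```python
def predict_batch(texts, batch_size=32):
    """Classify a list of Korean news texts; returns label indices (illustrative helper)."""
    preds = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        # Dynamic padding per batch is cheaper than padding everything to 512.
        enc = tokenizer(batch, truncation=True, max_length=512,
                        padding=True, return_tensors='pt')
        with torch.no_grad():
            logits = model(enc['input_ids'], enc['attention_mask'])
        preds.extend(logits.argmax(dim=1).tolist())
    return preds

# Example: predict_batch(["기사 본문 1", "기사 본문 2"]) -> e.g. [1, 2]
```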

## Training Details

| Parameter | Value |
|-----------|-------|
| Base Model | monologg/kobert |
| Max Length | 512 |
| Batch Size | 64 |
| Learning Rate | 2e-05 |
| Dropout | 0.3 |
| Loss Function | Focal Loss (gamma=2.0) |
| Early Stopping | patience=3 |
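
Focal loss down-weights already well-classified examples so training focuses on hard ones: FL(p_t) = -(1 - p_t)^gamma * log(p_t), reducing to cross-entropy at gamma=0. A minimal sketch of the gamma=2.0 variant named above (the class is illustrative; the training code itself is not included in this repository):

```python
import torch
import torch.nn.functional as F

class FocalLoss(torch.nn.Module):
    """FL(p_t) = -(1 - p_t)^gamma * log(p_t); easy examples contribute less."""
    def __init__(self, gamma=2.0):
        super().__init__()
        self.gamma = gamma

    def forward(self, logits, targets):
        # Log-probability of the true class for each example.
        log_pt = F.log_softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
        pt = log_pt.exp()
        return ((1 - pt) ** self.gamma * -log_pt).mean()
```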

## Citation

```bibtex
@misc{korean-stance-classifier-v2,
  title={Korean Political News Stance Classifier v2},
  year={2024},
  publisher={HuggingFace}
}
```