---
language: ko
license: mit
tags:
- pytorch
- bert
- kobert
- text-classification
- stance-detection
- korean
- news
- political
datasets:
- custom
metrics:
- accuracy
- f1
model-index:
- name: stance-classifier-v2
  results:
  - task:
      type: text-classification
      name: Stance Classification
    metrics:
    - type: accuracy
      value: 73.93
      name: Test Accuracy
    - type: f1
      value: 0.7395
      name: Test F1
---
# Korean Political News Stance Classifier v2
A KoBERT-based stance classification model for Korean political news.
## Model Description
- **Base Model**: monologg/kobert
- **Task**: 3-class stance classification (support / neutral / oppose)
- **Language**: Korean
- **Training Data**: ~12,000 labeled political news articles
## Performance
| Metric | Score |
|--------|-------|
| Test Accuracy | 73.93% |
| Test F1 (macro) | 0.7395 |
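The F1 score is macro-averaged: the per-class F1 scores for support, neutral, and oppose are averaged with equal weight, so minority classes count as much as the majority one. For reference, a minimal sketch of how these two metrics are typically computed (scikit-learn assumed here; the evaluation script is not part of this repo):

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy example over label IDs 0 (support), 1 (neutral), 2 (oppose)
y_true = [0, 1, 2, 1, 0, 2]
y_pred = [0, 1, 2, 1, 2, 2]
print(accuracy_score(y_true, y_pred))             # fraction of exact matches
print(f1_score(y_true, y_pred, average='macro'))  # unweighted mean of per-class F1
```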
## Labels
| Label ID | Korean | English | Description |
|----------|--------|---------|-------------|
| 0 | 옹호 | support | Favorable toward the government / ruling party |
| 1 | 중립 | neutral | Objective, factual reporting |
| 2 | 비판 | oppose | Critical of the government / ruling party |
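When wiring the model into a downstream pipeline, the table above maps onto the usual `id2label`/`label2id` convention. The dicts below are an illustrative restatement of the table, not metadata shipped with the checkpoint:

```python
id2label = {0: 'support', 1: 'neutral', 2: 'oppose'}   # 옹호 / 중립 / 비판
label2id = {name: i for i, name in id2label.items()}
```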
## Usage
```python
import torch
import torch.nn as nn
from transformers import BertModel, AutoTokenizer
from huggingface_hub import hf_hub_download

# Model definition (the checkpoint stores weights for this classification head)
class StanceClassifier(nn.Module):
    def __init__(self, bert_model, num_classes=3, dropout_rate=0.3):
        super().__init__()
        self.bert = bert_model
        self.dropout = nn.Dropout(dropout_rate)
        self.classifier = nn.Linear(768, num_classes)  # 768 = KoBERT hidden size

    def forward(self, input_ids, attention_mask, token_type_ids=None):
        outputs = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask,
                            token_type_ids=token_type_ids)
        pooled_output = self.dropout(outputs.pooler_output)
        return self.classifier(pooled_output)

# Load the fine-tuned weights
model_path = hf_hub_download(repo_id="gaaahee/stance-classifier-v2", filename="pytorch_model.pt")
checkpoint = torch.load(model_path, map_location='cpu')
bert_model = BertModel.from_pretrained('monologg/kobert')
model = StanceClassifier(bert_model)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('monologg/kobert', trust_remote_code=True)

# Predict
text = "정부의 새 정책이 경제 성장에 크게 기여할 것으로 기대된다"  # "The government's new policy is expected to contribute greatly to economic growth."
encoding = tokenizer(text, truncation=True, max_length=512, padding='max_length', return_tensors='pt')
with torch.no_grad():
    logits = model(encoding['input_ids'], encoding['attention_mask'])
    probs = torch.softmax(logits, dim=1)
    pred = torch.argmax(probs, dim=1).item()

labels = ['옹호', '중립', '비판']  # support / neutral / oppose
print(f"Prediction: {labels[pred]} ({probs[0][pred].item() * 100:.1f}%)")
```
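For scoring many articles at once, batching the tokenizer call is noticeably faster than looping one text at a time. A minimal sketch building on the `model`, `tokenizer`, and `labels` defined above (the helper name, batch size, and device handling are illustrative choices, not part of the released code):

```python
def predict_batch(texts, batch_size=16, device='cpu'):
    """Classify a list of texts; returns (label, confidence) pairs."""
    model.to(device)
    results = []
    for i in range(0, len(texts), batch_size):
        # padding=True pads only to the longest text in each batch, not always to 512
        enc = tokenizer(texts[i:i + batch_size], truncation=True, max_length=512,
                        padding=True, return_tensors='pt').to(device)
        with torch.no_grad():
            probs = torch.softmax(model(enc['input_ids'], enc['attention_mask']), dim=1)
        for p in probs:
            idx = int(torch.argmax(p))
            results.append((labels[idx], p[idx].item()))
    return results
```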
## Training Details
| Parameter | Value |
|-----------|-------|
| Base Model | monologg/kobert |
| Max Length | 512 |
| Batch Size | 64 |
| Learning Rate | 2e-05 |
| Dropout | 0.3 |
| Loss Function | Focal Loss (gamma=2.0; see the sketch below) |
| Early Stopping | patience=3 |
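Focal loss down-weights examples the model already classifies confidently, concentrating the gradient on hard or ambiguous articles, which helps when the three stance classes are imbalanced. A minimal sketch of the standard multi-class formulation with gamma=2.0 (an illustrative reimplementation, not the repo's training code):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """FL = -(1 - p_t)^gamma * log(p_t), averaged over the batch."""
    log_probs = F.log_softmax(logits, dim=1)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p of the true class
    pt = log_pt.exp()
    return (-(1 - pt) ** gamma * log_pt).mean()

# A confident correct prediction (first row) contributes almost nothing to the loss
logits = torch.tensor([[4.0, 0.0, 0.0], [0.1, 0.2, 0.3]])
targets = torch.tensor([0, 2])
print(focal_loss(logits, targets))
```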
## Citation
```bibtex
@misc{korean-stance-classifier-v2,
  title={Korean Political News Stance Classifier v2},
  year={2024},
  publisher={HuggingFace}
}
```