---
language: ko
license: mit
tags:
- pytorch
- bert
- kobert
- text-classification
- stance-detection
- korean
- news
- political
datasets:
- custom
metrics:
- accuracy
- f1
model-index:
- name: stance-classifier-v2
  results:
  - task:
      type: text-classification
      name: Stance Classification
    metrics:
    - type: accuracy
      value: 73.93
      name: Test Accuracy
    - type: f1
      value: 0.7395
      name: Test F1
---

# Korean Political News Stance Classifier v2

A KoBERT-based model that classifies the stance of Korean political news articles.

## Model Description

- **Base Model**: monologg/kobert
- **Task**: 3-class stance classification (support / neutral / oppose; see Labels below)
- **Language**: Korean
- **Training Data**: ~12,000 labeled political news articles

## Performance

| Metric | Score |
|--------|-------|
| Test Accuracy | 73.93% |
| Test F1 (macro) | 0.7395 |

## Labels

| Label ID | Korean | English | Description |
|----------|--------|---------|-------------|
| 0 | 옹호 | support | Favorable toward the government / ruling party |
| 1 | 중립 | neutral | Objective, factual reporting |
| 2 | 비판 | oppose | Critical of the government / ruling party |

## Usage

```python
import torch
import torch.nn as nn
from transformers import BertModel, AutoTokenizer
from huggingface_hub import hf_hub_download

# Model definition: KoBERT encoder with a dropout + linear classification head
class StanceClassifier(nn.Module):
    def __init__(self, bert_model, num_classes=3, dropout_rate=0.3):
        super().__init__()
        self.bert = bert_model
        self.dropout = nn.Dropout(dropout_rate)
        self.classifier = nn.Linear(768, num_classes)

    def forward(self, input_ids, attention_mask, token_type_ids=None):
        outputs = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask,
                            token_type_ids=token_type_ids)
        pooled_output = outputs.pooler_output  # pooled [CLS] representation
        pooled_output = self.dropout(pooled_output)
        return self.classifier(pooled_output)

# Load the fine-tuned weights from the Hub
model_path = hf_hub_download(repo_id="gaaahee/stance-classifier-v2",
                             filename="pytorch_model.pt")
checkpoint = torch.load(model_path, map_location='cpu')

bert_model = BertModel.from_pretrained('monologg/kobert')
model = StanceClassifier(bert_model)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Load the tokenizer (KoBERT uses a custom tokenizer, hence trust_remote_code)
tokenizer = AutoTokenizer.from_pretrained('monologg/kobert', trust_remote_code=True)

# Predict on a single article
text = "정부의 새 정책이 경제 성장에 크게 기여할 것으로 기대된다"
encoding = tokenizer(text, truncation=True, max_length=512,
                     padding='max_length', return_tensors='pt')

with torch.no_grad():
    logits = model(encoding['input_ids'], encoding['attention_mask'])
    probs = torch.softmax(logits, dim=1)
    pred = torch.argmax(probs, dim=1).item()

labels = ['옹호', '중립', '비판']
print(f"Prediction: {labels[pred]} ({probs[0][pred].item()*100:.1f}%)")
```

## Training Details

| Parameter | Value |
|-----------|-------|
| Base Model | monologg/kobert |
| Max Length | 512 |
| Batch Size | 64 |
| Learning Rate | 2e-5 |
| Dropout | 0.3 |
| Loss Function | Focal Loss (gamma=2.0) |
| Early Stopping | patience=3 |

## Citation

```bibtex
@misc{korean-stance-classifier-v2,
  title={Korean Political News Stance Classifier v2},
  year={2024},
  publisher={HuggingFace}
}
```
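## Batch Inference

The Usage example scores one article at a time. The same model extends naturally to batches; below is a minimal sketch, assuming `model` and `tokenizer` are loaded as in the Usage section above (the helper `predict_batch` and its `batch_size` default are illustrative, not part of the released code):

```python
import torch

def predict_batch(texts, model, tokenizer, batch_size=32):
    """Return (label, confidence) pairs for a list of Korean news texts."""
    labels = ['옹호', '중립', '비판']
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i + batch_size]
        # Pad to the longest text in the batch rather than to max_length
        enc = tokenizer(batch, truncation=True, max_length=512,
                        padding=True, return_tensors='pt')
        with torch.no_grad():
            logits = model(enc['input_ids'], enc['attention_mask'])
            probs = torch.softmax(logits, dim=1)
        preds = probs.argmax(dim=1)
        for j in range(len(batch)):
            k = preds[j].item()
            results.append((labels[k], probs[j, k].item()))
    return results
```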
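## Focal Loss (Reference)

Training Details lists Focal Loss with gamma=2.0: it down-weights examples the model already classifies confidently, so gradients concentrate on hard or ambiguous articles. The exact implementation used in training is not published; the sketch below is one standard multi-class formulation with the documented gamma:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """FL(p_t) = -(1 - p_t)^gamma * log(p_t), averaged over the batch."""
    def __init__(self, gamma=2.0):
        super().__init__()
        self.gamma = gamma

    def forward(self, logits, targets):
        # Per-example cross-entropy, i.e. -log(p_t)
        ce = F.cross_entropy(logits, targets, reduction='none')
        # Recover p_t, the predicted probability of the true class
        p_t = torch.exp(-ce)
        # (1 - p_t)^gamma -> 0 for easy examples, so they contribute little
        return ((1 - p_t) ** self.gamma * ce).mean()
```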