---
language:
- en
pipeline_tag: text-classification
library_name: peft
base_model: microsoft/deberta-v3-large
datasets:
- stealthcode/ai-detection
tags:
- lora
- ai-detection
- binary-classification
- deberta-v3-large
metrics:
- accuracy
- f1
- auroc
- average_precision
model-index:
- name: AI Detector LoRA (DeBERTa-v3-large)
  results:
  - task:
      type: text-classification
      name: AI Text Detection
    dataset:
      name: stealthcode/ai-detection
      type: stealthcode/ai-detection
    metrics:
    - type: auroc
      value: 0.9985
    - type: f1
      value: 0.9812
    - type: accuracy
      value: 0.9814
---

# AI Detector LoRA (DeBERTa-v3-large)

LoRA adapter for binary AI-text vs Human-text detection, trained on ~2.7M English samples (`label: 1 = AI, 0 = Human`) using `microsoft/deberta-v3-large` as the base model.

- **Base model:** `microsoft/deberta-v3-large`
- **Task:** Binary classification (AI vs Human)
- **Head:** Single-logit + `BCEWithLogitsLoss`
- **Adapter type:** LoRA (`peft`)
- **Hardware:** 8 x RTX 5090, bf16, multi-GPU
- **Final decision threshold:** **0.8697** (max-F1 on calibration set)

---

## Files in this repo

- `adapter/` – LoRA weights saved with `peft_model.save_pretrained(...)`
- `merged_model/` – fully merged model (base + LoRA) for standalone use
- `threshold.json` – chosen deployment threshold and validation F1
- `calibration.json` – temperature scaling parameters and calibration metrics
- `results.json` – hyperparameters, validation threshold search, test metrics
- `training_log_history.csv` – raw Trainer log history
- `predictions_calib.csv` – calibration-set probabilities and labels
- `predictions_test.csv` – test probabilities and labels
- `figures/` – training and evaluation plots
- `README.md` – this file

---

## Metrics (test set, n=279,241)

Using threshold **0.8697**:

| Metric                 | Value  |
| ---------------------- | ------ |
| AUROC                  | 0.9985 |
| Average Precision (AP) | 0.9985 |
| F1                     | 0.9812 |
| Accuracy               | 0.9814 |
| Precision (AI)         | 0.9902 |
| Recall (AI)            | 0.9724 |
| Precision (Human)      | 0.9728 |
| Recall (Human)         | 0.9904 |

Confusion matrix (test):

- **True Negatives (Human correctly)**: 138,276
- **False Positives (Human → AI)**: 1,345
- **False Negatives (AI → Human)**: 3,859
- **True Positives (AI correctly)**: 135,761

### Calibration

- **Method:** temperature scaling
- **Temperature (T):** 1.4437
- **Calibration set:** calibration
- Test ECE: 0.0075 → 0.0116 (before → after calibration)
- Test Brier: 0.0157 → 0.0156 (before → after calibration)

---

## Plots

### Training & validation

- Learning curves: ![Learning curves](./figures/fig_learning_curves.png)
- Eval metrics over time: ![Eval metrics](./figures/fig_eval_metrics.png)

### Validation set

- ROC: ![ROC (calib)](./figures/fig_roc_calib.png)
- Precision–Recall: ![PR (calib)](./figures/fig_pr_calib.png)
- Calibration curve: ![Calibration (calib)](./figures/fig_calibration_calib.png)
- F1 vs threshold: ![F1 vs threshold (calib)](./figures/fig_threshold_f1_calib.png)

### Test set

- ROC: ![ROC (test)](./figures/fig_roc_test.png)
- Precision–Recall: ![PR (test)](./figures/fig_pr_test.png)
- Calibration curve: ![Calibration (test)](./figures/fig_calibration_test.png)
- Confusion matrix: ![Confusion matrix (test)](./figures/fig_confusion_test.png)

---

## Usage

### Load base + LoRA adapter

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from peft import PeftModel
import torch
import json

base_model_id = "microsoft/deberta-v3-large"
adapter_id = "stealthcode/ai-detection"  # or local: "./adapter"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForSequenceClassification.from_pretrained(
    base_model_id,
    num_labels=1,  # single logit for BCEWithLogitsLoss
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```

### Inference with threshold

```python
# load threshold
with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.8697

def predict_proba(texts):
    enc = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=512,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits)
    return probs.cpu().numpy()

def predict_label(texts, threshold=thr):
    probs = predict_proba(texts)
    return (probs >= threshold).astype(int)

# example
texts = ["Some example text to classify"]
probs = predict_proba(texts)
labels = predict_label(texts)
print(probs, labels)  # label 1 = AI, 0 = Human
```

### Load merged model (no PEFT required)

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import json
import torch

model_dir = "./merged_model"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)
model.eval()

with open("threshold.json") as f:
    thr = json.load(f)["threshold"]  # 0.8697

def predict_proba(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits)
    return probs.cpu().numpy()
```

### Optional: apply temperature scaling to logits

```python
import json
import torch

# Reuses `tokenizer` and `model` from one of the loading snippets above.
with open("calibration.json") as f:
    T = json.load(f)["temperature"]  # e.g., 1.4437

def predict_proba_calibrated(texts):
    enc = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(-1)
        probs = torch.sigmoid(logits / T)
    return probs.cpu().numpy()
```

---

## Notes

- Classifier head is **trainable** together with LoRA layers (unfrozen after applying PEFT).
- **LoRA config:**
  - `r=32`, `alpha=128`, `dropout=0.0`
  - Target modules: `query_proj`, `key_proj`, `value_proj`
- **Training config:**
  - `bf16=True`
  - `optim="adamw_torch_fused"`
  - `lr_scheduler_type="cosine_with_restarts"`
  - `num_train_epochs=2`
  - `per_device_train_batch_size=8`, `gradient_accumulation_steps=4`
  - `max_grad_norm=0.5`
- Threshold `0.8697` was chosen as the **max-F1** point on the calibration set. You can adjust it if you prefer fewer false positives or fewer false negatives.
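
One possible way to re-tune the threshold is sketched below: it picks the lowest threshold whose false-positive rate on human texts stays under a chosen cap. It assumes you have calibration-set probabilities and labels as arrays, for example loaded from `predictions_calib.csv`; the column names `prob` and `label` in the comment are assumptions, since the CSV layout is not documented here. The helper names `threshold_for_max_fpr` and `error_rates` are illustrative and not part of this repo.

```python
import numpy as np

def threshold_for_max_fpr(probs, labels, max_fpr):
    """Lowest threshold t such that, with the rule `prob >= t -> AI`,
    the false-positive rate on human samples (label 0) is <= max_fpr."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    human_sorted = np.sort(probs[labels == 0])
    if human_sorted.size == 0:
        raise ValueError("need at least one human (label 0) sample")

    candidates = np.unique(probs)  # sorted ascending
    # Number of human samples with prob >= candidate threshold.
    fp = human_sorted.size - np.searchsorted(human_sorted, candidates, side="left")
    ok = (fp / human_sorted.size) <= max_fpr
    if not ok.any():
        # Even the highest observed score flags too many humans:
        # return a threshold just above it, so nothing is flagged.
        return float(np.nextafter(candidates[-1], np.inf))
    # FPR is non-increasing in the threshold, so the first hit is the lowest.
    return float(candidates[np.argmax(ok)])

def error_rates(probs, labels, threshold):
    """Return (false-positive rate, false-negative rate) at `threshold`."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pred_ai = probs >= threshold
    fpr = float(np.mean(pred_ai[labels == 0]))   # humans flagged as AI
    fnr = float(np.mean(~pred_ai[labels == 1]))  # AI texts missed
    return fpr, fnr

# Toy data for illustration only; in practice you might load the real
# calibration predictions, e.g. (column names are assumptions):
#   df = pandas.read_csv("predictions_calib.csv")
#   probs, labels = df["prob"].to_numpy(), df["label"].to_numpy()
probs = [0.05, 0.20, 0.60, 0.90, 0.40, 0.70, 0.95, 0.99]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

thr_strict = threshold_for_max_fpr(probs, labels, max_fpr=0.25)
print(thr_strict, error_rates(probs, labels, thr_strict))
```

Raising the threshold lowers false positives at the cost of more false negatives, and lowering it does the reverse. Whichever value you choose on the calibration set, re-check it on held-out data before deploying, and tune on the same kind of probabilities (raw or temperature-scaled) that you will threshold at inference time.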