# OctoMed-7B Digital Twin v1
A medical reasoning AI fine-tuned with GRPO (Group Relative Policy Optimization) for transparent clinical decision support. This model extends OctoMed's multimodal medical capabilities with enhanced reasoning chains.
## Model Description

OctoMed-7B Digital Twin v1 is a 7-billion-parameter medical language model fine-tuned with reinforcement learning via GRPO, using programmatic reward signals rather than human feedback. Built on top of OctoMed-7B, a state-of-the-art multimodal medical model, this variant specializes in:
- Transparent Medical Reasoning: Uses `<think>...</think>` tags to show step-by-step clinical reasoning
- Evidence-Based Responses: Trained to provide accurate, semantically grounded medical information
- Clinical Decision Support: Assists both patients and healthcare professionals with medical queries
- Multimodal Capabilities: Inherits OctoMed's vision-language understanding (image analysis requires the base model)
## Key Features
- 🧠 Structured Reasoning: Explicit reasoning chains for medical transparency
- 🎯 GRPO Training: Adaptive reward balancing for format (40%) and semantic accuracy (60%)
- 💾 Parameter Efficient: LoRA adapters with rank 32 (~0.5% trainable parameters)
- ⚡ 4-bit Quantization: Optimized for deployment on consumer hardware
- 🏥 Medical Specialization: Fine-tuned on 500 medical reasoning examples
## Model Architecture
| Component | Specification |
|---|---|
| Base Model | OctoMed/OctoMed-7B |
| Parameters | 7B (base) + 32M (LoRA adapters) |
| Context Length | 4096 tokens |
| Quantization | 4-bit NF4 |
| LoRA Rank | 32 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training Method | GRPO (Group Relative Policy Optimization) |
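
For reference, the adapter setup in the table maps to a PEFT configuration along these lines. This is a sketch: `lora_alpha` and `lora_dropout` are not stated in this card and are shown with common defaults.

```python
from peft import LoraConfig

# LoRA configuration matching the architecture table; alpha/dropout are
# assumptions (common defaults), not values documented in this card.
lora_config = LoraConfig(
    r=32,                      # LoRA rank from the table
    lora_alpha=32,             # assumption: alpha = rank is a frequent choice
    lora_dropout=0.0,          # assumption
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
)
```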
## Training Details

### Training Configuration
- Training Steps: 200 (100 warmup steps)
- Batch Size: 4 per device
- Gradient Accumulation: 4 steps (effective batch size 16)
- Learning Rate: 5e-5 with cosine scheduler
- Optimizer: AdamW (8-bit)
- Mixed Precision: BF16
- Dataset: FreedomIntelligence/medical-o1-reasoning-SFT (500 examples)
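
As a rough reconstruction, this configuration maps onto TRL's `GRPOConfig` as sketched below; `output_dir` and any arguments not listed above are placeholders, not the exact training script.

```python
from trl import GRPOConfig

# Sketch of the training arguments listed above using TRL's GRPOConfig.
# output_dir is a placeholder; unlisted GRPO-specific settings (e.g. the
# number of generations per prompt) are left at library defaults.
training_args = GRPOConfig(
    output_dir="octomed-grpo",      # placeholder
    max_steps=200,
    warmup_steps=100,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size 16
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    optim="adamw_8bit",
    bf16=True,
)
```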
### Reward Functions
The model was trained using two complementary reward signals:
Format Reward (40% final weight):
- Encourages use of `<think>` reasoning tags
- Rewards substantial reasoning (10+ words)
- Scales rewards for partial compliance
Semantic Reward (60% final weight):
- Cosine similarity to ground truth answers
- Uses all-MiniLM-L6-v2 for embeddings
- Focuses on answer accuracy, not reasoning style
Reward weights were adaptively adjusted during training from 90%/10% to 40%/60% to balance format adherence with semantic accuracy.
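
A minimal sketch of how the two signals combine, assuming `sentence-transformers` for the embeddings as noted above. The exact partial-compliance scaling and the annealing schedule are assumptions, not the training code itself.

```python
import re
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def format_reward(completion: str) -> float:
    """Reward a well-formed <think>...</think> block; the 0.5 partial-credit
    value for short reasoning is an assumption."""
    match = re.search(r"<think>(.*?)</think>", completion, re.DOTALL)
    if match is None:
        return 0.0
    return 1.0 if len(match.group(1).split()) >= 10 else 0.5

def semantic_reward(completion: str, reference: str) -> float:
    """Cosine similarity between the final answer (text after </think>)
    and the ground-truth answer."""
    answer = completion.split("</think>")[-1].strip()
    emb = embedder.encode([answer, reference], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

def total_reward(completion: str, reference: str, fmt_w: float = 0.4) -> float:
    # Final weighting from the card: 40% format / 60% semantic. During
    # training, fmt_w was annealed from 0.9 down to 0.4.
    return fmt_w * format_reward(completion) + (1.0 - fmt_w) * semantic_reward(completion, reference)
```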
## Usage

### Using Transformers (Standard Method)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the 4-bit quantized base model
base_model = AutoModelForCausalLM.from_pretrained(
    "OctoMed/OctoMed-7B",
    load_in_4bit=True,
    device_map="auto",
)

# Load the LoRA adapters and tokenizer
model = PeftModel.from_pretrained(base_model, "AhmedSSoliman/octomed-7b-digital-twin-v1")
tokenizer = AutoTokenizer.from_pretrained("AhmedSSoliman/octomed-7b-digital-twin-v1")

# Generate a response
question = "What are the early signs of sepsis and how should it be managed?"
messages = [
    {"role": "system", "content": "You are a medical AI assistant. Think through your reasoning step-by-step using <think> tags before providing your final answer."},
    {"role": "user", "content": question},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
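
The decoded response contains the full reasoning trace. To display only the final answer, you can split on the closing tag (this assumes the model emits a single `</think>` delimiter, as the system prompt requests):

```python
# Show only the final answer, dropping the <think> reasoning trace
if "</think>" in response:
    _reasoning, answer = response.split("</think>", 1)
    print(answer.strip())
```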
### Using Unsloth (Optimized & Recommended)

```python
from unsloth import FastLanguageModel

# Load model with Unsloth optimizations
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="AhmedSSoliman/octomed-7b-digital-twin-v1",
    max_seq_length=4096,
    dtype=None,            # Auto-detect best dtype
    load_in_4bit=True,
    fast_inference=False,  # Python 3.13+ compatibility
)

# Enable inference mode for 2x faster generation
FastLanguageModel.for_inference(model)

# Generate a response
messages = [
    {"role": "user", "content": "Explain the pathophysiology of type 2 diabetes."}
]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to("cuda")
outputs = model.generate(
    input_ids=inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
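
Optionally, tokens can be streamed to stdout as they are generated using the standard `TextStreamer` from Transformers, which also works with Unsloth-loaded models:

```python
from transformers import TextStreamer

# Stream the generation token by token instead of waiting for the full output
streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(input_ids=inputs, max_new_tokens=512, streamer=streamer)
```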
## Example Outputs

**Question:** What causes chest pain in a 45-year-old with hypertension?

**Model Response:**
```
<think>
A 45-year-old patient with hypertension presenting with chest pain requires immediate evaluation for cardiac causes. Key considerations:

1. Cardiac causes (most urgent):
   - Acute coronary syndrome (ACS)
   - Myocardial infarction
   - Unstable angina
2. Risk factors present:
   - Age (45, elevated cardiovascular risk)
   - Known hypertension (damages vessels)
3. Differential diagnosis:
   - Costochondritis (musculoskeletal)
   - GERD (gastrointestinal)
   - Pulmonary embolism (respiratory)

Given the hypertension history, a cardiac workup is the priority.
</think>
```
Chest pain in a 45-year-old with hypertension requires urgent evaluation for acute coronary syndrome. Immediate steps include:
1. **Emergency Assessment**: Call emergency services (911) if severe
2. **ECG**: Within 10 minutes of presentation
3. **Cardiac Biomarkers**: Troponin levels
4. **Risk Stratification**: HEART score or similar
Hypertension is a major risk factor for coronary artery disease. Do not delay seeking medical attention—chest pain in this context could indicate a heart attack requiring immediate intervention.
## Hardware Requirements

| Configuration | VRAM Required | Speed |
|---|---|---|
| 4-bit (Recommended) | ~6 GB | Fast |
| 8-bit | ~9 GB | Medium |
| FP16 | ~15 GB | Slow |
Recommended Setup:
- GPU: NVIDIA RTX 3090/4090, A100, or similar
- RAM: 32GB+ system memory
- Python: 3.9-3.13
- CUDA: 11.8+
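
If you prefer an explicit quantization setup over the `load_in_4bit=True` shorthand used earlier, the 4-bit NF4 scheme from the architecture table corresponds to a `BitsAndBytesConfig` like the one below; double quantization is an assumption for extra memory savings.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Explicit 4-bit NF4 quantization config (equivalent to load_in_4bit=True)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4, per the architecture table
    bnb_4bit_compute_dtype=torch.bfloat16,  # matches BF16 mixed precision
    bnb_4bit_use_double_quant=True,         # assumption: saves a bit more VRAM
)

base_model = AutoModelForCausalLM.from_pretrained(
    "OctoMed/OctoMed-7B",
    quantization_config=bnb_config,
    device_map="auto",
)
```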
## Limitations & Disclaimers

### ⚠️ Medical Disclaimer
THIS MODEL IS FOR RESEARCH AND EDUCATIONAL PURPOSES ONLY. IT IS NOT A SUBSTITUTE FOR PROFESSIONAL MEDICAL ADVICE, DIAGNOSIS, OR TREATMENT.
- Not FDA Approved: This AI has not been evaluated or approved by any regulatory body
- No Medical License: The model cannot practice medicine or replace licensed healthcare providers
- Potential Errors: AI outputs may contain inaccuracies, hallucinations, or outdated information
- No Emergency Use: Never use this model for medical emergencies—call emergency services immediately
- Always Consult Professionals: Seek advice from qualified healthcare providers for medical decisions
### Known Limitations
- Training Data Cutoff: Knowledge may not reflect the latest medical research
- Reasoning Artifacts: `<think>` tags may sometimes contain verbose or redundant reasoning
- Multimodal Gap: This LoRA adapter focuses on text; image analysis requires the full base model
- Demographic Bias: Medical datasets may underrepresent certain populations
- Context Window: 4096 tokens limits handling of very long medical histories
## Evaluation
The model was evaluated on clinical reasoning tasks with the following metrics:
- Format Compliance: 85% of responses properly use reasoning tags
- Semantic Similarity: Average 0.72 cosine similarity to ground truth
- Reasoning Quality: Median 45 words per reasoning chain
- Response Coherence: Qualitatively assessed as clear and structured
Note: Formal clinical validation has not been performed.
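
As a rough illustration, the format metrics above can be computed along these lines (a sketch, not the exact evaluation harness; the semantic-similarity metric would reuse the embedding approach from the reward section):

```python
import re
from statistics import median

def format_metrics(responses: list[str]) -> dict:
    # Extract the <think>...</think> block from each response, if present
    matches = [re.search(r"<think>(.*?)</think>", r, re.DOTALL) for r in responses]
    compliant = [m for m in matches if m is not None]
    return {
        "format_compliance": len(compliant) / len(responses),
        "median_reasoning_words": median(len(m.group(1).split()) for m in compliant),
    }
```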
## Citation
If you use this model in your research, please cite:
```bibtex
@misc{octomed-7b-digital-twin-v1,
  author       = {Ahmed S. Soliman},
  title        = {OctoMed-7B Digital Twin v1: GRPO-Enhanced Medical Reasoning},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/AhmedSSoliman/octomed-7b-digital-twin-v1}},
  note         = {Fine-tuned with Group Relative Policy Optimization for transparent clinical reasoning}
}
```
Also cite the base OctoMed model:
```bibtex
@misc{octomed2025,
  title        = {OctoMed: Multimodal Medical AI},
  author       = {OctoMed Team},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/OctoMed/OctoMed-7B}}
}
```
## Acknowledgments
- Base Model: OctoMed-7B by the OctoMed Team
- Training Framework: Unsloth for efficient LoRA training
- Dataset: FreedomIntelligence for medical reasoning data
- RL Algorithm: TRL library's GRPO implementation
## License
This model inherits the Apache 2.0 license from OctoMed-7B. Use responsibly and in compliance with medical AI regulations in your jurisdiction.
## Model Card Contact
For questions or issues, please contact:
- GitHub: AhmedSSoliman
- HuggingFace: AhmedSSoliman
Developed: December 2025
Framework: Unsloth + TRL + Transformers
Training Method: GRPO (Group Relative Policy Optimization)