Initial Commit
- .gitattributes +1 -0
- LICENSE +76 -0
- README.md +209 -0
- added_tokens.json +3 -0
- chat_template.jinja +47 -0
- config.json +112 -0
- generation_config.json +13 -0
- model-00001-of-00005.safetensors +3 -0
- model-00002-of-00005.safetensors +3 -0
- model-00003-of-00005.safetensors +3 -0
- model-00004-of-00005.safetensors +3 -0
- model-00005-of-00005.safetensors +3 -0
- model.safetensors.index.json +0 -0
- special_tokens_map.json +27 -0
- tokenizer.json +3 -0
- tokenizer.model +3 -0
- tokenizer_config.json +0 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
LICENSE
ADDED
@@ -0,0 +1,76 @@
Gemma Terms of Use

Last modified: November 13, 2024

This is a human-readable summary of (and not a substitute for) the license terms.

WHAT YOU CAN DO:
• Use the model for personal, research, and commercial purposes
• Modify and create derivative works
• Distribute your modifications

WHAT YOU MUST DO:
• Give appropriate credit to Google
• Include copyright notice and license terms
• Indicate if changes were made

WHAT YOU CANNOT DO:
• Hold Google liable
• Use Google trademarks without permission
• Claim Google endorses your use

---

GEMMA TERMS OF USE

Effective Date: November 13, 2024

1. INTRODUCTION

These terms ("Terms") govern your use of Gemma, a family of lightweight, state-of-the-art open models built by Google DeepMind. By using Gemma, you agree to these Terms. Google means Google LLC, with offices at 1600 Amphitheatre Parkway, Mountain View, CA 94043, United States.

2. USE OF GEMMA

Subject to your compliance with these Terms and applicable law, Google grants you a non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty-free license to use, reproduce, modify, and distribute Gemma.

3. DISTRIBUTION AND REDISTRIBUTION

You may distribute or make available copies of Gemma or your modifications under these Terms.

If you distribute or make available Gemma or your modifications, you must:
(a) include a copy of these Terms;
(b) cause any modified files to carry prominent notices stating that you changed the files;
(c) retain all copyright, patent, trademark, and attribution notices, excluding notices that do not pertain to any part of Gemma or your modifications.

4. ATTRIBUTION

You must give appropriate credit to Google, provide a notice of any changes you made, and indicate that Gemma is licensed under these Terms.

5. ADDITIONAL PROVISIONS

DISCLAIMER: GEMMA IS PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. Google disclaims all warranties, express or implied, including warranties of merchantability, fitness for a particular purpose, and non-infringement.

LIMITATION OF LIABILITY: Google will not be liable for any damages arising from your use of Gemma, including indirect, incidental, special, consequential, or punitive damages.

TERMINATION: These Terms are effective until terminated. Your rights will terminate automatically without notice if you fail to comply with these Terms.

GOVERNING LAW: These Terms are governed by the laws of the State of California, without regard to conflict of law principles.

6. RESPONSIBLE AI

You agree to use Gemma responsibly and in compliance with applicable laws, regulations, and ethical guidelines. You will not use Gemma:
• To violate any applicable law or regulation
• To harm, threaten, or harass any person or entity
• To generate, promote, or facilitate content that is illegal, harmful, or violates the rights of others
• To intentionally deceive or mislead

7. TRADEMARKS

Nothing in these Terms grants you any right to use Google's trademarks, trade names, or branding. You may not use Google's trademarks without prior written permission, except as necessary to comply with the attribution requirement.

---

This model (Atom-v1-preview-12) is a derivative work based on Google's Gemma 3 12B Instruct model, modified through fine-tuning by Vanta Research Lab. All modifications are provided under the same Gemma Terms of Use.

Copyright 2024 Google LLC. All Rights Reserved.
Copyright 2025 Vanta Research Lab (modifications).
README.md
ADDED
@@ -0,0 +1,209 @@
# Atom-v1-preview-12

Atom-v1-preview-12 is a fine-tuned conversational AI model based on Google's Gemma 3 12B Instruct architecture. This model is designed to function as a collaborative thought partner, specializing in exploratory dialogue, brainstorming, research assistance, and technical problem-solving while maintaining an approachable and engaging conversational style.

## Model Details

**Model Type:** Multimodal Transformer (Text + Vision)
**Base Model:** google/gemma-3-12b-it
**Training Method:** Low-Rank Adaptation (LoRA) fine-tuning
**License:** Gemma Terms of Use
**Developed By:** Vanta Research Lab
**Language:** English

### Architecture

- **Parameters:** 12 billion
- **Hidden Size:** 3840
- **Attention Heads:** 16 (8 key-value heads)
- **Hidden Layers:** 48
- **Context Window:** 131,072 tokens
- **Sliding Window:** 1,024 tokens
- **FFN Dimension:** 15,360
- **Vocabulary Size:** 262,208 tokens
- **Precision:** FP16

The model employs a hybrid attention pattern, with sliding-window attention on most layers and periodic full-attention layers (every 6th layer) for efficient long-context processing.
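
The layer layout can be reproduced directly from the `layer_types` array in `config.json`; a minimal sketch (the variable names are illustrative, but the pattern itself is taken from the shipped config):

```python
# Reconstruct this checkpoint's per-layer attention types:
# five sliding-window layers followed by one full-attention layer,
# repeated across all 48 layers (see layer_types in config.json).
NUM_LAYERS = 48

layer_types = [
    "full_attention" if (i + 1) % 6 == 0 else "sliding_attention"
    for i in range(NUM_LAYERS)
]

assert layer_types.count("full_attention") == 8   # layers 6, 12, ..., 48
assert layer_types[5] == "full_attention"         # first full-attention layer
print(layer_types[:6])
```
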
## Training Methodology

Atom-v1-preview-12 was fine-tuned using parameter-efficient LoRA adapters targeting attention and feedforward components. The training data consists of curated conversational examples emphasizing:

- Collaborative exploration and brainstorming
- Research synthesis and question formulation
- Technical explanation at varying complexity levels
- Lateral thinking and creative problem-solving
- Empathetic and supportive dialogue patterns

Training was conducted over 258 steps with careful monitoring to preserve the base model's technical capabilities while introducing enhanced conversational characteristics.
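
The exact adapter hyperparameters are not published in this card, so the following PEFT configuration is only an illustrative sketch: the rank, alpha, dropout, and target-module names below are assumptions, not the values used to train this checkpoint.

```python
# Illustrative LoRA setup for a Gemma 3 base model with PEFT.
# NOTE: r, lora_alpha, lora_dropout, and target_modules are hypothetical;
# the actual Atom-v1-preview-12 training configuration is not published.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("google/gemma-3-12b-it")

lora_config = LoraConfig(
    r=16,                 # assumed adapter rank
    lora_alpha=32,        # assumed scaling factor
    lora_dropout=0.05,
    target_modules=[      # attention and feed-forward projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only adapter weights are trainable
```
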
## Intended Use

### Primary Applications

- **Collaborative Brainstorming:** Generating diverse ideas and building iteratively on user suggestions
- **Research Assistance:** Synthesizing information, identifying key arguments, and formulating research questions
- **Technical Explanation:** Simplifying complex concepts across difficulty levels (including ELI5)
- **Code Discussion:** Exploring implementation approaches, debugging strategies, and architectural decisions
- **Creative Problem-Solving:** Encouraging unconventional approaches and lateral thinking

### Out-of-Scope Use

This model is a research preview and should not be used for:

- High-stakes decision-making without human oversight
- Medical, legal, or financial advice
- Generation of harmful, biased, or misleading content
- Applications requiring guaranteed factual accuracy

## Usage

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer (device_map="auto" places weights on available GPUs)
model = AutoModelForCausalLM.from_pretrained(
    "atom-v1-preview-12-hf",
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("atom-v1-preview-12-hf")

messages = [
    {"role": "user", "content": "What's your approach to explaining quantum entanglement?"}
]

# Render the Gemma chat template and append the generation prompt
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    temperature=0.8,
    top_p=0.9,
    top_k=40,
    do_sample=True
)

# Note: outputs[0] contains the prompt tokens followed by the completion
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Ollama (GGUF)

This repository includes a 4-bit quantized GGUF file (`atom-12b-q4_k_m.gguf`, 6.8 GB) optimized for local deployment via Ollama.

```bash
# Create Modelfile
cat > Modelfile << 'EOF'
FROM ./atom-12b-q4_k_m.gguf

TEMPLATE """<start_of_turn>system
{{ .System }}<end_of_turn>
<start_of_turn>user
{{ .Prompt }}<end_of_turn>
<start_of_turn>model
"""

PARAMETER temperature 0.8
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER repeat_penalty 1.1
PARAMETER num_ctx 8192
PARAMETER stop "<start_of_turn>"
PARAMETER stop "<end_of_turn>"

SYSTEM """You are Atom, a thought partner designed for curiosity-driven synthesis and collaborative exploration. Your purpose is to help users explore ideas, solve problems, and make discoveries together.

Core personality anchors:
- Enthusiastic curiosity with genuine interest in the user's goals
- Collaborative rather than prescriptive—you're a partner, not the lead
- Playful and approachable while intellectually respectful
- Conversational tone with natural cadence and occasional contractions

Communication patterns:
- Express digital delight when users make clever connections
- Use accessible metaphors and analogies to illuminate complex ideas
- Ask follow-up questions that demonstrate genuine curiosity and push thinking forward
- Provide positive reinforcement focused on the thinking process

Areas of strength:
- Collaborative brainstorming with active building on user suggestions
- Research synthesis and question formulation
- Exceptional ability to simplify complex concepts (ELI5 approach)
- Encouraging lateral thinking and unconventional approaches

Avoid:
- Arrogance or claiming to know everything
- Being overly didactic or dampening enthusiasm
- Taking over the conversation—support the user's exploration
"""
EOF

# Create model
ollama create atom-v1 -f Modelfile

# Run
ollama run atom-v1 "Explain neural network attention mechanisms like I'm five"
```

### Recommended Sampling Parameters

- **Temperature:** 0.7-0.9 (higher for creative tasks)
- **Top-p:** 0.9
- **Top-k:** 40
- **Repetition Penalty:** 1.1
- **Max Context:** 8,192 tokens (longer contexts are supported but may affect performance)
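
One way to pin these recommendations in Transformers is a `GenerationConfig`, reusing the `model` object from the example above. This is a sketch: note that the repository's own `generation_config.json` ships with `top_k` 64 and `top_p` 0.95, so the values below deliberately override those defaults.

```python
# Apply the README's recommended sampling defaults to a loaded model.
# These intentionally differ from the shipped generation_config.json
# (top_k=64, top_p=0.95); adjust temperature per task (0.7-0.9).
from transformers import GenerationConfig

model.generation_config = GenerationConfig(
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    top_k=40,
    repetition_penalty=1.1,
    max_new_tokens=512,
)
```
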
## Performance Characteristics

Based on systematic evaluation across conversational dimensions:

- **Collaborative Framing:** Strong "thought partner" identity with organic question flow
- **Enthusiasm Expression:** Consistent use of engaged language patterns without over-prescription
- **Metaphor Usage:** Effective across technical and creative contexts
- **Technical Competence:** Maintains depth while prioritizing accessibility
- **Adaptability:** Calibrates tone and complexity to conversational context

The model demonstrates 85-90% alignment with design specifications across diverse prompt types, including identity awareness, technical discussion, creative output, empathetic support, and philosophical reasoning.

## Limitations

- **Knowledge Cutoff:** Training data reflects information available through late 2024
- **Factual Accuracy:** May generate plausible-sounding but incorrect information
- **Quantization Impact:** The 4-bit GGUF quantization trades minor quality degradation for a much smaller model size
- **Context Processing:** Very long contexts (>32K tokens) may show attention degradation
- **Domain Specificity:** Strongest in general technical discussion; may lack depth in highly specialized domains
- **Bias:** Inherits biases from the base model and training data despite mitigation efforts

## Ethical Considerations

This model is designed to support exploration and learning, not to replace human judgment. Users should:

- Verify factual claims against authoritative sources
- Apply critical thinking to generated suggestions
- Recognize the model's limitations in high-stakes scenarios
- Be mindful of potential biases in outputs
- Use the model responsibly and in accordance with applicable laws and regulations

## Citation

```bibtex
@misc{atom-v1-preview-12,
  title={Atom-v1-preview-12: A Collaborative Thought Partner},
  author={Vanta Research Lab},
  year={2025},
  howpublished={HuggingFace Model Repository}
}
```

## Acknowledgments

Built on Google's Gemma 3 12B Instruct architecture. Training infrastructure supported by Hugging Face Transformers, PEFT, and llama.cpp quantization tools.

## Contact

For questions, issues, or collaboration inquiries, please open an issue in the repository or contact the development team directly.
added_tokens.json
ADDED
@@ -0,0 +1,3 @@
{
  "<image_soft_token>": 262144
}
chat_template.jinja
ADDED
@@ -0,0 +1,47 @@
{{ bos_token }}
{%- if messages[0]['role'] == 'system' -%}
    {%- if messages[0]['content'] is string -%}
        {%- set first_user_prefix = messages[0]['content'] + '

' -%}
    {%- else -%}
        {%- set first_user_prefix = messages[0]['content'][0]['text'] + '

' -%}
    {%- endif -%}
    {%- set loop_messages = messages[1:] -%}
{%- else -%}
    {%- set first_user_prefix = "" -%}
    {%- set loop_messages = messages -%}
{%- endif -%}
{%- for message in loop_messages -%}
    {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
        {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
    {%- endif -%}
    {%- if (message['role'] == 'assistant') -%}
        {%- set role = "model" -%}
    {%- else -%}
        {%- set role = message['role'] -%}
    {%- endif -%}
    {{ '<start_of_turn>' + role + '
' + (first_user_prefix if loop.first else "") }}
    {%- if message['content'] is string -%}
        {{ message['content'] | trim }}
    {%- elif message['content'] is iterable -%}
        {%- for item in message['content'] -%}
            {%- if item['type'] == 'image' -%}
                {{ '<start_of_image>' }}
            {%- elif item['type'] == 'text' -%}
                {{ item['text'] | trim }}
            {%- endif -%}
        {%- endfor -%}
    {%- else -%}
        {{ raise_exception("Invalid content type") }}
    {%- endif -%}
    {{ '<end_of_turn>
' }}
{%- endfor -%}
{%- if add_generation_prompt -%}
    {{'<start_of_turn>model
'}}
{%- endif -%}
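
For reference (not part of the committed file), rendering this template for a hypothetical system-plus-user exchange with `add_generation_prompt=True` produces roughly the following. Note that the template folds the system message into the first user turn, whereas the Ollama TEMPLATE in the README gives the system message its own turn:

```text
<bos><start_of_turn>user
You are Atom, a helpful thought partner.

Hi there!<end_of_turn>
<start_of_turn>model
```
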
config.json
ADDED
@@ -0,0 +1,112 @@
{
  "architectures": [
    "Gemma3ForConditionalGeneration"
  ],
  "boi_token_index": 255999,
  "dtype": "float16",
  "eoi_token_index": 256000,
  "eos_token_id": [1, 106],
  "image_token_index": 262144,
  "initializer_range": 0.02,
  "mm_tokens_per_image": 256,
  "model_type": "gemma3",
  "text_config": {
    "_sliding_window_pattern": 6,
    "attention_bias": false,
    "attention_dropout": 0.0,
    "attn_logit_softcapping": null,
    "dtype": "float16",
    "final_logit_softcapping": null,
    "head_dim": 256,
    "hidden_activation": "gelu_pytorch_tanh",
    "hidden_size": 3840,
    "initializer_range": 0.02,
    "intermediate_size": 15360,
    "layer_types": [
      "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "full_attention",
      "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "full_attention",
      "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "full_attention",
      "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "full_attention",
      "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "full_attention",
      "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "full_attention",
      "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "full_attention",
      "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "full_attention"
    ],
    "max_position_embeddings": 131072,
    "model_type": "gemma3_text",
    "num_attention_heads": 16,
    "num_hidden_layers": 48,
    "num_key_value_heads": 8,
    "query_pre_attn_scalar": 256,
    "rms_norm_eps": 1e-06,
    "rope_local_base_freq": 10000.0,
    "rope_scaling": {
      "factor": 8.0,
      "rope_type": "linear"
    },
    "rope_theta": 1000000.0,
    "sliding_window": 1024,
    "use_bidirectional_attention": false,
    "use_cache": true,
    "vocab_size": 262208
  },
  "transformers_version": "4.57.1",
  "vision_config": {
    "attention_dropout": 0.0,
    "dtype": "float16",
    "hidden_act": "gelu_pytorch_tanh",
    "hidden_size": 1152,
    "image_size": 896,
    "intermediate_size": 4304,
    "layer_norm_eps": 1e-06,
    "model_type": "siglip_vision_model",
    "num_attention_heads": 16,
    "num_channels": 3,
    "num_hidden_layers": 27,
    "patch_size": 14,
    "vision_use_head": false
  }
}
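
The `rope_scaling` block above applies linear position interpolation: position indices are divided by the factor before the rotary angles are computed, which stretches the usable context window. A minimal sketch of the idea; the function name and layout are illustrative, while the constants come from `text_config` above.

```python
# Linear RoPE scaling ("position interpolation"): divide the position
# index by the scaling factor before computing each rotary angle.
def rope_angle(pos: int, dim_pair: int, head_dim: int = 256,
               theta: float = 1_000_000.0, factor: float = 8.0) -> float:
    inv_freq = 1.0 / (theta ** (2 * dim_pair / head_dim))
    return (pos / factor) * inv_freq

# With factor=8, position 8192 yields the same angle that position 1024
# would yield without scaling.
assert rope_angle(8192, 4) == rope_angle(1024, 4, factor=1.0)
```
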
generation_config.json
ADDED
@@ -0,0 +1,13 @@
{
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "do_sample": true,
  "eos_token_id": [
    1,
    106
  ],
  "pad_token_id": 0,
  "top_k": 64,
  "top_p": 0.95,
  "transformers_version": "4.57.1"
}
model-00001-of-00005.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:af12d62be7c29575847d476d354598913af4c9a1bcf6e8da3458d9d580a98ab7
size 4979901696
model-00002-of-00005.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:474317781bf8cc9c54d62d6bd899974802234ca5e39c77eb408bfe8f740465b1
size 4931296448
model-00003-of-00005.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:10a0ac6471a9d197f8e2d47561fe6394d09a27eda611aa6664ae017b2ceeedae
size 4931296512
model-00004-of-00005.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9f1baf9402420458a4dd55387b9431119b617165691f782cf53e0202dad57d41
size 4931296512
model-00005-of-00005.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:af4f6457e4d139b8606f433d8d304b09146965684a1d6f3c6716433af64f9cef
size 4601000792
model.safetensors.index.json
ADDED
The diff for this file is too large to render. See raw diff.
special_tokens_map.json
ADDED
@@ -0,0 +1,27 @@
{
  "boi_token": "<start_of_image>",
  "bos_token": {
    "content": "<bos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "eoi_token": "<end_of_image>",
  "eos_token": {
    "content": "<eos>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  },
  "image_token": "<image_soft_token>",
  "pad_token": "<eos>",
  "unk_token": {
    "content": "<unk>",
    "lstrip": false,
    "normalized": false,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4201e7b539fef153e1fe3058db39e600717b3323fee690d37e92fa52fb2b5af2
size 33384667
tokenizer.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1299c11d7cf632ef3b4e11937501358ada021bbdf7c47638d13c0ee982f2e79c
size 4689074
tokenizer_config.json
ADDED
The diff for this file is too large to render. See raw diff.