---
license: mit
---

# GigaBind

An ImageBind model finetuned with LoRA for images, audio, and many other modalities.
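
LoRA (Low-Rank Adaptation) freezes the pretrained weight matrix `W` and learns only a low-rank update `B @ A`, so the effective layer computes `y = (W + B @ A) x` with far fewer trainable parameters. The following is a minimal pure-Python sketch of that idea with toy dimensions; it is an illustration of the technique, not the actual trunk-patching code in `models.lora`:

```python
# Toy LoRA update: effective weight = W + B @ A, with rank r << d.
d, r = 4, 1  # hidden size and LoRA rank (toy values)

# Frozen pretrained weight (identity here, for readability).
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

# Trainable low-rank factors: B is d x r, A is r x d.
A = [[0.5, 0.0, 0.0, 0.0]]
B = [[1.0], [0.0], [0.0], [0.0]]

def matmul(X, Y):
    """Plain nested-list matrix multiply."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

delta = matmul(B, A)  # d x d update, but only rank-1
W_eff = [[W[i][j] + delta[i][j] for j in range(d)] for i in range(d)]
print(W_eff[0])  # [1.5, 0.0, 0.0, 0.0]
```

Only `A` and `B` (2·d·r numbers) are trained, while `W` (d²) stays frozen, which is why LoRA checkpoints such as the ones loaded below stay small.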

## Usage

```python
import logging

import torch

import data
from models import imagebind_model
from models.imagebind_model import ModalityType, load_module
from models import lora as LoRA

logging.basicConfig(level=logging.INFO, force=True)

lora = True
linear_probing = False
device = "cpu"  # "cuda:0" if torch.cuda.is_available() else "cpu"
load_head_post_proc_finetuned = True

assert not (linear_probing and lora), \
    "Linear probing is a subset of the LoRA training procedure for ImageBind. " \
    "Cannot set both linear_probing=True and lora=True."

if lora and not load_head_post_proc_finetuned:
    # Hack: adjust lora_factor to `max batch size used during training / temperature`
    # to compensate for the missing normalization.
    lora_factor = 12 / 0.07
else:
    # This assumes proper loading of all params, but results in a shift from the
    # original distribution in the LoRA case.
    lora_factor = 1

text_list = ["bird", "car", "dog3", "dog5", "dog8", "grey_sloth_plushie"]
image_paths = [".assets/bird_image.jpg", ".assets/car_image.jpg", ".assets/dog3.jpg",
               ".assets/dog5.jpg", ".assets/dog8.jpg", ".assets/grey_sloth_plushie.jpg"]
audio_paths = [".assets/bird_audio.wav", ".assets/car_audio.wav", ".assets/dog_audio.wav"]

# Instantiate the model
model = imagebind_model.imagebind_huge(pretrained=True)
if lora:
    model.modality_trunks.update(
        LoRA.apply_lora_modality_trunks(model.modality_trunks, rank=4,
                                        layer_idxs={ModalityType.TEXT: [0, 1, 2, 3, 4, 5, 6, 7, 8],
                                                    ModalityType.VISION: [0, 1, 2, 3, 4, 5, 6, 7, 8]},
                                        modality_names=[ModalityType.TEXT, ModalityType.VISION]))

    # Load LoRA params if found
    LoRA.load_lora_modality_trunks(model.modality_trunks,
                                   checkpoint_dir=".checkpoints/lora/550_epochs_lora",
                                   postfix="_dreambooth_last")

    if load_head_post_proc_finetuned:
        # Load postprocessors & heads
        load_module(model.modality_postprocessors, module_name="postprocessors",
                    checkpoint_dir=".checkpoints/lora/550_epochs_lora",
                    postfix="_dreambooth_last")
        load_module(model.modality_heads, module_name="heads",
                    checkpoint_dir=".checkpoints/lora/550_epochs_lora",
                    postfix="_dreambooth_last")
elif linear_probing:
    # Load heads
    load_module(model.modality_heads, module_name="heads",
                checkpoint_dir="./.checkpoints/lora/500_epochs_lp",
                postfix="_dreambooth_last")

model.eval()
model.to(device)

# Load data
inputs = {
    ModalityType.TEXT: data.load_and_transform_text(text_list, device),
    ModalityType.VISION: data.load_and_transform_vision_data(image_paths, device, to_tensor=True),
    ModalityType.AUDIO: data.load_and_transform_audio_data(audio_paths, device),
}

with torch.no_grad():
    embeddings = model(inputs)

print(
    "Vision x Text: ",
    torch.softmax(embeddings[ModalityType.VISION] @ embeddings[ModalityType.TEXT].T * (lora_factor if lora else 1), dim=-1),
)
print(
    "Audio x Text: ",
    torch.softmax(embeddings[ModalityType.AUDIO] @ embeddings[ModalityType.TEXT].T * (lora_factor if lora else 1), dim=-1),
)
print(
    "Vision x Audio: ",
    torch.softmax(embeddings[ModalityType.VISION] @ embeddings[ModalityType.AUDIO].T, dim=-1),
)
```
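
Each softmax row above is a probability distribution over candidate labels, so the predicted label for an image is the argmax of its row. A minimal stdlib-only sketch of that post-processing (the `scores` values here are illustrative stand-ins, not real model outputs):

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of similarity scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative Vision x Text similarity scores (rows: images, cols: text labels).
text_list = ["bird", "car", "dog3"]
scores = [
    [9.1, 1.2, 0.3],  # bird image
    [0.4, 8.7, 1.1],  # car image
    [0.2, 1.0, 7.9],  # dog3 image
]

probs = [softmax(row) for row in scores]
predictions = [text_list[max(range(len(r)), key=r.__getitem__)] for r in probs]
print(predictions)  # ['bird', 'car', 'dog3']
```

With the real model, the same argmax would be applied to each row of the `Vision x Text` (or `Audio x Text`) matrix printed above.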

## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Model type:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]
- **Finetuned from model [optional]:** [More Information Needed]

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.

[More Information Needed]

## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

[More Information Needed]

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

#### Preprocessing [optional]

[More Information Needed]

#### Training Hyperparameters

- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

[More Information Needed]

## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

[More Information Needed]

#### Factors

<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

[More Information Needed]

#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

<!-- Relevant interpretability work for the model goes here -->

[More Information Needed]

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

[More Information Needed]

## Citation [optional]

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]