Transformers

Efficient Inference on Multiple GPUs

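This document describes how to run inference efficiently on multiple GPUs.

Note: A multi-GPU setup can use most of the strategies described in the single-GPU section. However, you should also be aware of a few simple techniques that make better use of the hardware.

The Flash Attention 2 integration also works in a multi-GPU setup. For details, see the relevant part of the single-GPU section.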
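
As a rough illustration of combining Flash Attention 2 with a multi-GPU setup, the sketch below builds the loading arguments and passes them to `from_pretrained`. It is a minimal sketch under several assumptions, not the documented recipe: the helper names `build_load_kwargs` and `load_model` are hypothetical, `device_map="auto"` is assumed as the way to spread layers across GPUs (it relies on Accelerate being installed), the `flash-attn` package and compatible GPUs are assumed to be available, and the `dtype` argument name is assumed from recent releases (older releases call it `torch_dtype`).

```python
from typing import Any, Dict


def build_load_kwargs(use_flash_attention_2: bool = True) -> Dict[str, Any]:
    """Hypothetical helper: keyword arguments for `from_pretrained` on a multi-GPU machine."""
    kwargs: Dict[str, Any] = {
        # Assumption: "auto" lets Accelerate distribute the layers across visible GPUs.
        "device_map": "auto",
    }
    if use_flash_attention_2:
        # Assumption: the flash-attn package is installed and the GPUs support it.
        kwargs["attn_implementation"] = "flash_attention_2"
    return kwargs


def load_model(model_id: str, use_flash_attention_2: bool = True):
    """Hypothetical helper: load a causal LM with the arguments built above."""
    # Imported lazily so build_load_kwargs can be inspected without torch installed.
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        # Flash Attention 2 expects half precision (fp16 or bf16).
        # Assumption: `dtype` is the argument name; older releases use `torch_dtype`.
        dtype=torch.bfloat16,
        **build_load_kwargs(use_flash_attention_2),
    )
```

Calling `load_model` with a model identifier of your choice would then load the checkpoint across the available GPUs. Passing `use_flash_attention_2=False` falls back to the library's default attention implementation, which can be useful when `flash-attn` is not installed.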