| Language Model | Release Date | Checkpoints | Paper/Blog | Params (B) | Context Length | License | Try it |
| --- | --- | --- | --- | --- | --- | --- | --- |
| T5 | 2019/10 | T5 & Flan-T5, Flan-T5-xxl (HF) | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 0.06 – 11 | 512 | Apache 2.0 | T5-Large |
| UL2 | 2022/10 | UL2 & Flan-UL2, Flan-UL2 (HF) | UL2 20B: An Open Source Unified Language Learner | 20 | 512, 2048 | Apache 2.0 | |
| Cerebras-GPT | 2023/03 | Cerebras-GPT | Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models (Paper) | 0.111 – 13 | 2048 | Apache 2.0 | Cerebras-GPT-1.3B |
| Open Assistant (Pythia family) | 2023/03 | OA-Pythia-12B-SFT-8, OA-Pythia-12B-SFT-4, OA-Pythia-12B-SFT-1 | OpenAssistant Conversations - Democratizing Large Language Model Alignment | 12 | 2048 | Apache 2.0 | Pythia-2.8B |
| Pythia | 2023/04 | pythia 70M – 12B | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 0.07 – 12 | 2048 | Apache 2.0 | |
| Dolly | 2023/04 | dolly-v2-12b | Free Dolly: Introducing the World’s First Truly Open Instruction-Tuned LLM | 3, 7, 12 | 2048 | MIT | |
| DLite | 2023/05 | dlite-v2-1_5b | Announcing DLite V2: Lightweight, Open LLMs That Can Run Anywhere | 0.124 – 1.5 | 1024 | Apache 2.0 | DLite-v2-1.5B |
| RWKV | 2021/08 | RWKV, ChatRWKV | The RWKV Language Model (and my LM tricks) | 0.1 – 14 | infinity (RNN) | Apache 2.0 | |
| GPT-J-6B | 2021/06 | GPT-J-6B, GPT4All-J | GPT-J-6B: 6B JAX-Based Transformer | 6 | 2048 | Apache 2.0 | |
| GPT-NeoX-20B | 2022/04 | GPT-NEOX-20B | GPT-NeoX-20B: An Open-Source Autoregressive Language Model | 20 | 2048 | Apache 2.0 | |
| Bloom | 2022/11 | Bloom | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | 176 | 2048 | OpenRAIL-M v1 | |
| StableLM-Alpha | 2023/04 | StableLM-Alpha | Stability AI Launches the First of its StableLM Suite of Language Models | 3 – 65 | 4096 | CC BY-SA-4.0 | |
| FastChat-T5 | 2023/04 | fastchat-t5-3b-v1.0 | We are excited to release FastChat-T5: our compact and commercial-friendly chatbot! | 3 | 512 | Apache 2.0 | |
| h2oGPT | 2023/05 | h2oGPT | Building the World’s Best Open-Source Large Language Model: H2O.ai’s Journey | 12 – 20 | 256 – 2048 | Apache 2.0 | |
| MPT-7B | 2023/05 | MPT-7B, MPT-7B-Instruct | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs | 7 | 84k (ALiBi) | Apache 2.0, CC BY-SA-3.0 | |
| RedPajama-INCITE | 2023/05 | RedPajama-INCITE | Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models | 3 – 7 | 2048 | Apache 2.0 | RedPajama-INCITE-Instruct-3B-v1 |
| OpenLLaMA | 2023/05 | open_llama_3b, open_llama_7b, open_llama_13b | OpenLLaMA: An Open Reproduction of LLaMA | 3, 7, 13 | 2048 | Apache 2.0 | OpenLLaMA-7B-Preview_200bt |
| Falcon | 2023/05 | Falcon-180B, Falcon-40B, Falcon-7B | The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only | 180, 40, 7 | 2048 | Apache 2.0 | |
| MPT-30B | 2023/06 | MPT-30B, MPT-30B-instruct | MPT-30B: Raising the bar for open-source foundation models | 30 | 8192 | Apache 2.0, CC BY-SA-3.0 | MPT 30B inference code using CPU |
| LLaMA 2 | 2023/07 | LLaMA 2 Weights | Llama 2: Open Foundation and Fine-Tuned Chat Models | 7 – 70 | 4096 | Custom: free if you have under 700M monthly active users; outputs may not be used to train other LLMs besides LLaMA and its derivatives | HuggingChat |
| OpenLM | 2023/09 | OpenLM 1B, OpenLM 7B | Open LM: a minimal but performative language modeling (LM) repository | 1, 7 | 2048 | MIT | |
| Mistral 7B | 2023/09 | Mistral-7B-v0.1, Mistral-7B-Instruct-v0.1 | Mistral 7B | 7 | 4096 – 16K (sliding-window attention) | Apache 2.0 | Mistral Transformer |
| OpenHermes | 2023/09 | OpenHermes-7B, OpenHermes-13B | Nous Research | 7, 13 | 4096 | MIT | OpenHermes-V2 Finetuned on Mistral 7B |
| SOLAR | 2023/12 | Solar-10.7B | Upstage | 10.7 | 4096 | Apache 2.0 | |
| phi-2 | 2023/12 | phi-2 2.7B | Microsoft | 2.7 | 2048 | MIT | |
| OLMo | 2024/02 | OLMo 1B, OLMo 7B, OLMo 7B Twin 2T | AI2 | 1, 7 | 2048 | Apache 2.0 | |
| Gemma | 2024/02 | Gemma 7B, Gemma 7B it, Gemma 2B, Gemma 2B it | Technical report | 2, 7 | 8192 | Gemma Terms of Use | |
| Zephyr | 2023/11 | Zephyr 7B | Website | 7 | 8192 | Apache 2.0 | |
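
Most of the checkpoints above are published on the Hugging Face Hub and load the same way through the `transformers` library. Below is a minimal sketch, assuming `transformers` and `torch` are installed; the model id `google/flan-t5-small` is used as a small stand-in for the Flan-T5 checkpoints in the first row, not as the only option.

```python
# Minimal sketch: loading one of the checkpoints listed above from the
# Hugging Face Hub. "google/flan-t5-small" is an assumed model id, chosen
# as a small stand-in for the Flan-T5 family in the first table row.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# T5-family models are text-to-text: the task is phrased as input text,
# and the answer comes back as generated text.
inputs = tokenizer("Translate English to German: Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Decoder-only entries in the table (e.g., Pythia, MPT, Mistral) load the same way via `AutoModelForCausalLM`; the Context Length column above is the practical cap on prompt plus generated tokens.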