T5 | 2019/10 | T5 & Flan-T5, Flan-T5-xxl (HF) | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | 0.06 – 11 | 512 | Apache 2.0 | T5-Large |
UL2 | 2022/10 | UL2 & Flan-UL2, Flan-UL2 (HF) | UL2 20B: An Open Source Unified Language Learner | 20 | 512, 2048 | Apache 2.0 | |
Cerebras-GPT | 2023/03 | Cerebras-GPT | Cerebras-GPT: A Family of Open, Compute-efficient, Large Language Models (Paper) | 0.111 – 13 | 2048 | Apache 2.0 | Cerebras-GPT-1.3B |
Open Assistant (Pythia family) | 2023/03 | OA-Pythia-12B-SFT-8, OA-Pythia-12B-SFT-4, OA-Pythia-12B-SFT-1 | Democratizing Large Language Model Alignment | 12 | 2048 | Apache 2.0 | Pythia-2.8B |
Pythia | 2023/04 | pythia 70M – 12B | Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling | 0.07 – 12 | 2048 | Apache 2.0 | |
Dolly | 2023/04 | dolly-v2-12b | Free Dolly: Introducing the World’s First Truly Open Instruction-Tuned LLM | 3, 7, 12 | 2048 | MIT | |
DLite | 2023/05 | dlite-v2-1_5b | Announcing DLite V2: Lightweight, Open LLMs That Can Run Anywhere | 0.124 – 1.5 | 1024 | Apache 2.0 | DLite-v2-1.5B |
RWKV | 2021/08 | RWKV, ChatRWKV | The RWKV Language Model (and my LM tricks) | 0.1 – 14 | infinity (RNN) | Apache 2.0 | |
GPT-J-6B | 2021/06 | GPT-J-6B, GPT4All-J | GPT-J-6B: 6B JAX-Based Transformer | 6 | 2048 | Apache 2.0 | |
GPT-NeoX-20B | 2022/04 | GPT-NEOX-20B | GPT-NeoX-20B: An Open-Source Autoregressive Language Model | 20 | 2048 | Apache 2.0 | |
Bloom | 2022/11 | Bloom | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | 176 | 2048 | OpenRAIL-M v1 | |
StableLM-Alpha | 2023/04 | StableLM-Alpha | Stability AI Launches the First of its StableLM Suite of Language Models | 3 – 65 | 4096 | CC BY-SA-4.0 | |
FastChat-T5 | 2023/04 | fastchat-t5-3b-v1.0 | We are excited to release FastChat-T5: our compact and commercial-friendly chatbot! | 3 | 512 | Apache 2.0 | |
h2oGPT | 2023/05 | h2oGPT | Building the World’s Best Open-Source Large Language Model: H2O.ai’s Journey | 12 – 20 | 256 – 2048 | Apache 2.0 | |
MPT-7B | 2023/05 | MPT-7B, MPT-7B-Instruct | Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs | 7 | 84k (ALiBi) | Apache 2.0, CC BY-SA-3.0 | |
RedPajama-INCITE | 2023/05 | RedPajama-INCITE | Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models | 3 – 7 | 2048 | Apache 2.0 | RedPajama-INCITE-Instruct-3B-v1 |
OpenLLaMA | 2023/05 | open_llama_3b, open_llama_7b, open_llama_13b | OpenLLaMA: An Open Reproduction of LLaMA | 3, 7, 13 | 2048 | Apache 2.0 | OpenLLaMA-7B-Preview_200bt |
Falcon | 2023/05 | Falcon-180B, Falcon-40B, Falcon-7B | The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only | 180, 40, 7 | 2048 | Apache 2.0 (7B, 40B), Falcon-180B TII License (180B) | |
MPT-30B | 2023/06 | MPT-30B, MPT-30B-instruct | MPT-30B: Raising the bar for open-source foundation models | 30 | 8192 | Apache 2.0, CC BY-SA-3.0 | MPT 30B inference code using CPU |
LLaMA 2 | 2023/07 | LLaMA 2 Weights | Llama 2: Open Foundation and Fine-Tuned Chat Models | 7 – 70 | 4096 | Custom (free if you have under 700M monthly active users; Llama 2 outputs may not be used to train other LLMs besides Llama and its derivatives) | HuggingChat |
OpenLM | 2023/09 | OpenLM 1B, OpenLM 7B | Open LM: a minimal but performative language modeling (LM) repository | 1, 7 | 2048 | MIT | |
Mistral 7B | 2023/09 | Mistral-7B-v0.1, Mistral-7B-Instruct-v0.1 | Mistral 7B | 7 | 4096 – 16K with sliding window attention | Apache 2.0 | Mistral Transformer |
OpenHermes | 2023/09 | OpenHermes-7B, OpenHermes-13B | Nous Research | 7, 13 | 4096 | MIT | OpenHermes-V2 Finetuned on Mistral 7B |
SOLAR | 2023/12 | Solar-10.7B | Upstage | 10.7 | 4096 | Apache 2.0 | |
phi-2 | 2023/12 | phi-2 2.7B | Microsoft | 2.7 | 2048 | MIT | |
OLMo | 2024/02 | OLMo 1B, OLMo 7B, OLMo 7B Twin 2T | AI2 | 1, 7 | 2048 | Apache 2.0 | |
Gemma | 2024/02 | Gemma 7B, Gemma 7B it, Gemma 2B, Gemma 2B it | Technical report | 2 – 7 | 8192 | Gemma Terms of Use | |
Zephyr | 2023/11 | Zephyr 7B | Website | 7 | 8192 | Apache 2.0 | |
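
Most of the checkpoints above are published on the Hugging Face Hub, so they can be loaded with the `transformers` library. A minimal sketch, assuming the `google/flan-t5-large` id for the T5/Flan-T5 row; decoder-only rows such as Pythia or Falcon would use `AutoModelForCausalLM` instead:

```python
# Minimal sketch: load one of the permissively licensed checkpoints from the table
# with Hugging Face transformers. The model id "google/flan-t5-large" is assumed to
# be the hosted Flan-T5-Large checkpoint; swap in any other Hub id from the table.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "google/flan-t5-large"  # assumed Hub id for the T5/Flan-T5 row
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# T5-family models are encoder-decoder: they take a text prompt and generate text.
inputs = tokenizer("Translate to German: How are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same pattern applies to the decoder-only rows, subject to each row's context length and license terms; the larger checkpoints (e.g. Falcon-180B, BLOOM) will need multi-GPU or quantized loading.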