CodeT5 is a family of open code large language models developed by Salesforce Research. It uses an encoder-decoder architecture that can flexibly operate in different modes (encoder-only, decoder-only, or encoder-decoder) to support a wide range of code understanding and generation tasks. Trained on a diverse mixture of pretraining objectives, including span denoising, causal language modeling, contrastive learning, and text-code matching, CodeT5 learns rich representations from both unimodal code data and bimodal code-text data. The models are released in a range of sizes, from 220M to 16B parameters, to suit different compute budgets and performance requirements.
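
As a quick illustration of the encoder-decoder interface, the sketch below loads a checkpoint through the Hugging Face `transformers` library and exercises the span-denoising mode, where a sentinel token marks a masked span for the model to fill in. It assumes the `Salesforce/codet5-base` checkpoint (220M parameters); the prompt is illustrative.

```python
from transformers import RobertaTokenizer, T5ForConditionalGeneration

# Assumed checkpoint; CodeT5 checkpoints are hosted under the Salesforce org on the Hub.
checkpoint = "Salesforce/codet5-base"
tokenizer = RobertaTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

# Span denoising: <extra_id_0> marks the masked span the model should predict.
text = "def greet(user): print(f'hello <extra_id_0>!')"
input_ids = tokenizer(text, return_tensors="pt").input_ids

# Generate a completion for the masked span.
generated_ids = model.generate(input_ids, max_length=10)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```

The same seq2seq interface extends to generation-style tasks such as code summarization or translation by swapping in a suitably fine-tuned checkpoint and prompt.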