[salesforce/CodeT5: Code for CodeT5: a new code-aware pre-trained encoder-decoder model. (github.com)](https://github.com/salesforce/CodeT5)

From Forbes.com:

[CodeT5](https://blog.salesforceairesearch.com/codet5/?utm_source=thenewstack&utm_medium=website&utm_campaign=platform) is an open source programming language model built by researchers at Salesforce. It is based on Google’s T5 (Text-to-Text Transfer Transformer) framework. To train CodeT5, the team sourced over 8.35 million instances of code, including user comments, from publicly accessible GitHub repositories. Most of this data came from the CodeSearchNet dataset, which covers Ruby, JavaScript, Go, Python, PHP, and Java, supplemented by two C and C# datasets from BigQuery.

CodeT5 can potentially bring three capabilities to software programming:

- **Text-to-code generation**: generate code from a natural language description
- **Code autocompletion**: complete a whole function given the target function name
- **Code summarization**: generate a natural language summary of a function
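
As a rough illustration of the code summarization capability, here is a minimal sketch that is not taken from the article. It assumes the Hugging Face `transformers` library and PyTorch are installed, and that the `Salesforce/codet5-base-multi-sum` checkpoint published on the Hugging Face Hub is the one you want. The helper name `summarize` and its defaults are hypothetical.

```python
def summarize(code: str,
              checkpoint: str = "Salesforce/codet5-base-multi-sum",
              max_length: int = 32) -> str:
    """Return a short natural language summary of a code snippet.

    Assumption: `checkpoint` names a CodeT5 model fine-tuned for
    summarization on the Hugging Face Hub. Weights are downloaded on
    the first call, so this needs network access and some disk space.
    """
    # Imported lazily so that defining the helper stays cheap.
    from transformers import RobertaTokenizer, T5ForConditionalGeneration

    # CodeT5 checkpoints pair a RoBERTa-style tokenizer with a T5 model.
    tokenizer = RobertaTokenizer.from_pretrained(checkpoint)
    model = T5ForConditionalGeneration.from_pretrained(checkpoint)

    # Encode the source code, generate summary tokens, decode to text.
    input_ids = tokenizer(code, return_tensors="pt").input_ids
    generated_ids = model.generate(input_ids, max_length=max_length)
    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)
```

Calling `summarize("def add(a, b):\n    return a + b")` should return a one-line description of the function. The exact wording depends on the checkpoint and generation settings, so no sample output is shown here.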