Custom Models for GitHub Copilot

The custom models option for GitHub Copilot is now available in public beta, allowing developers to fine-tune Copilot so that it better understands and aligns with their organization’s unique coding practices.

This new capability improves the relevance and accuracy of code suggestions, so we’ve compiled the most important points to keep in mind and the keys to getting the most out of it.

What are Custom Models?

Custom models are LLMs fine-tuned on an organization’s code bases. By training the model on proprietary libraries, specialized languages, and internal coding patterns, Copilot can provide code suggestions that are more context-aware and tailored to the organization’s needs.

You can create a custom model from your own GitHub repositories, and you can also enable the collection of code snippets and telemetry from developers’ Copilot prompts and responses to further fine-tune the model.

This aligns Copilot’s suggestions with each developer’s coding practices, helping to make them more relevant and accurate. This translates into less time spent on code review, debugging, and manual code tuning, and therefore higher productivity and better code quality.

When to use custom models?

As with any tool, custom models fit some situations better than others. Consider using them in the following scenarios:

Improve library and API usage: A custom model can prioritize your internal libraries and APIs in its suggestions, making it easier to follow internal standards.
Improve support for specialized languages: Fine-tuning helps Copilot better understand less common or proprietary languages, reducing friction and improving productivity.
Adapt to evolving code bases: By periodically retraining the model on your code base, you keep Copilot up to date with your latest coding patterns so that it continues to provide relevant and accurate suggestions.

Create a Custom Model in GitHub Copilot

While the feature is in beta, only an owner of an organization within an enterprise can create a custom model.

As an organization owner, you can choose which repositories will be used to train the model. The model can be trained on one, several, or all of the organization’s repositories, and it is trained on the content of the default branches of the selected repositories.

The custom model will be used to generate code completion suggestions for all file types, regardless of whether that file type was used for training. You can also choose whether telemetry data should be used.

Once started, the creation of a custom model takes several hours to complete. You will be notified by email when the process finishes. If it fails, Copilot will continue to use the current model to generate code completion suggestions.

When the model has been successfully created, all managed users in the enterprise who have access to Copilot Enterprise in the organization where it has been deployed will begin to see code completion suggestions generated by the custom model.

To test the effectiveness of the model, it is advisable to measure the usage of, and satisfaction with, GitHub Copilot code completion suggestions before and after the model is deployed. This can be done by using the REST API or by surveying developers about their perception of the model’s suggestions.
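As a minimal sketch of the before-and-after comparison described above, the following Python snippet computes the change in suggestion acceptance rate between two periods. The `DailyUsage` record shape, its field names, and the sample numbers are assumptions made for illustration; in practice you would fill these records from whatever usage data your organization exports, and the field names may differ.

```python
from dataclasses import dataclass


@dataclass
class DailyUsage:
    """One day of completion counts (hypothetical record shape)."""
    label: str
    suggestions: int
    acceptances: int


def acceptance_rate(records):
    """Return accepted / shown over all records, or None if nothing was shown."""
    shown = sum(r.suggestions for r in records)
    accepted = sum(r.acceptances for r in records)
    return accepted / shown if shown else None


def compare(before, after):
    """Return the change in acceptance rate, in percentage points."""
    rate_before = acceptance_rate(before)
    rate_after = acceptance_rate(after)
    if rate_before is None or rate_after is None:
        return None
    return (rate_after - rate_before) * 100


# Illustrative numbers only, not real measurements.
before = [DailyUsage("before-1", 1000, 250), DailyUsage("before-2", 1000, 270)]
after = [DailyUsage("after-1", 1000, 300), DailyUsage("after-2", 1000, 320)]

delta = compare(before, after)
print(f"Acceptance rate change: {delta:+.1f} percentage points")
```

Acceptance rate is only one signal, so it is worth pairing a comparison like this with the developer survey mentioned above.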

Implementing the custom model

Here are the steps to follow to set up a custom model:

In the upper right corner of GitHub, select “Your organizations” and then “Settings”.
In the left sidebar, click on “Copilot” and then “Custom Model”.
On the “Custom Models” page, click “Train a new custom model” and then “Select repositories” and choose from all or selected repositories.
If you choose selected repositories, select the ones you want to use for training and then click on “Apply”.
Optionally, if you prefer to train your model only with code written in certain programming languages, go to “Specify languages” and type the name of a language you want to include. Select the one you want from the list displayed and repeat the process for each language you want to include.
Click on “Create new custom model”.

Extra: To improve the performance of the model, select the check box labeled “Include prompt and hint data”. This allows Copilot to collect data from user-submitted prompts and the code completion suggestions generated for them. Once sufficient data has been collected, Copilot will use it as part of the model training process, which can produce a more effective model.

Aspects to be considered

You can check the progress of model creation by clicking the “Training details” button. Keep in mind that training may fail for several reasons, such as:

Insufficient or unrepresentative data, which makes the fine-tuning unstable.
Data that is not sufficiently different from the public data on which the base model was trained. In this case the training may fail, or the custom model’s code completion suggestions may be only marginally better.
Unexpected file types and formats that cause an error in a data preprocessing step. One possible solution is to specify only certain file types for training.

You can also update or delete the custom model from the organization’s settings page. When you retrain the model, it is updated to include any new code that has been added to the repositories selected for training. You can retrain it once a week.
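Before choosing which languages to include, it can help to see which file types a repository actually contains. The sketch below counts files per extension in a local checkout. The function name, the skipped directory names, and the use of file extensions as a stand-in for languages are all assumptions made for illustration; it is not part of Copilot or of GitHub’s preprocessing.

```python
from collections import Counter
from pathlib import Path


def extension_counts(repo_root, skip_dirs=(".git", "node_modules")):
    """Count files per extension under repo_root, ignoring skip_dirs."""
    root = Path(repo_root)
    counts = Counter()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        # Skip anything that lives inside an ignored directory.
        if any(part in skip_dirs for part in path.relative_to(root).parts):
            continue
        counts[path.suffix.lower() or "(no extension)"] += 1
    return counts


# Example: inspect the current directory, assumed to be a local checkout.
counts = extension_counts(".")
for extension, total in counts.most_common(10):
    print(f"{extension:>16}  {total}")
```

Extensions that appear rarely, or that belong to generated or binary files, are candidates to leave out when you specify languages for training.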

How Plain Concepts can help you

GitHub Copilot is having a major impact on the way developers and organizations create software. According to research from Accenture, developers using Copilot saw an 8% increase in pull requests, a 15% increase in pull request merge rate, and an 84% increase in successful builds.

The study also shows that 90% of developers were more satisfied with their work when using GitHub Copilot and 95% said they enjoyed coding more with this help.

Plain Concepts also ran a pilot test among our team to test its effectiveness, and you can see the initial results and conclusions here.

In addition, we believe that custom models represent the next big leap in coding, as they extend these capabilities directly to the inline code completion experience.

By training Copilot on your private code bases and also incorporating telemetry, custom models allow Copilot to adapt to your organization’s unique coding environment. In addition, key steps have been taken to build data security measures into fine-tuned models at the scale that enterprises need.

Each company’s data always remains private, and each company is its sole owner. It is never used to train other customers’ models. When a training process is initiated, the data in your repositories and the telemetry data are tokenized and temporarily copied to the Azure training pipeline.

Some of this data is used for training, while another set is reserved for validation and quality assessment. Once the tuning process is complete, the model undergoes a series of quality assessments to ensure that it outperforms the reference model.
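To make the idea of a held-out validation set and a quality gate concrete, here is a generic sketch. It is not GitHub’s pipeline: the split fraction, the function names, and the scores are all hypothetical and only illustrate the general technique of reserving data for evaluation and deploying a model only if it beats a baseline.

```python
import random


def split_holdout(samples, validation_fraction=0.1, seed=0):
    """Shuffle a copy of samples and return (training, validation) lists."""
    if not 0 < validation_fraction < 1:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    # Always keep at least one sample for validation.
    cut = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[cut:], shuffled[:cut]


def passes_quality_gate(custom_score, baseline_score):
    """Accept the custom model only when it strictly beats the baseline."""
    return custom_score > baseline_score


# Hypothetical data and scores, for illustration only.
samples = [f"snippet-{i}" for i in range(20)]
train, validation = split_holdout(samples, validation_fraction=0.2)
print(len(train), len(validation))
print(passes_quality_gate(custom_score=0.31, baseline_score=0.26))
```

The validation samples never take part in training, so a score measured on them gives a fairer comparison against the baseline model.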

If the model passes the quality checks, it is deployed to Azure OpenAI. This setup makes it possible to host several LoRA models at scale while keeping them isolated from each other. Once the process is complete, the temporary training data is removed and the data flow resumes through the normal inference channels.

At Plain Concepts, if you need a partner to unlock the full potential of GitHub Copilot, we make it easy for you:

We are the first partner in Spain accredited by GitHub.
We have more than 17 years of experience working in Agile culture and are a reference in the DevOps community.
We have a specialized team of more than 350 senior engineers in App Innovation and DevOps.
Accredited as AMMP.
DevSecOps with MVPs.

In addition, we go beyond certifications and offer an exclusive GitHub Adoption Framework to help you find the service that best suits your needs, delivered by the best experts. Contact us to learn more!
