DataCore is undergoing an agile transformation. Recently we moved source control from Perforce to GitHub. It was an interesting and challenging exercise. In this blog post we’ll share our journey and lessons learned, and take a look at best practices for a successful migration.
A Brief Background on Version Control Systems (VCS)
Perforce: Perforce Software was founded in 1995 in Alameda, California by Christopher Seiwald. Its version control system lets companies collaborate on large software projects by keeping track of changes in both source code and binary files.
Git: A distributed version control system developed by Linus Torvalds in 2005 for tracking changes and coordinating work in a team. It facilitates an agile workflow in which developers are encouraged to share smaller changes frequently.
GitHub: A web hosting service for Git repositories that offers free space for open source projects and paid plans for private projects. You interact with your GitHub repository from your local machine by pushing and pulling, and you can get code reviewed and start discussions through a Pull Request before integrating it with the rest of the codebase.
Reasons to Move
- Distributed VCS facilitates efficient working
- Easier access to open source projects
- Better aligns with continuous integration model
- Provides better flexibility for things like feature toggling and swarming
- Makes it easier when team members are distributed
- Projects have lengthy histories to preserve
- Large number of files in depots with high number of dependencies
- Many historical branches hanging around
Storage Technique of Perforce vs Git
Because Perforce is a centralized VCS, it follows the same technique as most centralized VCSs. Take a repository and workflow in Perforce as an example: when one repository has multiple branches, Perforce keeps a copy of every branch in that repository on its central server. Different teams and developers create workspaces on different branches and submit their changes to their respective branches, so all changes make their way to the Perforce central server. As you might imagine, every branch has its own copy of the files. Perforce records the changes made on each branch as changelists, each submitted into its respective branch. Git, on the other hand, works differently.
Git, a distributed VCS, does not keep a separate copy of the files for every branch. It saves a snapshot of the project state for each commit, and a branch simply points to the latest of its snapshots, so file content that is identical across snapshots is stored only once. Each change submitted to a branch is saved as a new state. Developers clone the whole repository from the server to their local machines, commit locally, and push their changes once done. This way developers also have the whole change history on their local machine, right in front of them.
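To make the snapshot idea concrete, here is a minimal Python sketch of how Git derives the id of a file's content (a "blob"). It assumes a repository that uses Git's default SHA-1 object format, and the file names and contents are invented for illustration. The point is only that identical content on two branches maps to the same object, so it does not need to be stored twice.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the id Git (SHA-1 object format) assigns to this file content."""
    # Git hashes a small header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical file contents on two branches (illustrative only).
main_branch = {"README.txt": b"hello\n", "build.cfg": b"debug=0\n"}
feature_branch = {"README.txt": b"hello\n", "build.cfg": b"debug=1\n"}

main_ids = {git_blob_id(data) for data in main_branch.values()}
feature_ids = {git_blob_id(data) for data in feature_branch.values()}

# Four files across two branches, but README.txt is identical on both,
# so only three distinct blobs would need to be stored.
stored_objects = main_ids | feature_ids
print(len(stored_objects))
```

In a real repository, running git hash-object on a file should report the same id that git_blob_id computes for that file's bytes.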
DataCore Journey to GitHub
Saving commit (change) history
Once you start considering a migration, the first question you will be asked, or the first that comes to mind, is how to preserve the change history of your repository. After all, there are plenty of branches with different changes going into them each day, and developers don’t want to lose that history: they need to be able to find, for example, what went in last month that is incompatible with the latest change.
If you have already thought of this, you are in the right place. The git-p4 tool that we used for the migration also helps preserve the change history, though we did not carry over all of it. We saved the commit history of our Main (master) branch, because that matters the most. Appending “@all” to the depot path of the branch being cloned preserves the commit history of that branch. The recommended strategy is to keep your Perforce server available in read-only mode even after the migration.
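As a rough sketch of the history-preserving clone described above, the Python helper below only assembles the git-p4 command line; it does not run it. The depot path //depot/project/main and the destination directory project-main are hypothetical placeholders, not our actual layout.

```python
import shlex
from typing import List


def git_p4_clone_command(depot_path: str, destination: str,
                         full_history: bool = True) -> List[str]:
    """Build a `git p4 clone` command for one Perforce depot path."""
    # Appending "@all" asks git-p4 to import every changelist for the path;
    # without it, only the latest revision is imported.
    revision = "@all" if full_history else ""
    return ["git", "p4", "clone", depot_path.rstrip("/") + revision, destination]


# Hypothetical depot path and target directory (assumptions for illustration).
command = git_p4_clone_command("//depot/project/main", "project-main")
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run on a machine where git-p4 is installed and the Perforce connection settings (such as P4PORT and P4USER) are configured.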
User permissions and roles
User roles and permissions also have to be accounted for during migration. Once you have purchased a GitHub account for your organization, you will find that there are different options to consider. We went with the GitHub Enterprise edition. For security purposes, Active Directory (AD) integration with GitHub is important for us. After AD has been integrated with GitHub, an invitation link can be sent to the developers listed in AD. You can create teams and assign developers to them under Settings > Collaborators and Teams. Branches can also be given rules for submissions, code reviews, pull requests and so on: go to Settings > Branches > Create A Rule to configure the rules according to your needs.
Don’t start the migration process without knowing what the effect of the change will be. Our teams worked on different branches, so we migrated team by team, which ensured the migration was done in phases. A few groups moved each day. We responded to the issues team members ran into and addressed them before the next groups migrated.
Git hosting services such as GitHub treat files over 50 MB as large files. This sounds a little odd in a world where terabytes and petabytes are common, but large files are typically not stored in a VCS. Source files are generally small, but sometimes additional files are needed to build a project and have to be kept under version control. For those files, Git has an extension called Git LFS (Large File Storage). Install this extension to store the large files themselves on the GitHub remote server, while the repository stores only a small pointer in place of each large file.
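The following Python sketch illustrates both halves of that paragraph under stated assumptions: find_large_files is a hypothetical helper that lists files above the 50 MB threshold in a working directory, and lfs_pointer shows the shape of the small text pointer that Git LFS commits in place of a large file (format per the Git LFS pointer specification, version 1). It is an illustration, not a replacement for the Git LFS client.

```python
import hashlib
import os
from typing import List

LARGE_FILE_BYTES = 50 * 1024 * 1024  # the 50 MB threshold mentioned in the text


def find_large_files(root: str, threshold: int = LARGE_FILE_BYTES) -> List[str]:
    """Return paths under `root` whose size exceeds `threshold` bytes."""
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own metadata directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and os.path.getsize(path) > threshold:
                large.append(path)
    return sorted(large)


def lfs_pointer(content: bytes) -> str:
    """Return the text of a Git LFS pointer for the given file content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# A stand-in for a large binary; real candidates would come from find_large_files.
print(lfs_pointer(b"pretend this is a large build artifact"))
```

In practice the pointer is generated for you: after git lfs install, a command such as git lfs track "*.bin" (the pattern here is only an example) tells Git LFS which files to replace with pointers.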
Merge multiple depots into one repository
If there are multiple stream depots in Perforce and one depot imports all the other depots as dependencies, additional steps are needed. Clone your first depot and push it to GitHub. Then clone the other depot and connect it to the same GitHub repository by adding that repository as the remote origin. Because the two depots share no common history, add the --allow-unrelated-histories option to git pull to merge the unrelated histories of the two Perforce depots into one repository on GitHub.
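The steps above can be sketched as a short command sequence for the second depot's clone. This Python helper only builds the commands; the remote URL and the branch name master are assumptions for illustration, and files that exist at the same path in both depots would still need manual conflict resolution.

```python
from typing import List


def merge_depot_commands(remote_url: str, branch: str = "master") -> List[List[str]]:
    """Commands to run inside the clone of the second depot."""
    return [
        # Point the second clone at the GitHub repository that already
        # holds the first depot.
        ["git", "remote", "add", "origin", remote_url],
        # The two histories share no common commit, so Git refuses to merge
        # them unless this option is given.
        ["git", "pull", "origin", branch, "--allow-unrelated-histories"],
        # Publish the combined history.
        ["git", "push", "origin", branch],
    ]


# Hypothetical repository URL (not a real DataCore repository).
for command in merge_depot_commands("https://github.com/example-org/product.git"):
    print(" ".join(command))
```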
How to define success for this migration?
For a successful migration, you first need to integrate your new VCS with your CI/CD tool. Fire up a build and run your test suite. If all your tests pass, the migration has succeeded and you can open your GitHub repository for submissions.
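One way to express that success criterion is a small gate that runs the build and test commands in order and reports success only if every one exits cleanly. The sketch below is a hypothetical illustration: the two check commands are stand-ins that merely print a message, and a real setup would call your own build tool and test framework from the CI/CD system.

```python
import subprocess
import sys
from typing import List, Sequence


def migration_checks_pass(commands: Sequence[List[str]]) -> bool:
    """Run each command in order; stop and report failure on the first non-zero exit."""
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            return False
    return True


# Stand-in commands (assumptions); replace with the real build and test steps.
checks = [
    [sys.executable, "-c", "print('build ok')"],
    [sys.executable, "-c", "print('tests ok')"],
]

if migration_checks_pass(checks):
    print("All checks passed: the repository can be opened for submissions.")
else:
    print("A check failed: keep the repository closed and investigate.")
```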
Moving our source code from Perforce to GitHub was a six-week project for us. We have code on TFS and we plan to move TFS code to GitHub as well. We will share what we learned in a subsequent blog.
Thank you for reading. Please contact us with any questions.