Collaborating between platforms is hard. Collaborating between multiple communities is even harder. In this talk from the 2021 GitKon Git conference, Johannes Schindelin, the maintainer of Git for Windows, one of the most widely used cross-platform tools of all time, shares his story of making what he calls a “friendly form of Git, modified to run across all editions of Windows” and some of the lessons he learned along the way.
A Git History
As we can see if we look at the history of Git, the version control system was born out of the needs of the Linux project, and as a result, Git was intended originally just for Linux users to run on Linux. That mindset informed many of the decisions the original Git team made.
Unfortunately for other platform users, this made the code produced by the Git team assume that certain underlying tools would be present on any machine where they tried to run Git. Early users of Git found they needed to be on a Linux machine with the newest editions of tools like Perl, Bash, and Python just to get Git to run correctly. While much of Git was implemented in C, a common language on almost all systems, a very large part of Git ran as shell scripts and Perl scripts. For example, the Git merge action ran as a Python script.
And all this meant that Git could not yet run on Windows.
Johannes discovered that Git could not run on Windows in 2006 when he took a freelance job that required him to use Windows. As a loyal Linux and Git community member, he wasn’t very happy to give up his preferred tool chain. Using Windows meant he had to use Apache Subversion, or Microsoft’s Visual SourceSafe, both of which he thought were not nearly as good as Git.
Johannes saw a path to bridge the gap and improve his workflow through git-svn, a bidirectional bridge between a Subversion repository and Git. But to make his plan work, he first needed to make Git work on Windows.
Making Git Work on Windows
The first path Johannes followed to make Git run on Windows was leveraging Cygwin, a system that allows you to run Linux software on Windows. Johannes quickly discovered a problem with Cygwin: every program it compiles must use the Cygwin runtime, which is an emulation layer, and running this emulation comes at a steep price of speed. While it was very slow, the workaround was still usable and allowed Johannes to get all the benefits of Git, like patching, Git rebasing, and pre-reviewing each individual commit.
Even with the speed issues, Git allowed Johannes to work much more efficiently. He ended up working faster than his manager thought he should be able to work. He got so far ahead of another team that leveraged his work that he was asked to hold off on development for that project for a little while. This gave Johannes a chance to work on his idea: “how hard can it really be to port Git to run natively on Windows?”
This is a journey he is still on today.
Porting Git to Windows
One hurdle Johannes had to overcome immediately was that porting Git directly from Linux is simply impossible. A large part of Git is implemented as shell scripts, and there’s no native shell script interpreter on Windows. While he could get close by leveraging the MSys/MINGW port of Git Bash and get commands like git commit to work, many other commands, like git log, were not really usable.
It was about this time that Johannes stopped working for a Windows-based company, and in true community spirit, he handed what he had created in Git for Windows off to the Git mailing list. And he heard nothing back for about half a year; no one picked up the project. But one day, he finally heard that another Windows user had gotten his code to work and that user too had shared his patches. Other Windows users started popping up who also attempted to make his code work, but almost no one could get all the system components lined up to make the code work as expected.
Different Communities, Different Mindsets
After a couple of years, someone contacted Johannes about coming aboard as a professional Git for Windows developer inside Microsoft.
He was up for a new challenge. The people he met at Microsoft amazed him, not only with their technical skills but also by the way they thought about open source. Part of the deal with Microsoft was that Git for Windows would not be a Microsoft product and it would remain open source, but Microsoft would pay Johannes to maintain it. Microsoft was really happy about this because they wanted to try a new approach with the open source community.
Johannes started work at Microsoft in August 2015, a full eight years after starting the Git for Windows project.
With Microsoft paying him to focus on improving his project, he got to work on some things that had been languishing for a long time, such as swapping out the underlying Msys system, which had not been maintained in years, and supporting more than just ASCII in Git repositories. Johannes quickly adopted the new Msys2 runtime, which was actively supported through community contributions. This change meant that, for the first time, Git for Windows could have a really excellent terminal, one you could resize and copy text from. Git for Windows users were really happy about these updates!
That was how it started. But there was a lot more work to be done and a lot more challenges ahead.
Challenges of Developing Git for Windows
POSIX and Windows
Underlying Linux and other Unix-like operating systems is the Portable Operating System Interface, or POSIX, a family of standards that specifies system APIs, shells, and utility interfaces. Because Git was built for Linux only, the developers never bothered to introduce an abstraction layer for things like file access, assuming that all users would simply have access to the same POSIX interfaces.
Windows once had a POSIX subsystem, but it was discontinued at some stage. Instead, the POSIX APIs that Git expects are emulated on top of the Windows APIs rather than hidden behind a new abstraction layer. This does work, providing the same function signatures and functionality as POSIX, but it comes at the price of inefficiency. Sometimes, to make a call work, the emulation has to take additional steps whose results the original request never uses. That is wasteful enough for a single call, and multiplied over many requests, the system consumes far more resources than should be required.
Processes vs Threads
In the Linux or Unix mindset, it’s commonplace to spawn a whole new process for each task. On Linux, new processes are seen as “super cheap” in terms of machine resource consumption. In reality, they are not super cheap, just super fast to start. This is because Linux “cheats”: instead of copying the parent’s memory up front, it copies only what is needed, later, when the process actually modifies it (copy-on-write), which makes spawning seem efficient. On Windows, however, spawning a process is not seen as cheap, nor is it efficient to do things this way.
The alternative to spawning new processes is using threads. Threads share the exact same memory pool, so they can coordinate through proper locking. If you need to parallelize, that is, to spread the work across multiple sets of resources, worker threads are actually the better model in terms of resource consumption and memory use.
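As a rough illustration of that shared-memory model, here is a minimal Python sketch (Git itself is written in C, and the names counter, add_many, and run_workers are made up for this example). Every thread updates the same variable, and a lock keeps the updates from stepping on each other:

```python
import threading

counter = 0                      # shared state, visible to every thread
counter_lock = threading.Lock()  # guards every update to `counter`


def add_many(times: int) -> None:
    """Increment the shared counter `times` times, holding the lock each time."""
    global counter
    for _ in range(times):
        with counter_lock:
            counter += 1


def run_workers(num_threads: int = 4, times: int = 10_000) -> int:
    """Reset the counter, run the workers to completion, and return the total."""
    global counter
    counter = 0
    threads = [
        threading.Thread(target=add_many, args=(times,))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter
```

Separate processes could not share `counter` this directly; they would need an explicit channel such as a pipe or a shared-memory segment.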
Threads are supported really well on Windows, but not on Linux. Linux maintainers originally thought threads weren’t necessary, and it took years to add any support for them. Working with threads is still not a very smooth experience on Linux even today, and as a result, parallelizing is a real challenge in Git.
For example, if you perform a Git checkout against millions of files, you can’t really have parallel processes because that’s super expensive. These different processes can completely stumble over each other, hogging the bus while the CPU just waits because the processes basically hinder each other from performing well. With threads, dividing tasks among the available resources to perform things like Git checkout is all much easier to do, with much less overhead for the operating system.
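To make the idea of dividing a checkout among threads concrete, here is a small Python sketch. It is not how Git implements checkout; write_file and checkout_in_parallel are hypothetical helpers that simply hand each file to a pool of worker threads inside one process:

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_file(target: Path, content: bytes) -> Path:
    """Write one file, creating its parent directories if needed."""
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return target


def checkout_in_parallel(
    files: dict[str, bytes], worktree: Path, workers: int = 4
) -> list[Path]:
    """Write every file into `worktree`, spreading the work over a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(write_file, worktree / name, data)
            for name, data in files.items()
        ]
        # result() re-raises any exception that occurred in a worker thread.
        return [future.result() for future in futures]
```

All workers live in one process, so the operating system does not have to set up and tear down a process per file.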
Mintty is Not Bash
Git for Windows originally used cmd as its shell, but now uses Mintty. While this change made it possible to run Git effectively, there are some tradeoffs. Mintty is not an exact replacement for Bash. For example, any color output requires additional steps that add to the operational overhead of simulating the same experience as using Git with Bash on Linux. Also, escape sequences, which allow cursor location control and styling options, are not allowed.
Line Endings
Originally, the DOS operating system, the precursor to Windows that set many of the standards Windows adopted, ended every line with a carriage return followed by a line feed. If you think of a physical printer, the carriage return tells the machine to go back to the beginning of the line, and the line feed advances the paper so printing can move forward. As developers moved to terminals and cursors, starting a new line no longer strictly required a separate character to return the cursor to the beginning of the line, and Unix-style systems end lines with a line feed alone.
Linux and Windows differ on how line endings should work: Windows ends lines with a carriage return plus a line feed (CRLF), while Linux uses a line feed (LF) alone. Using Git for Windows means you regularly have to convert to Unix line endings. This becomes a very big issue when binary data is mistreated as text. The data becomes corrupt and your JPEGs won’t load anymore.
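The corruption is easy to reproduce. The Python sketch below is only an illustration: to_unix_line_endings, looks_binary, and normalize_if_text are hypothetical helpers, and the NUL-byte check is a simplified heuristic assumed for this example rather than a description of Git's own logic.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Convert CRLF line endings to LF, treating the data as text."""
    return data.replace(b"\r\n", b"\n")


def looks_binary(data: bytes, sample_size: int = 8000) -> bool:
    """Simplified heuristic: a NUL byte near the start suggests binary data."""
    return b"\0" in data[:sample_size]


def normalize_if_text(data: bytes) -> bytes:
    """Convert line endings only when the data does not look binary."""
    return data if looks_binary(data) else to_unix_line_endings(data)


text = b"first line\r\nsecond line\r\n"
# Made-up binary payload that happens to contain the bytes 0x0D 0x0A.
binary = bytes([0xFF, 0xD8, 0x00, 0x0D, 0x0A, 0x10])

converted_text = to_unix_line_endings(text)        # safe: it really is text
corrupted_binary = to_unix_line_endings(binary)    # one byte silently removed
preserved_binary = normalize_if_text(binary)       # left untouched
```

The converted binary payload is one byte shorter than the original, which is exactly the kind of silent damage that stops an image from loading.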
Symbolic Links
Symbolic links, or symlinks, are files that reference other files or directories. Of course, given the other underlying differences between the operating systems, Linux and Windows handle them differently. On Windows, a symlink needs to specify what it points to: either a file or a directory. And if the target does not exist, Windows gets very unhappy about it. On Linux, on the other hand, a symlink doesn’t need to specify this information. Furthermore, on Linux, symlinks, or the things that symlinks point to, can change without making the system inoperable.
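Python's standard library exposes this difference directly: os.symlink takes a target_is_directory argument that only matters on Windows and is ignored elsewhere. The sketch below uses two hypothetical helpers, make_link and dangling_link_demo, and assumes a platform and user account that are allowed to create symlinks (on Windows that may require extra privileges):

```python
import os
import tempfile
from pathlib import Path


def make_link(link: Path, target: Path) -> None:
    """Create a symlink at `link` that points to `target`."""
    # On Windows the link is created as a file link or a directory link;
    # on other platforms target_is_directory is ignored.
    os.symlink(target, link, target_is_directory=target.is_dir())


def dangling_link_demo() -> tuple[bool, bool]:
    """Link to a target that does not exist and report what the OS sees."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "not-created-yet"
        link = Path(tmp) / "link"
        make_link(link, target)
        # is_symlink() inspects the link itself; exists() follows it.
        return link.is_symlink(), link.exists()
```

On Linux the link is created without complaint even though its target is missing, and it starts resolving as soon as the target appears.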
The Paths Problem
The biggest challenge of cross-platform development, in Johannes’ opinion, is the path problem. Because there is no native Unix shell on Windows, Git for Windows uses the Msys2 Bash tool, which only pretends that paths look like they do on Linux or Unix. On Linux, absolute paths, the full paths to files and directories, start with a forward slash. Separators are only forward slashes, never backslashes. On Windows, absolute paths start with a drive letter, then a colon, and then backslashes.
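Python's pathlib models both conventions, so the difference can be inspected on any machine. In this sketch the two example paths are made up, and describe is a hypothetical helper:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Example paths chosen for illustration only.
unix_path = PurePosixPath("/usr/bin/git")
windows_path = PureWindowsPath(r"C:\Program Files\Git\cmd\git.exe")


def describe(path) -> dict:
    """Collect the properties that distinguish the two path flavors."""
    return {
        "absolute": path.is_absolute(),
        "drive": path.drive,
        "root": path.root,
        "parts": path.parts,
    }


unix_info = describe(unix_path)
windows_info = describe(windows_path)

# A Unix-style path has a root but no drive letter, so by Windows rules
# it does not count as an absolute path.
unix_style_on_windows = PureWindowsPath("/usr/bin/git").is_absolute()
```

The same string is absolute under one set of rules and not under the other, which is why paths cannot simply be passed through unchanged.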
On Windows, you don’t have a single root directory; you have a root directory per drive. Git lives in root on a Linux system, but it lives somewhere completely different on Windows. When you go to the root directory in Msys2 Bash, it actually expands to the Windows path to the Git installation. To support shell scripting, Msys2 tries to convert between Windows and Unix-like paths on the fly, which can go very wrong in some situations. For example, using a regular expression that starts with a forward slash that gets converted to a Windows path will cause some major issues.
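The failure mode can be imitated with a deliberately naive converter. This is not the actual Msys2 conversion logic; GIT_ROOT is a hypothetical install location and naive_to_windows is a made-up function that rewrites anything starting with a forward slash:

```python
from pathlib import PureWindowsPath

# Hypothetical install location, used only for this illustration.
GIT_ROOT = PureWindowsPath(r"C:\Program Files\Git")


def naive_to_windows(arg: str, root: PureWindowsPath = GIT_ROOT) -> str:
    """Treat every argument that starts with '/' as a Unix path under `root`."""
    if not arg.startswith("/"):
        return arg
    parts = [part for part in arg.split("/") if part]
    return str(root.joinpath(*parts))


root_dir = naive_to_windows("/")             # the Unix root maps to the install dir
real_path = naive_to_windows("/usr/bin/git")  # a genuine path, converted as intended
mangled_regex = naive_to_windows("/^fix/")    # a regular expression, not a path
untouched = naive_to_windows("HEAD~1")        # no leading slash, left alone
```

The converter cannot tell that the third argument was a regular expression, so the pattern is turned into a Windows path and the command receives something entirely different from what the user typed.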
Cross-Platform Collaboration is Cross-Cultural Collaboration
While Johannes doesn’t know exact numbers, he thinks there are between 3 and 10 million Git for Windows users out there. And Johannes is just one person maintaining a tool for millions. 🤯 He is always looking for contributors who want to overcome both the technical challenges and the cross-cultural hurdles to continue to improve and evolve Git for Windows.
It’s challenging to collaborate, especially across platforms, but in the end, our software will be better for it.