How to Debug Swift Compiler: Part 3 — Die Hard or Debug Swift on Ubuntu
How to resolve a crash in Swift compiler version 5.9.2 that happens only on Linux, without cutting your feet while walking on broken glass.
Debugging the Swift compiler on macOS is quite straightforward. On Linux, however, it can get really weird, and it might even seem impossible before you have managed to clone the Swift compiler in the first place. Let’s see what happened when I tried doing so on New Year’s Eve 2023, to end the year on a positive note.
Previously, in Part 2 of this article series, I mentioned that I would fix a compiler crash in Part 3. As it turns out, a more interesting journey was awaiting me, and I decided to describe it in Part 3 instead. Part 4 is inevitable though 🤝
How It All Started
When I was preparing the latest release of Sourcery (2.1.3), one of the tasks was to supply a compiled distribution for Ubuntu. As usual, I fired up my tart VM to compile Sourcery and upload the artifact to the release on GitHub.
That is when I found out about a crash in Swift 5.9.2 which would happen only on Linux. The first thing that came to my mind was to revert the compiler version to 5.9.0. That worked as expected: the Sourcery project compiled successfully and the tests were green. The build was then packaged and published along with the “pre-New-Year’s-Eve” 2.1.3 release.
After finishing that task, I immediately switched to reporting the crash in the apple/swift repository. While preparing the crash stack trace, I figured that the reason for the crash seemed reasonably straightforward. And then it crossed my mind that I might just be the one to supply the fix, since it was holiday season for most developers out there anyway. So I went ahead and started cloning the apple/swift repository locally, as I did previously on macOS in Part 1 of this article series.
Cloning apple/swift itself went just fine; I was simply following the steps described in apple/swift/docs/HowToGuides/GettingStarted.md:
1. git clone git@github.com:apple/swift.git swift
2. cd swift
3. utils/update-checkout --clone-with-ssh
While git clone itself worked fine, what struck me was that step #3, utils/update-checkout --clone-with-ssh, would abort with a SIGABRT signal. That was weird, I thought, and I checked the CI status and the recently opened issues on the Swift GitHub repository for anything related: absolutely nothing was found. It meant only one thing: I needed to debug the utils/update-checkout script itself.
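When a command dies like this, it helps to confirm which signal actually ended it. The following is a minimal sketch, not part of the Swift tooling: the helper names describe_exit and run_and_describe are made up for illustration. It relies on the POSIX convention that Python's subprocess module reports a child killed by a signal as a negative return code.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed
        # by the signal with that number.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


def run_and_describe(command):
    """Run a command (a list of strings) and report how it ended."""
    completed = subprocess.run(command)
    return describe_exit(completed.returncode)
```

Called from the swift checkout as run_and_describe(["utils/update-checkout", "--clone-with-ssh"]), this would report the terminating signal by name if the script was killed by one.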
Python is Simply a Snake
I then fired up Visual Studio Code, the same one I had used to enable Linux support in that same Sourcery project, inside that same virtual machine run by tart. I navigated to the swift project folder and found the script under exactly the same path as in step #3, utils/update-checkout. I then opened it in the source editor to see what was inside:
1. #!/usr/bin/env python3
2.
3. import sys
4.
5. import update_checkout
6.
7. if __name__ == '__main__':
8.     # This line was added in dfe3af81b2 to address an importing issue on
9.     # Windows. It causes this script to break badly when used with
10.     # Python 3.8 on macOS. Disabling for all Python 3 until someone
11.     # can help sort out what's really needed for Windows:
12.     if sys.version_info.major < 3:
13.         sys.modules[__name__] = sys.modules['update_checkout']
14.
15.     update_checkout.main()
Line #15 calls main() in a different module named update_checkout. So I followed the lead and found this big Python script. Oh well, I have absolutely no experience in Python, I thought. I then wandered around in this script for some time, putting breakpoints here and there and tweaking variables. It was unclear to me how to pass command line arguments in Visual Studio Code to set the --clone-with-ssh flag to true when debugging this script. So I needed to change the actual value of the variable which was storing this flag:
clone_with_ssh = args.clone_with_ssh
It is quite interesting to guess which value I should hardcode: “1”? “true”? What is the “true” value in Python? I definitely did not want to google, read a book on Python, or even ask “AI”, better named “LLM”. Rather, I just went ahead and tried setting true as the value of the clone_with_ssh variable. I then debugged and saw that False is the default value, so I replaced my true with True and it worked 😅.
(Yes, I completely disregard the fact that “Artificial Intelligence” is used everywhere; conceptually this is not intelligence but simply sophisticated data science, in my personal opinion.)
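For readers in the same position, here is a small sketch of how such a flag usually behaves. It assumes the flag is declared with argparse as a store_true option; the parser below is a hypothetical stand-in, not the real one from update_checkout. Passing an explicit argument list to parse_args avoids hardcoding the variable.

```python
import argparse


def build_parser():
    """Hypothetical stand-in for the script's real argument parser."""
    parser = argparse.ArgumentParser()
    # A store_true flag defaults to False and becomes True when present.
    parser.add_argument("--clone-with-ssh", action="store_true")
    return parser


def parse(argv):
    """Parse an explicit argument list instead of reading sys.argv."""
    return build_parser().parse_args(argv)
```

Here parse(["--clone-with-ssh"]).clone_with_ssh is True and parse([]).clone_with_ssh is False, which matches the False default seen in the debugger. VS Code's Python launch configurations also accept an "args" list in launch.json, which is another way to supply the flag without editing the script.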
While the script was running, it arrived at the following concurrency code, which I was barely able to read:
1. def run_parallel(fn, pool_args, n_processes=0):
2.     """Function used to run a given closure in parallel.
3.
4.     NOTE: This function was originally located in the shell module of
5.     swift_build_support and should eventually be replaced with a better
6.     parallel implementation.
7.     """
8.
9.     if n_processes == 0:
10.         n_processes = cpu_count() * 2
11.
12.     lk = Lock()
13.     print("Running ``%s`` with up to %d processes." %
14.           (fn.__name__, n_processes))
15.     pool = Pool(processes=n_processes, initializer=child_init, initargs=(lk,))
16.     results = pool.map_async(func=fn, iterable=pool_args).get(999999)
17.     pool.close()
18.     pool.join()
19.     return results
Line #12 in this script was causing the SIGABRT signal, and cloning did not even start. So I figured, “what about this Lock thingy”? I tried, as one would expect, to remove the Lock completely, but the code would not run because of the Pool thingy, which required that lock as one of its constructor’s arguments.
I then tried to replace Lock with RLock, just in case there was something related to re-entrancy of some sort; simply guessing, again, without having any clue. Next I wondered: why do I even need concurrency in the first place? I tried to remove the Pool and everything related to parallelism from this function. But how to do that correctly? The answer is simple: poke it and see what happens.
So I poked: I commented out everything from line #12 through line #18. I then defined results as simply an empty collection, []. Why would this even work in Python? Simply guessing, I was. The next step was to call the function that had been called from this Pool thingy in some sort of a for-loop, passing it the same pool_args, which by its name implied a collection. The following was the result of my first attempt at writing Python code in my life:
1. def run_parallel(fn, pool_args, n_processes=0):
2.     """Function used to run a given closure in parallel.
3.
4.     NOTE: This function was originally located in the shell module of
5.     swift_build_support and should eventually be replaced with a better
6.     parallel implementation.
7.     """
8.
9.     if n_processes == 0:
10.         n_processes = cpu_count() * 2
11.     results = []
12.
13.     for args in pool_args:
14.         fn(args)
15.     #lk = Lock()
16.     #print("Running ``%s`` with up to %d processes." %
17.     #      (fn.__name__, n_processes))
18.     #pool = Pool(processes=n_processes, initializer=child_init, initargs=(lk,))
19.     #results = pool.map_async(func=fn, iterable=pool_args).get(999999)
20.     #pool.close()
21.     #pool.join()
22.     return results
It worked on the first attempt. I followed the same pattern used on line #9, where an if statement was used, and guessed that the for-in loop would look similar, according to my general understanding of the for-in loop syntax in languages such as Swift or Objective-C.
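One detail of the rewritten function: the return value of fn(args) is discarded, so the function always returns an empty list, whereas the original returned whatever map_async collected. Whether the callers in update_checkout rely on those results is an assumption I have not verified here, but a sequential variant that keeps them is only slightly longer. A minimal sketch, with the hypothetical name run_sequentially:

```python
def run_sequentially(fn, pool_args):
    """Call fn once per element, in order, and collect the return values."""
    results = []
    for args in pool_args:
        # Each element is passed as a single argument, the same way
        # Pool.map_async hands items to the worker function.
        results.append(fn(args))
    return results
```

For example, run_sequentially(len, ["ab", "cde"]) returns [2, 3]. The trade-off is the obvious one: no processes and no lock, so nothing runs in parallel and the checkout takes longer.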
Now the cloning was going slowly but steadily, until I hit the next issue: not enough disk space in my Ubuntu VM. To understand how to resolve this, let’s consider what the tart tool does and how it works.
P.S. excuse me for overusing the word “thingy”, I simply have no clue what Python programming language is.
Tart
Tart is simply a tool which uses Apple’s Virtualization framework to run virtual machines on macOS. These virtual machines can be either macOS or Linux. To set up a Linux VM using tart, I followed this GitHub Issue as my guide, and entered the following commands on my macOS host:
tart clone ghcr.io/cirruslabs/ubuntu:focal ubuntu
tart run ubuntu
It then downloaded a preconfigured Ubuntu 20.04 VM, and I was able to upgrade it to Ubuntu 22.04 using this guide. Please note that as of December 31st, 2023, you need to execute the sudo do-release-upgrade command without the -d flag in order to be able to upgrade.
With the VM running, I then needed to install the following set of utilities and tools to be able to run a compilation of any sort in my Ubuntu VM:
sudo apt install libffi-dev
sudo apt install build-essential
sudo apt install libsqlite3-dev
sudo apt-get install libncurses5-dev
Note that I needed to run exactly this set of commands to be able to run unit tests for Sourcery when I was implementing its Linux support. It might be redundant for compiling the Swift compiler. In fact, the aforementioned apple/swift/docs/HowToGuides/GettingStarted.md lists the required dependencies.
What is not clear with all this virtualization framework is how to set the disk image size. For example, after running the tart clone command mentioned before, the resulting VM has only 50 GB allocated for the guest OS. And here comes the problem: when I was running update-checkout --clone-with-ssh, I hit the not enough space error.
I then tried to resize the VM following this guide and used the following command:
truncate -s 100g ~/.tart/vms/my-vm/disk.img
To my surprise, the VM would not start afterwards. Of course, I had run tart export right after the initial setup, so nothing was lost and I was still able to compile a new release of Sourcery very quickly whenever I needed. But I got a little bit stuck with the situation. “What is the next step?”, I thought. I followed my gut instinct and tried to clone the VM again, from scratch. Good thing that tart has the ~/.tart/caches folder, which stored the previous clone’s archive.
After creating a brand new Ubuntu Linux VM, I followed the same guide and entered the following set of commands after establishing an ssh session with the guest OS:
$ truncate -s 100g ~/.tart/vms/my-vm/disk.img
$ diskutil repairDisk disk0
$ diskutil apfs resizeContainer disk0s2 0
$ df -h ~
But it would not work, because these instructions are for a macOS guest OS, not Linux! So I wondered how to do the same on Linux. I randomly installed an app called GParted using:
sudo apt install gparted -y
And voilà! With it, I was able to resize the disk to the full 100 GB! Finally, I was able to proceed with the compilation of the Swift compiler…
Funny thing is, I did not have to fix the parallelization issue in that update_checkout Python script anymore after setting up a brand new VM with plenty of free space available.
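The reason two steps were needed is that truncate -s only grows the image file on the host. The bytes already in the file, including the partition table and the filesystem, stay exactly as they were, so the guest still has to grow its partition with a tool such as GParted. The sketch below demonstrates that behaviour on a small temporary file standing in for disk.img; the helper names grow_file and demo are made up for illustration.

```python
import os
import tempfile


def grow_file(path, new_size):
    """Extend a file to new_size bytes, as `truncate -s` does when growing."""
    if new_size < os.path.getsize(path):
        raise ValueError("refusing to shrink the file")
    os.truncate(path, new_size)
    return os.path.getsize(path)


def demo():
    """Grow a stand-in 'image' and check its existing bytes are untouched."""
    original = b"partition table and filesystem"
    with tempfile.NamedTemporaryFile(delete=False) as image:
        image.write(original)
        path = image.name
    try:
        new_size = grow_file(path, 1024 * 1024)
        with open(path, "rb") as handle:
            prefix = handle.read(len(original))
        return new_size, prefix == original
    finally:
        os.remove(path)
```

Calling demo() grows the file to one mebibyte and confirms that the original bytes at the start are unchanged; the added space is just zeros that nothing inside the image knows about yet.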
Swift Compiler, Compile!
To compile the Swift compiler’s dependencies and set up the environment, the following command needs to be executed:
utils/build-script --release-debuginfo
Keep in mind that there is a list of required dependencies which need to be installed before running this command. I immediately hit an error saying that clang was not installed. Simply running apt install clang -y resolved this issue. Another requirement for building the compiler is a pre-installed version of Swift. That can be easily managed using a tool called Swiftly. Remember to add the installed swiftly path to your ~/.bashrc to keep the Swift compiler development environment informed about the location of the installed Swift toolchain.
One more note: because I had installed sccache according to apple/swift/docs/HowToGuides/GettingStarted.md, I needed to run the following command instead of the first one I had entered:
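A quick way to verify that the shell can actually see these prerequisites is to look each one up on PATH before starting a long build. This is a minimal sketch; the function name missing_tools and the tool list in the usage note are illustrative, not an official requirement list.

```python
import shutil


def missing_tools(tools, path=None):
    """Return the tools from the list that cannot be found on PATH."""
    # shutil.which returns None when no executable with that name is found.
    return [tool for tool in tools if shutil.which(tool, path=path) is None]
```

For example, missing_tools(["clang", "swift"]) returns an empty list when both are reachable from the current shell. If swift is reported missing right after installing it with Swiftly, the ~/.bashrc change has probably not been picked up by the current session yet.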
utils/build-script --release-debuginfo --sccache
The compilation process started! Little did I know how tedious it would get to compile the Swift compiler under Linux, and that I would uncover even more issues with it afterwards…
Linking Phase Failing
Oh yeah, you heard that right: the compiler would not compile. Even with the latest revision of the apple/swift main branch, I was not able to run the compilation swiftly. The errors I was facing were related to different parts of the clang toolset. In particular, during the linking phase the Swift compiler build script would yield an error like “linking failed” when building bin/clang-17, bin/clang-tidy and other binaries. I wanted to add the -v flag to the clang compiler invocation, but it was super tedious and unclear how to do that when using the Ninja build system. So I tried the following:
- Invest a full day of work into researching why these tools would fail to link
- Resolve some of the issues by simply guessing if something was wrong with dependencies. Previously in this article, I mentioned that some of the installed packages were needed only for Sourcery development under Ubuntu. However, these installed packages were conflicting with the requirements of the Swift compiler. I googled a lot of similar errors and realized that I should remove the gcc++ and re-run the installation of all of the dependencies required by the Swift compiler.
- Observe dozens of issues closely related to, but not exactly the same as, the issue I was facing:
  - A Swift Forums thread which seemed to be extremely relevant to a failing development tools build on Linux, but it was of no avail to me
  - A number of GitHub Issues with relevant errors
As you can imagine, nothing helped. So what I did was simply the best illogical thing to do in such a situation: I ran the build script again and again, over and over. And in the end, after hours and hours of simply re-running the same compilation command, the compilation of the Swift compiler under Ubuntu 22.04 finished successfully.
And the reason was, of course, the simplest possible one: not enough RAM inside the VM to link the compiled objects during the LLVM build. The solution was to increase the amount of RAM dedicated to tart’s VM by executing tart set ubuntu2 --memory=16384, which sets the available RAM to 16 GB. Another issue I faced during the last mile of compiling the Swift compiler under Ubuntu was that the disk inside the VM must be at least 150 GB.
TLDR: before attempting to compile Swift compiler under Ubuntu, it is important to have a VM with the following system configuration:
- at least 8 GB of RAM (I have used 16 GB just to be sure)
- at least 150 GB of hard drive
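The two minimums above can be checked from inside the VM before starting a build. The following sketch uses those numbers as thresholds; the function names are made up for illustration, and the os.sysconf names used to read physical memory are the ones available on Linux.

```python
import os
import shutil

GIB = 1024 ** 3
MIN_RAM_GIB = 8     # minimum RAM from the list above
MIN_DISK_GIB = 150  # minimum disk size from the list above


def check_requirements(ram_bytes, disk_bytes):
    """Return a list of problems; an empty list means the VM qualifies."""
    problems = []
    if ram_bytes < MIN_RAM_GIB * GIB:
        problems.append("RAM: %.1f GiB, need %d" % (ram_bytes / GIB, MIN_RAM_GIB))
    if disk_bytes < MIN_DISK_GIB * GIB:
        problems.append("disk: %.1f GiB, need %d" % (disk_bytes / GIB, MIN_DISK_GIB))
    return problems


def check_this_machine(path="/"):
    """Measure the current machine (Linux sysconf names) and check it."""
    ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    disk = shutil.disk_usage(path).total
    return check_requirements(ram, disk)
```

Keep in mind that a guest usually reports slightly less memory and disk than the nominal figures given to the hypervisor, so a VM configured with exactly 8 GB may fail this check; lower the thresholds a little if that happens.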
The build ran tests during the compilation phase and compiled LLVM and all other dependencies, producing a swift-frontend binary ready to be debugged with the CLion IDE, as suggested in apple/swift/docs/HowToGuides/GettingStarted.md.
After compiling the Swift compiler, I was able to run the old friend, swift frontend, and pass the same arguments which had caused the compiler to crash in the first place. And so, the actual investigation of the crash caused by one of the files in Sourcery started…
The arguments to swift frontend are usually shown in the generated crash report.
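Copying a very long argument list out of a crash report by hand is error-prone, so a few lines of Python can do it. This sketch assumes the report contains a line with the text "Program arguments:" followed by the full command line, which is how LLVM-based tools typically print it in their stack dump; if your report looks different, adjust the marker. The function name extract_arguments is made up for illustration.

```python
import shlex

MARKER = "Program arguments:"


def extract_arguments(report):
    """Return the argument list from the first marker line, or None."""
    for line in report.splitlines():
        _, found, rest = line.partition(MARKER)
        if found:
            # shlex.split honours shell-style quoting; paths containing
            # unquoted spaces would still be split incorrectly.
            return shlex.split(rest)
    return None
```

The resulting list can be pasted into a debugger launch configuration, with the first element replaced by the path to the freshly built swift-frontend.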
CLion is Not a Lion
Whilst I was trying to follow the section of apple/swift/docs/HowToGuides/GettingStarted.md related to CLion setup, I faced a number of errors which I could not overcome, such as some c++ compiler errors when trying to run the swiftFrontend executable with a debugger attached. And so I decided to give Microsoft Visual Studio Code a chance instead, which I use primarily for Sourcery development under Linux.
I searched and found a very nice guide on how to set up VSCode for Swift compiler development. To my surprise, it took around 10 minutes to go from zero to debugging swift-frontend. I really enjoy VSCode under Linux, and I have created a set of Keybindings for Linux which I use to move the cursor around in the source code editor, as well as hotkeys to build/run/step over and other operations in VSCode, more-or-less aligned with the behaviour in Xcode.
The Crash
After 3 days of setting up the environment, I was finally able to build & run swift-frontend with the arguments from the crash report. To my surprise, the crash did not happen, which means that I could not reproduce it. I then tried to follow the steps in the crash report I had filed, which was of no avail either.
The PR
Nevertheless, I have opened a PR which improves GettingStarted.md by mentioning the minimal requirements for a Linux VM configuration, so that the Swift toolchain can be compiled without the issues I faced when trying to reproduce the crash.
Want to Connect?
Follow me on X (Twitter): @r_alikhamov or LinkedIn 🤝
Debugging code is a specific form of art. It requires critical thinking and a more-or-less “Sherlock Holmes-like” approach to problems. When it comes to debugging the debugger debugging crashing code, you simply need to overcome the fear of the unknown and dive into the process. The discoveries you might make will surely surpass any other experience you might get anywhere else.
References
- apple/swift/docs/HowToGuides/GettingStarted.md — https://github.com/apple/swift/blob/main/docs/HowToGuides/GettingStarted.md
- How to Debug Swift Compiler: Part 2 — Can “AI” Fix the Compiler Crash? — https://medium.com/itnext/how-to-debug-swift-compiler-part-2-can-ai-fix-the-compiler-crash-6a95fe7c0d60
- PR Improving GettingStarted.md — https://github.com/apple/swift/pull/70658