How to Debug Swift Compiler: Part 3 — Die Hard or Debug Swift on Ubuntu

Ruslan Alikhamov
Published in ITNEXT · 12 min read · Dec 31, 2023


Penguin sitting on a chair staring at iMac and debugging Swift compiler under Linux with SJ portrait behind on a wall
Regarding Linux and the Swift compiler, time needs to be measured in 4 dimensions, thus exactly 4 arrows are required on every clock. Generated by “AI”.

How to resolve a crash in Swift compiler version 5.9.2 that happens only on Linux, without cutting your feet while walking on broken glass.

Debugging the Swift compiler on macOS is quite straightforward; on Linux, however, it can get really weird and may even seem impossible before you have managed to clone the compiler at all. Let’s see what happened when I tried doing so on New Year’s Eve 2023 to end the year on a positive note.

Previously, in Part 2 of this article series, I mentioned that I would fix a compiler crash in Part 3; however, as it turns out, a more interesting journey was awaiting me, which I decided to describe in Part 3 instead. Part 4 is inevitable though 🤝

How It All Started

When I was doing the latest release of Sourcery (2.1.3), one of the tasks was to supply a compiled distribution for Ubuntu. As usual, I fired up my tart VM to compile Sourcery and upload the artifact to the release on GitHub.

I then discovered a crash in Swift 5.9.2 that happened only on Linux. The first thing that came to mind was to revert the compiler to version 5.9.0. That worked as expected: Sourcery compiled successfully and the tests were green. The build was then packaged and published along with the “pre-New-Year’s-Eve” 2.1.3 release.

After finishing that task, I immediately switched to reporting the crash in the apple/swift repository. While preparing the crash stack trace, I realized that the cause of the crash looked reasonably straightforward. Then it crossed my mind that I might just be the one to supply the fix, since it is the holiday season for most developers out there anyway. So I went ahead and started cloning the apple/swift repository locally, as I did on macOS in Part 1 of this article series.

To clone apple/swift, I simply followed the guidelines described in apple/swift/docs/HowToGuides/GettingStarted.md:

1. git clone git@github.com:apple/swift.git swift
2. cd swift
3. utils/update-checkout --clone-with-ssh

While git clone itself worked fine, what struck me was that step #3, utils/update-checkout --clone-with-ssh, would trap with a SIGABRT signal. That was weird, I thought, so I checked the CI status and the recently opened issues related to it in the Swift GitHub repository, and found absolutely nothing. That meant only one thing: I needed to debug the utils/update-checkout script itself.
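On POSIX systems, a child process killed by a signal reports a negative return code to Python, so a SIGABRT shows up as -6 (that is, -signal.SIGABRT) rather than as an ordinary error status. The following is a minimal sketch, assuming a Linux or macOS host, of how one could wrap a command such as utils/update-checkout and tell a signal-induced crash apart from a normal failure. A throwaway child that calls os.abort() stands in for the real script, and describe_exit is a hypothetical helper name.

```python
import signal
import subprocess
import sys


def describe_exit(cmd):
    """Run cmd and explain how it ended (POSIX only)."""
    proc = subprocess.run(cmd)
    code = proc.returncode
    if code < 0:
        # Negative return code: the child was terminated by signal number -code.
        return f"killed by {signal.Signals(-code).name}"
    return f"exited with status {code}"


if __name__ == "__main__":
    # Stand-in for `utils/update-checkout --clone-with-ssh`: a child that aborts.
    crasher = [sys.executable, "-c", "import os; os.abort()"]
    print(describe_exit(crasher))  # killed by SIGABRT
```

Knowing that the process died from a signal, not from a Python exception, already hints that the problem sits below the script's own logic.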

Python is Simply a Snake

Penguins sitting on an unknown planet with 2 moons and Sun so close it looks like it is on the planet itself, with “swift” thoughts flying above penguins
In a parallel universe, parallelism is just a myth. Generated by “AI”.

I then fired up Visual Studio Code, the same editor I had used to add Linux support to Sourcery, inside the same virtual machine run by tart. I navigated to the swift project folder, found the script at exactly the path from step #3, utils/update-checkout, and opened it in the source editor to see what was inside:

1. #!/usr/bin/env python3
2.
3. import sys
4.
5. import update_checkout
6.
7. if __name__ == '__main__':
8.     # This line was added in dfe3af81b2 to address an importing issue on
9.     # Windows. It causes this script to break badly when used with
10.     # Python 3.8 on macOS. Disabling for all Python 3 until someone
11.     # can help sort out what's really needed for Windows:
12.     if sys.version_info.major < 3:
13.         sys.modules[__name__] = sys.modules['update_checkout']
14.
15.     update_checkout.main()

Line #15 calls into a different module named update_checkout, so I followed the lead and found a big Python script. Oh well, I have absolutely no experience in Python, I thought. I then wandered around this script for some time, putting breakpoints here and there and tweaking variables. It was unclear to me how to pass command-line arguments in Visual Studio Code to set the --clone-with-ssh flag to true when debugging this script, so I needed to change the actual value of the variable storing this flag:

clone_with_ssh = args.clone_with_ssh

It was quite interesting to guess which value I should hardcode: “1”? “true”? What is the “true” value in Python? I definitely did not want to google, read a book on Python, or even ask “AI”, better named an “LLM”. Instead, I just went ahead and tried setting true as the value of the clone_with_ssh variable. While debugging, I saw that False was the default value, so I replaced my true with True, and it worked 😅.

Yes, I deliberately disregard the fact that “Artificial Intelligence” is used everywhere, while conceptually it is not intelligence but simply sophisticated data science, in my personal opinion.
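For the record, a boolean command-line flag in Python is typically declared with argparse’s action="store_true", which yields False unless the flag is present; that matches the False default seen in the debugger. The following is a minimal sketch, assuming update_checkout declares --clone-with-ssh this way (I have not verified its exact parser setup). Passing an explicit argument list to parse_args lets you simulate the flag without hardcoding True. build_parser is a hypothetical helper.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring a --clone-with-ssh style boolean flag."""
    parser = argparse.ArgumentParser(description="update-checkout-like script")
    # store_true: the attribute is False by default and True when the flag is given.
    parser.add_argument("--clone-with-ssh", action="store_true",
                        help="clone repositories over SSH")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    # Dashes in the flag name become underscores in the attribute name.
    print(parser.parse_args([]).clone_with_ssh)                    # False
    print(parser.parse_args(["--clone-with-ssh"]).clone_with_ssh)  # True
```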

While the script was running, it reached the following concurrency code, which I was barely able to read:

1. def run_parallel(fn, pool_args, n_processes=0):
2.     """Function used to run a given closure in parallel.
3.
4.     NOTE: This function was originally located in the shell module of
5.     swift_build_support and should eventually be replaced with a better
6.     parallel implementation.
7.     """
8.
9.     if n_processes == 0:
10.         n_processes = cpu_count() * 2
11.
12.     lk = Lock()
13.     print("Running ``%s`` with up to %d processes." %
14.           (fn.__name__, n_processes))
15.     pool = Pool(processes=n_processes, initializer=child_init, initargs=(lk,))
16.     results = pool.map_async(func=fn, iterable=pool_args).get(999999)
17.     pool.close()
18.     pool.join()
19.     return results

Line #12 in this script was causing the SIGABRT signal, and cloning did not even start. So I figured: “what about this Lock thingy?” As one would expect, I tried to remove the Lock completely, but then the code would not run because of the Pool thingy, which receives that lock through its initargs constructor argument.
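The original run_parallel hands the lock to every worker through Pool’s initializer/initargs, which call child_init once per worker before any task runs. The following is a minimal sketch of that pattern, assuming a child_init that simply stores the lock in a global (the real one in update_checkout may do more). I swapped in multiprocessing.dummy, which offers the same Pool and Lock API backed by threads, so the sketch runs without forking processes. describe is a hypothetical worker.

```python
# Thread-backed stand-in with the same Pool/Lock API as multiprocessing.
from multiprocessing import cpu_count
from multiprocessing.dummy import Lock, Pool

_lock = None  # set in each worker by the initializer


def child_init(lock):
    """Hypothetical initializer: store the shared lock in a module-level global."""
    global _lock
    _lock = lock


def run_parallel(fn, pool_args, n_processes=0):
    """Apply fn to every item of pool_args using a pool of workers."""
    if n_processes == 0:
        n_processes = cpu_count() * 2
    lk = Lock()
    # initializer/initargs run once per worker, handing it the shared lock.
    pool = Pool(processes=n_processes, initializer=child_init, initargs=(lk,))
    try:
        # map_async(...).get(timeout) blocks until every result is ready
        # and returns them in the same order as pool_args.
        results = pool.map_async(func=fn, iterable=pool_args).get(999999)
    finally:
        pool.close()
        pool.join()
    return results


def describe(repo):
    # Workers take the shared lock so their printed lines don't interleave.
    with _lock:
        print("updating", repo)
    return repo.upper()


if __name__ == "__main__":
    # The list keeps input order: ['SWIFT', 'LLVM-PROJECT', 'CMARK']
    print(run_parallel(describe, ["swift", "llvm-project", "cmark"]))
```

Without lk, the initargs=(lk,) argument has nothing to pass, which is why deleting only the Lock line breaks the function.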

I then tried replacing Lock with RLock in case the problem was related to re-entrancy of some sort, again simply guessing without any clues. Next, I wondered: why do I even need concurrency in the first place? I tried to remove the Pool and everything related to parallelism from this function. But how to do that correctly? The answer is simple: poke it and see what happens.
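For context on the RLock guess: a plain Lock cannot be acquired twice by the same thread (a second blocking acquire would deadlock), while an RLock counts nested acquisitions by its owner. The following is a minimal sketch using threading’s locks, which follow the same re-entrancy rules as multiprocessing.Lock and multiprocessing.RLock. can_reacquire is a hypothetical helper, and its second acquire is non-blocking so the plain Lock case fails fast instead of hanging.

```python
import threading


def can_reacquire(lock):
    """Return True if the current thread can acquire `lock` a second time."""
    lock.acquire()
    try:
        # Non-blocking attempt, so a plain Lock returns False instead of deadlocking.
        second = lock.acquire(False)
        if second:
            lock.release()
        return second
    finally:
        lock.release()


print("Lock:", can_reacquire(threading.Lock()))    # Lock: False
print("RLock:", can_reacquire(threading.RLock()))  # RLock: True
```

Since run_parallel creates the lock once and never re-acquires it on the same thread, re-entrancy was unlikely to matter here, which fits the swap not changing anything.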

So I poked: I commented out everything from line #12 to line #18 and defined results as simply an empty collection, []. Why would this even work in Python? I was simply guessing. The next step was to call the function that the Pool thingy used to call, in some sort of for-loop, passing the same pool_args, which by its name implied a collection. The following was the result of my first attempt at writing Python code in my life:

1. def run_parallel(fn, pool_args, n_processes=0):
2.     """Function used to run a given closure in parallel.
3.
4.     NOTE: This function was originally located in the shell module of
5.     swift_build_support and should eventually be replaced with a better
6.     parallel implementation.
7.     """
8.
9.     if n_processes == 0:
10.         n_processes = cpu_count() * 2
11.     results = []
12.
13.     for args in pool_args:
14.         fn(args)
15.     #lk = Lock()
16.     #print("Running ``%s`` with up to %d processes." %
17.     #      (fn.__name__, n_processes))
18.     #pool = Pool(processes=n_processes, initializer=child_init, initargs=(lk,))
19.     #results = pool.map_async(func=fn, iterable=pool_args).get(999999)
20.     #pool.close()
21.     #pool.join()
22.     return results

It worked on the first attempt. I followed the pattern of the if statement on line #9 and guessed that a for-in loop would look similar, based on my general understanding of for-in syntax in languages such as Swift or Objective-C.
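A side note on the loop above: it calls fn(args) but never stores the return value, so this version of run_parallel always returns an empty list, whereas pool.map_async(...).get() returned one result per item. If a caller inspects those results (I did not check whether update_checkout does), a sequential drop-in should collect them. The following is a minimal sketch; run_sequential and fake_update are hypothetical names.

```python
def run_sequential(fn, pool_args):
    """Sequential stand-in for run_parallel: same inputs, same return shape."""
    # Unlike a loop that only calls fn(args), keep each return value so callers
    # that inspect the results (as they would after pool.map) still see them.
    return [fn(args) for args in pool_args]


def fake_update(repo):
    # Hypothetical worker used only for this example.
    return (repo, "ok")


if __name__ == "__main__":
    print(run_sequential(fake_update, ["swift", "cmark"]))  # [('swift', 'ok'), ('cmark', 'ok')]
```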

Now the cloning progressed slowly but steadily, until I hit the next issue: not enough disk space in my Ubuntu VM. To understand how to resolve this, let’s look at what the tart tool does and how it works.

P.S. Excuse me for overusing the word “thingy”; I simply have no clue about the Python programming language.

Tart

Tart is simply a tool that uses Apple’s Virtualization framework to run virtual machines on macOS. These virtual machines can run either macOS or Linux. To set up a Linux VM with tart, I followed this GitHub Issue as my guide and entered the following commands on my macOS host:

tart clone ghcr.io/cirruslabs/ubuntu:focal ubuntu
tart run ubuntu

It then downloaded a preconfigured Ubuntu 20.04 VM, which I upgraded to Ubuntu 22.04 using this guide. Please note that, as of December 31st, 2023, you need to run sudo do-release-upgrade without the -d flag in order to upgrade.

With the VM running, I then needed to install the following utilities and tools to be able to run any kind of compilation in my Ubuntu VM:

sudo apt install libffi-dev
sudo apt install build-essential
sudo apt install libsqlite3-dev
sudo apt-get install libncurses5-dev

Note that I needed exactly this set of commands to run the unit tests for Sourcery when I was implementing its Linux support. It might be redundant for compiling the Swift compiler; in fact, the aforementioned apple/swift/docs/HowToGuides/GettingStarted.md lists the required dependencies.

What is not clear with this whole virtualization setup is how to set the disk image size. After running the tart clone command mentioned above, the resulting VM has only 50 GB allocated for the guest OS. And here comes the problem: when running update-checkout --clone-with-ssh, I hit a “not enough space” error.

I then tried to resize the VM following this guide, using the following command:

truncate -s 100g ~/.tart/vms/my-vm/disk.img
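Why might truncate alone not be enough? It only changes the length of the disk image file. The partition table and filesystem inside the image stay exactly as they were, and the new space is a sparse hole until something writes to it. The following is a minimal sketch, assuming a Linux or macOS host (st_blocks is not available on Windows), that mimics truncate -s with os.truncate on a throwaway file. It uses 1 GiB instead of 100 GiB only to keep the demo light, and grow_sparse is a hypothetical helper.

```python
import os
import tempfile

GIB = 1024 ** 3


def grow_sparse(path, size_bytes):
    """Extend a file to size_bytes, like `truncate -s`, without writing any data."""
    os.truncate(path, size_bytes)
    st = os.stat(path)
    # st_size is the apparent length; st_blocks counts 512-byte units actually allocated.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        image = os.path.join(tmp, "disk.img")
        open(image, "wb").close()  # start from an empty file
        apparent, allocated = grow_sparse(image, 1 * GIB)
        print(f"apparent: {apparent // GIB} GiB, allocated: {allocated} bytes")
```

The apparent size grows immediately, but the guest OS still has to grow its own partition and filesystem before it can use the extra space.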

To my surprise, the VM would not start afterwards. Of course, I had run tart export right after the initial setup, so nothing was lost, and I can still quickly compile a new Sourcery release whenever I need to. Still, I got a little stuck. “What is the next step?”, I thought. I followed my gut instinct and cloned the VM again from scratch. Good thing tart has a ~/.tart/caches folder, which stored the previous clone’s archive.

After creating a brand-new Ubuntu Linux VM, I followed the same guide and, after establishing an ssh session with the guest OS, entered the following set of commands:

$ truncate -s 100g ~/.tart/vms/my-vm/disk.img
$ diskutil repairDisk disk0
$ diskutil apfs resizeContainer disk0s2 0
$ df -h ~

But it won’t work, because these instructions are for a macOS guest OS, not Linux! So how should I do the same on Linux? More or less at random, I installed an app called GParted:

sudo apt install gparted -y

And voilà! I was able to grow the partition to the full 100 GB! Finally, I could proceed with compiling the Swift compiler…

The funny thing is that, after setting up a brand-new VM with plenty of free space, I no longer had to patch the parallelization code in that update_checkout Python script.

Swift Compiler, Compile!

To compile the Swift compiler along with its dependencies and set up the environment, run the following command:

utils/build-script --release-debuginfo

Keep in mind that the required dependencies need to be installed before running this command. Even so, I immediately hit an error saying that clang was not installed; simply running apt install clang -y resolved it. Another requirement for building the compiler is a pre-installed Swift toolchain, which can easily be managed with a tool called Swiftly. Remember to add the installed swiftly path to your ~/.bashrc so that the Swift compiler development environment knows where the installed Swift toolchain is located.
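Both of these failures boil down to a tool missing from PATH, so a tiny pre-flight check can save a failed build-script run. The following is a minimal sketch, assuming the tools worth checking are clang, swift and sccache (the ones mentioned in this section, not the full dependency list from GettingStarted.md). missing_tools is a hypothetical helper. It inspects the PATH of the process that runs it, so after adding swiftly’s directory to ~/.bashrc, open a new shell (or source the file) first.

```python
import shutil


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # clang and swift are the prerequisites mentioned above; sccache only
    # matters if you pass --sccache to utils/build-script.
    required = ["clang", "swift", "sccache"]
    missing = missing_tools(required)
    if missing:
        print("missing on PATH:", ", ".join(missing))
    else:
        print("all tools found")
```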

One more note: because I had installed sccache as described in apple/swift/docs/HowToGuides/GettingStarted.md, I needed to run the following command instead of the one above:

utils/build-script --release-debuginfo --sccache

The compilation started! Little did I know how tedious compiling the Swift compiler under Linux would get, and that I would uncover even more issues afterwards…

Linking Phase Failing

screenshot of the Swift compiler compilation log
Staring at build logs for hours on end is a delight!

Oh yeah, you heard that right: with the latest revision of the apple/swift main branch, the compiler would not compile, so I could not run the compilation swiftly. The errors I was facing were related to different parts of the clang toolset, in particular the linking phase: the Swift compiler build script would report an error like “linking failed” when building bin/clang-17, bin/clang-tidy, and other binaries. I wanted to add the -v flag to the clang invocations, but it was tedious and unclear how to do that with the Ninja build system. So I tried the following:

  1. Invest a full day of work into researching why these tools would fail to link.
  2. Resolve some of the issues by guessing that something was wrong with the dependencies. Earlier in this article, I mentioned that some of the installed packages were needed only for Sourcery development under Ubuntu; however, these packages conflicted with the requirements of the Swift compiler. After googling many similar errors, I concluded that I should remove gcc++ and re-install all of the dependencies required by the Swift compiler.
  3. Go through dozens of issues that were closely related to, but not exactly the same as, the one I was facing.

As you can imagine, nothing helped. So I did the best illogical thing one can do in such a situation: I ran the build script over and over again. In the end, after hours of re-running the same compilation command, the compilation of the Swift compiler under Ubuntu 22.04 finished successfully.

And the reason was, of course, the simplest one imaginable: not enough RAM inside the VM to link the compiled objects during the LLVM build. The solution was to increase the RAM dedicated to the tart VM by running tart set ubuntu2 --memory=16384, which sets the available RAM to 16 GB (the value is given in megabytes). Another issue I faced during the last mile of compiling the Swift compiler under Ubuntu was that the VM needs at least 150 GB of disk space.
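In hindsight, an out-of-memory link tends to leave hints in the build log: the kernel kills the linker, and the failure then surfaces as a generic link error. The following is a minimal sketch, assuming a saved Ninja build log. It collects the targets that Ninja marks with its “FAILED: ” prefix and flags lines that merely hint at a killed process. The hint patterns (clang’s “linker command failed due to signal” message, “signal 9”, “Killed”) are heuristics chosen for illustration, not a complete or authoritative list. On the VM itself, the kernel log (dmesg) is the more reliable place to confirm an OOM kill. scan_log.py is a hypothetical file name.

```python
import re
import sys

# Heuristic patterns (assumptions, not an exhaustive list) that often appear
# when the kernel's OOM killer terminates a linker mid-build.
OOM_HINTS = [
    re.compile(r"signal 9"),
    re.compile(r"\bKilled\b"),
    re.compile(r"linker command failed due to signal"),
]
# Ninja prefixes each failed build step with "FAILED: ".
FAILED = re.compile(r"^FAILED: (.+)$")


def scan_build_log(lines):
    """Return (failed targets, lines that look like a killed process)."""
    failed, oom = [], []
    for line in lines:
        line = line.rstrip("\n")
        match = FAILED.match(line)
        if match:
            failed.append(match.group(1))
        if any(p.search(line) for p in OOM_HINTS):
            oom.append(line)
    return failed, oom


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 scan_log.py path/to/build.log
    with open(sys.argv[1], errors="replace") as log:
        failed, oom = scan_build_log(log)
    print("failed targets:", *failed, sep="\n  ")
    print(f"{len(oom)} line(s) suggest a process was killed")
```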

TL;DR: before attempting to compile the Swift compiler under Ubuntu, make sure your VM has the following system configuration:

  1. at least 8 GB of RAM (I used 16 GB just to be sure)
  2. at least 150 GB of disk space
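These two numbers are easy to check before starting a multi-hour build. The following is a minimal sketch for the Linux guest, assuming os.sysconf exposes SC_PAGE_SIZE and SC_PHYS_PAGES (true on Linux). It compares total RAM and the total size of the disk holding the checkout against the TL;DR minimums. It uses decimal gigabytes because a VM configured with 8192 MB reports slightly less than 8 GiB to the kernel. Checking .free instead of .total would be the stricter test. preflight and total_ram_bytes are hypothetical helpers.

```python
import os
import shutil

GB = 1000 ** 3
# Minimums from the TL;DR above, in decimal gigabytes.
MIN_RAM_GB = 8
MIN_DISK_GB = 150


def total_ram_bytes():
    """Physical memory in bytes, as reported by sysconf (Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def preflight(path="."):
    """Return a list of human-readable problems; an empty list means good to go."""
    problems = []
    ram_gb = total_ram_bytes() / GB
    if ram_gb < MIN_RAM_GB:
        problems.append(f"only {ram_gb:.1f} GB of RAM (want {MIN_RAM_GB})")
    disk_gb = shutil.disk_usage(path).total / GB
    if disk_gb < MIN_DISK_GB:
        problems.append(f"disk at {path} is only {disk_gb:.1f} GB (want {MIN_DISK_GB})")
    return problems


if __name__ == "__main__":
    issues = preflight(".")
    print("\n".join(issues) if issues else "VM looks big enough")
```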

The build script ran tests during the compilation phase and compiled LLVM and all other dependencies, producing a swift-frontend binary ready to be debugged with the CLion IDE, as suggested in apple/swift/docs/HowToGuides/GettingStarted.md.

After compiling the Swift compiler, I was able to run our old friend, swift-frontend, and pass it the same arguments that had caused the compiler to crash in the first place. And so the actual investigation of the crash caused by one of the files in Sourcery began…

The arguments passed to swift-frontend are usually shown in the generated crash report.
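Crash output from LLVM-based tools, swift-frontend included, typically contains a stack dump with a line starting with “Program arguments:”, followed by the full command line. The following is a minimal sketch, assuming that line format, that extracts the arguments and swaps the first one for a locally built binary so the crash can be replayed under a debugger. LOCAL_FRONTEND is a hypothetical path; point it at wherever your build placed swift-frontend. The arguments are split on whitespace, so any argument that itself contains spaces would be broken apart.

```python
import re
import shlex
import sys

# Hypothetical location of a locally built frontend; adjust to your build tree.
LOCAL_FRONTEND = "/path/to/build/bin/swift-frontend"

PROGRAM_ARGS = re.compile(r"Program arguments:\s*(.+)")


def replay_command(crash_report, frontend=LOCAL_FRONTEND):
    """Extract the crashing command and point it at a local swift-frontend."""
    match = PROGRAM_ARGS.search(crash_report)
    if match is None:
        raise ValueError("no 'Program arguments:' line found")
    # Whitespace split: breaks on arguments that themselves contain spaces.
    args = match.group(1).split()
    return [frontend] + args[1:]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 replay.py path/to/crash-report.txt
    with open(sys.argv[1], errors="replace") as report:
        print(shlex.join(replay_command(report.read())))
```

The printed command can then be pasted into the debugger’s launch configuration as the program and its arguments.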

CLion is Not a Lion

While I was trying to follow the section of apple/swift/docs/HowToGuides/GettingStarted.md related to CLion setup, I ran into a number of errors I could not overcome, such as C++ compiler errors when trying to run the swiftFrontend executable with a debugger attached. So I decided to give Microsoft Visual Studio Code a chance instead, which I primarily use for Sourcery development under Linux.

I searched and found a very nice guide on setting up VSCode for Swift compiler development. To my surprise, it took around 10 minutes to go from zero to debugging swift-frontend. I really enjoy VSCode under Linux, and I have created a set of keybindings for Linux that I use to move the cursor around in the source code editor, as well as hotkeys to build, run, step over, and perform other operations in VSCode, more or less aligned with the behaviour in Xcode.

Debugger attached to `swift-frontend` process in VSCode in a Linux VM.

The Crash

After 3 days of setting up the environment, I was finally able to build and run swift-frontend with the arguments from the crash report. To my surprise, the crash did not happen, which means I could not reproduce it. I then tried to follow the steps from the crash report I mentioned, to… no avail.

The PR

Nevertheless, I opened a PR that improves GettingStarted.md by listing the minimal requirements for a Linux VM to compile the Swift toolchain without running into the issues I faced while trying to reproduce the crash.

Want to Connect?

Follow me on X (Twitter): @r_alikhamov or LinkedIn 🤝

Debugging code is a specific form of art. It requires critical thinking and a more-or-less “Sherlock Holmes-like” approach to problems. When it comes to debugging the debugger debugging crashing code, you simply need to overcome the fear of the unknown and dive into the process. The discoveries you make will surely surpass any other experience you might get anywhere else.

References

  1. apple/swift/docs/HowToGuides/GettingStarted.md — https://github.com/apple/swift/blob/main/docs/HowToGuides/GettingStarted.md
  2. How to Debug Swift Compiler: Part 2 — Can “AI” Fix the Compiler Crash? — https://medium.com/itnext/how-to-debug-swift-compiler-part-2-can-ai-fix-the-compiler-crash-6a95fe7c0d60
  3. PR Improving GettingStarted.md — https://github.com/apple/swift/pull/70658
