How segments and regions differ in decommitting memory in the .NET 7 GC

Maoni0
Published in ITNEXT · 15 min read · Mar 24, 2022


We just snapped the source with regions in the GC on by default for the next .NET 7.0 preview build this past Monday (and the build will be released in April). So I wanted to explain how the regions feature differs from segments in its policy for decommitting memory, so you can evaluate whether what you are seeing is by design or not if you decide to try it out. Note that the regions feature is only available on 64-bit, and for now we disable it on macOS, so everything in this article assumes 64-bit. Also, we are still in the middle of the development cycle for .NET 7.0, so I'll mention things that might change later.

As I've always said, we often have to make tradeoffs in our work. Regions provide more flexibility, which was why I wanted to do this in the first place, and in my last blog entry related to regions I mentioned some of these tradeoffs. Today I'm going to talk specifically about decommitting memory, as how much memory the GC heap takes is one of the most important metrics our customers care about. Since I don't want this to get too long, I'll limit the discussion to Server GC for this blog entry: apps that run with Server GC usually have much bigger heaps and run much longer, so when memory gets decommitted is especially important. As usual, since I know many of my readers are also interested in the inner workings of the GC, I will touch on relevant design/implementation details.

Committing memory is pretty straightforward: the GC just commits memory as needed. It doesn't commit one page at a time, which would have a lot of overhead; it generally commits 16 pages at a time (if you are allocating an object larger than 16 pages it would of course need to commit more, because each object always occupies a contiguous memory space). This works the same in the .NET 6.0 GC and the .NET 7.0 GC.
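As a rough sketch (the names are made up, not the actual GC code), the effect is that every commit request gets rounded up to a 16-page quantum:

    // Sketch only: commit requests are rounded up to a 16-page quantum.
    #include <stddef.h>

    const size_t page_size = 0x1000;               // 4KB pages
    const size_t commit_quantum = 16 * page_size;  // 16 pages = 64KB at a time

    size_t commit_size_for (size_t request)
    {
        // round up to a multiple of the quantum; an object larger than the
        // quantum simply gets a bigger contiguous committed range
        return ((request + commit_quantum - 1) / commit_quantum) * commit_quantum;
    }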

Decommit, OTOH, is a very different story. You don't want to decommit memory that you know you will use pretty soon, because then you'd have to pay the cost to decommit and to commit it again. But you also don't want to leave too much committed if that space doesn't get used. So we have policies in place that specify how much we decommit and when. Note that free spaces are by definition always committed memory, as they are counted as part of what's in use on the heap (they're just currently used by objects of type Free and will be used to accommodate allocations into that generation). Could we decommit free spaces? Technically yes, but that requires yet another layer of non-trivial memory management on regions/segments. So we only decommit the end of a segment or a region after the last live object, but never memory occupied by free objects between the first and last object on that segment/region.
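In other words (a sketch with hypothetical names, not the actual GC code), the only range eligible for decommit is the committed tail past the last live object:

    // Sketch: free objects below the last live object are part of the
    // heap and stay committed; only the tail is eligible for decommit.
    #include <stddef.h>

    struct heap_space             // a segment or a region
    {
        char* end_of_last_live;   // end of the last live object
        char* committed_end;      // end of the committed range
    };

    size_t decommittable_size (const heap_space& s)
    {
        return (s.committed_end > s.end_of_last_live) ?
               (size_t)(s.committed_end - s.end_of_last_live) : 0;
    }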

If I had a dollar for every time I heard someone say “GC never decommits memory” or “GC never decommits memory unless it’s running out of memory”, I’d be able to afford a pretty nice espresso machine (I just tried out a bunch of espresso machines over the weekend…that’s how I know). Those are simply not the policies we use. To start off, let’s talk about how segments decommit memory.

Details of decommitting for segments and regions

Segments are much larger in size than regions. For SOH segments (which are always the same size), in Server GC, it’s 1GB if you have >8 heaps, which is very often the case for server workloads. UOH segments (meaning LOH and POH) are a lot smaller (256MB) to start with but can grow larger if needed. What this means is it’s very possible that you only get one segment per heap. Say you have a 20GB heap, all in SOH, and your app is running with 48 heaps: you will only get one segment per heap. If that’s the case, for SOH it means we only ever decommit on this segment. And since gen0 always lives on the higher address side of this segment (which we call the ephemeral segment for this heap), our policy just needs to define how the end-of-segment space is decommitted for the ephemeral segment. More details on how segments are organized are described here.

For the ephemeral segment our policy says that at the end of a blocking GC, we will only keep (full gen0 budget + partial gen1 budget) committed after the last live object and decommit the rest, if any. Fig. 1 below illustrates how the committed size changes on the ephemeral segment before and after the very first GC. In Fig. 1 I’m illustrating a scenario where gen0 is empty and gen1 becomes bigger because it now holds the survivors from gen0.

Fig. 1 On entry and exit of the 1st GC

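A sketch of that computation (hypothetical names; the gen1 fraction below is an assumption for illustration, not the actual value):

    // Sketch: keep (full gen0 budget + partial gen1 budget) committed
    // past the last live object; everything above that is decommitted.
    #include <stddef.h>

    char* ephemeral_decommit_target (char* end_of_last_live,
                                     char* committed_end,
                                     size_t gen0_budget,
                                     size_t gen1_budget)
    {
        size_t keep = gen0_budget + (gen1_budget / 8);  // gen1 fraction assumed
        char* target = end_of_last_live + keep;
        return (target < committed_end) ? target : committed_end;
    }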

Okay, so that’s not exactly accurate for Server GC. Starting in .NET 5, we no longer do the actual decommit work during a blocking GC; all we do is remember on the segment the address we want to decommit after (ie, the portion of the segment higher than that address will be decommitted). The decommit work is then run on one of the heaps’ Server GC threads concurrently with the user threads. We make sure to keep at least a full gen0 budget’s worth of space committed because we wanted to avoid introducing synchronization between the GC thread and the allocating threads that might need to commit more on that same segment. As we will see below, this is one of the limitations that we no longer have with regions.
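A sketch of that deferral (hypothetical names): the blocking GC only records the target, and the OS call happens later on a GC thread:

    // Sketch: record the decommit target during the GC; a Server GC
    // thread performs the actual decommit later, concurrently with
    // user threads.
    #include <stddef.h>

    void os_decommit (char* start, size_t size);  // stand-in for the OS call

    struct segment_info
    {
        char* decommit_target;    // everything above this gets decommitted
        char* committed_end;
    };

    void record_decommit (segment_info* seg, char* target)
    {
        seg->decommit_target = target;          // cheap, done during the GC
    }

    void decommit_step (segment_info* seg)      // runs later on a GC thread
    {
        if (seg->decommit_target < seg->committed_end)
        {
            os_decommit (seg->decommit_target,
                         (size_t)(seg->committed_end - seg->decommit_target));
            seg->committed_end = seg->decommit_target;
        }
    }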

For gen2 and UOH segments, the policy is pretty straightforward: we just decommit after the end of the last live object. Of course there’s nothing to decommit unless we are actually doing a gen2 GC, because if we don’t collect a generation, that generation can’t get smaller. For BGC, this happens concurrently (we chose not to do this concurrently for blocking gen2 GCs because we often do a blocking gen2 when memory is already tight, so we’d like to decommit ASAP).

I have a very detailed explanation of commit and decommit with segments in my talk about diagnosing managed memory leaks. I would recommend watching it if you want more info/illustrations of how decommit happens as GCs of different generations happen. I cannot stress this point enough: our GC is a generational GC, therefore the generational aspect is very important in perf analysis.

Regions are a lot smaller: 4MB by default for SOH (UOH regions are >=32MB by default). We keep a free region pool that holds regions that are completely empty. This allows us to exchange memory freely between generations. So if gen0 frees up a region and puts it on the free region list, and gen1 or gen2 needs a new region, the region gen0 freed can be used to satisfy that request.
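Conceptually (a sketch with hypothetical types, not the actual region data structures):

    // Sketch: a pool of completely empty regions that any generation
    // can return regions to or take regions from; this is what makes
    // exchanging memory between generations cheap.
    #include <stddef.h>
    #include <list>

    struct region
    {
        size_t committed;   // bytes committed in this region
        int    age;         // GCs this region has spent on the free list unused
    };

    struct free_region_pool
    {
        std::list<region*> free_regions;

        void return_region (region* r)   // eg gen0 frees an empty region
        {
            r->age = 0;
            free_regions.push_back (r);
        }

        region* take_region ()           // eg gen2 needs a new region
        {
            if (free_regions.empty ())
                return nullptr;          // caller gets fresh memory from the OS
            region* r = free_regions.back ();
            free_regions.pop_back ();
            return r;
        }
    };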

Greater flexibility means greater complexity. Now we have more management to do for these smaller memory units. Along the same lines as with segments, we don’t want to keep too many free regions around, but we also don’t want to keep too few, which would get us into the “decommit then immediately have to recommit” situation. But with regions there are additional aspects we need to consider. Below are a couple of examples:

  • Which region do we pick when we need one? Do we pick one with more committed memory or less? Think about how a region is used: if we get a region that’s only partially committed, say 2MB committed, and we need more than 2MB, we’d have to commit more on this region. If we had another region on the free list that’s fully committed (ie, 4MB), we’d want to use that one instead. But you might say “what if I needed a region with only 2MB committed, wouldn’t I want exactly that partially committed region?”. Yes, but when would you need a 2MB region? If the gen0 allocation budget is 10MB, you would want 2 fully committed regions + 1 with 2MB committed, and we know that one would be at the end of the region list for gen0. Because the region size is small, we’d only ever need a partially committed region for the tail of a generation’s region list. So our policy is to always pick the most committed region, and decommit the end of the tail region if needed (see the sketch after this list).
  • How many total regions do we keep? As with segments, we want to keep an allocation budget’s worth of regions committed because we expect to allocate that much. However, sometimes GCs happen not because the allocation budget was exceeded; in other words, we may choose to collect a generation even when its allocation budget hasn’t been exceeded. So we do use the budgets to decide how many regions to keep, but we also want to notice when a free region keeps not getting used, and decommit those regions.
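A sketch of the first policy, reusing the hypothetical region/free_region_pool types from the sketch above: hand out the most committed free region first, so at most the tail region of a generation’s list is partially committed:

    // Sketch: always pick the most committed region from the free list.
    region* pick_most_committed (free_region_pool& pool)
    {
        region* best = nullptr;
        for (region* r : pool.free_regions)
        {
            if (best == nullptr || r->committed > best->committed)
                best = r;
        }
        if (best != nullptr)
            pool.free_regions.remove (best);  // O(n), fine for a sketch
        return best;
    }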

We do keep at least one region per generation. So for example, after a GC, even if we find gen0 empty (which is very likely if you have no pins), we will keep one region in gen0. And if gen0 isn’t empty, we keep all the non-empty regions on the gen0 region list and return the rest to the free region list. The reason we keep at least one region in a generation is mostly historical: with segments we always have a segment for a generation (multiple generations can share a segment, but if you enumerate the segments for a generation you’d always get at least one). We don’t strictly have to, but having the invariant of at least one region per generation makes the implementation easier. I actually experimented with getting rid of this invariant but didn’t check it in because it wasn’t worth the complexity.

Fig. 2 illustrates what happens during the first GC, assuming the gen0 budget started out occupying 2.x regions. In 2.a we have 2.x regions committed on entry of the first GC. Note that the relative size between a segment and a region is obviously not meant to be accurate.

Fig. 2.a On entry of the 1st GC


Fig. 2.b On exit of the 1st GC


In 2.b we see the region free list now contains 2 regions. If there are too many regions on the free list, we remove some from it and add them to the decommit list, which will then be decommitted (the decommit list is not shown in Fig. 2). “Too many”, as mentioned above, is calculated from 2 factors: 1) estimating the difference between the budget and the regions we already have in that generation. For example, if the gen0 budget is 10MB, we’ll need 3 regions; and if we already have one region in gen0, we will need 2 more regions for gen0. We do this for all generations. 2) if we observe a region hasn’t been used for a while, ie, its age is too high, we will decommit it. For now we just set the threshold to 20 GCs, but we will dynamically tune this based on the memory situation.
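A sketch of those two factors under the stated assumptions (4MB regions, an age threshold of 20 GCs; the names are hypothetical):

    // Sketch: factor 1 - regions a generation still needs, from its
    // budget rounded up to whole regions; factor 2 - age out a free
    // region once it has sat unused for age_threshold GCs.
    #include <stddef.h>

    const size_t region_size = 4 * 1024 * 1024;  // default SOH region size
    const int    age_threshold = 20;             // current value, to be tuned dynamically

    size_t regions_needed (size_t budget, size_t regions_in_gen)
    {
        size_t wanted = (budget + region_size - 1) / region_size;
        return (wanted > regions_in_gen) ? (wanted - regions_in_gen) : 0;
    }

    bool should_decommit_by_age (int region_age)
    {
        return region_age >= age_threshold;
    }

With the example above, regions_needed with a 10MB budget and 1 existing region returns 2: the 10MB gen0 budget rounds up to 3 regions and one is already in gen0.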

Right now there’s one free list shared by all of the SOH generations, and the UOH free lists are separate. We haven’t done this yet, but we will be repurposing regions on the UOH free lists for SOH if we need to.

For gen0 and gen1 we also decommit the end of the tail region based on the budget calculation, if needed. Normally gen0 has more than one region, but gen1 can be very small, so after a gen1 GC, if most objects on that region died, we would decommit the end-of-region space. For example, if a gen1 region has 4MB committed with currently 500KB in use, and its budget is only 1MB, we will decommit (4MB - 500KB - 1MB) = 2.5MB at the end of the region (if there’s free space in gen1, we also estimate that part of it can be used for gen1 allocations). When you have many heaps with a benchmark that uses very little memory (which unfortunately is often the case for these little benchmarks), this can make a big difference. As with segments, we do this decommit work concurrently, but we only need to synchronize with the allocating threads if we need to decommit the gen0 tail region. Most of the decommit work is handled by the regions on the decommit list (which requires no synchronization at all), so the decommit work for that one region is small anyway.
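The worked example as a sketch (hypothetical names): we keep what’s in use plus the budget, and decommit the rest of the committed tail:

    // Sketch: committed minus (in use + budget) is what we can decommit
    // at the end of a gen0/gen1 tail region.
    #include <stddef.h>

    size_t tail_decommit_size (size_t committed, size_t in_use, size_t budget)
    {
        size_t keep = in_use + budget;
        return (committed > keep) ? (committed - keep) : 0;
    }

    // From the text: tail_decommit_size (4MB, 500KB, 1MB) = 2.5MB.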

Difference in decommitting between regions and segments

Now we can summarize the difference based on the above: for regions on the free list, we do not decommit end-of-region space right now. This makes up the majority of the difference in committed size between segments and regions at steady state. When we calculate how many regions we need based on the allocation budget, we round that up to a whole region. So if the budget says we need to keep 9MB in free regions, we round it up to 12MB and keep 3 free regions, which means if these are 3 fully committed regions, we could decommit 3MB in one of them. Currently we don’t decommit end-of-region space for these regions, though I expect us to, especially for large regions. So for SOH you could have 3 such regions on the free list; for UOH you could have 2 (LOH and POH). This is of course assuming you do have such regions on the free list. If, say, you don’t make many LOH/POH allocations at all, you’d just have the one region each for LOH/POH and wouldn’t have any large regions on the region free list at all. If your heap is pretty sizable, unless you happen to be in a situation where you have such free regions with the extra commit, this shouldn’t affect you much. But if it does, please do let us know! You can either file an issue in our GH repo or post a comment on this article.

Committed at different stages of an application

  • Peak committed

Peak committed usually occurs during the startup stage, which really refers to when you allocate a lot of long-lived data, making the GC adjust the budgets up. When the budgets are at their largest, the committed size is at its peak. For short-running, small benchmarks, the normal pattern is that they allocate static data at the beginning, which means the GC will observe high survival rates and make the budgets for the relevant generations high, so GCs will trigger only after quite a bit has been allocated. And that will be the peak. Later the survival rate drops, which means the budgets get smaller, which means less memory gets allocated before GCs trigger. So the committed size will be smaller.

Our policy for adjusting the budget is the same for segments and regions, so in theory the peak should be the same. However, in practice you may observe peak volatility just because the budget can be adjusted quite dramatically, so from run to run the peak may vary. But that volatility applies to both segments and regions. Also, GC tuning has changed: even though the budget calculation is the same, we might still trigger different GCs because, going from segments to regions, some of the factors for deciding which generation to collect have changed. For example, we are no longer limited by the “end of segment” factor: with segments, if we are close to the end of the ephemeral segment we’d trigger a gen1 GC. That’s no longer a factor with regions, as we can just get another region if the budget says we should. To analyze these kinds of differences, the best way is to capture a top-level GC trace.

  • Steady state committed

Steady state committed for small benchmarks could be noticeably higher with regions because we do not decommit end-of-region space for regions on the free list. This means you could get at most almost 1 extra region per generation (since the budget is rounded up to the region size). Of course, small benchmarks likely have just one region each for gen1/gen2/LOH/POH anyway, so in practice this means just almost 1 region for gen0. Note that this is per heap, so if you have many heaps in Server GC yet your total heap size is small, this factor can add up to a noticeable increase over segments. However, a small heap size with a large number of heaps is not a normal setup.

Committed for workloads of different sizes

  • Larger workload

For large workloads these differences become relatively much smaller. If the whole heap is 10GB and you happen to have almost 1 region’s worth of space per heap that could be decommitted but isn’t, it’s only a small percentage of the whole heap (eg, if you have 48 heaps, 4MB*48/10GB = 2%). If you have 3 such regions per heap (ie, for gen0, gen1 and gen2), that would be 6%. I do expect us to do more on the UOH side, as so far we haven’t done much perf analysis for UOH (especially LOH). As you can see, this “almost 1 region” factor could be very significant if it happens to be a large region.

I also expect us to do more work to handle high memory load situations. For starters we’d want to age regions out much faster.

  • What to do for small benchmarks with many heaps

For small benchmarks with low memory usage, you can reduce the region size by setting the DOTNET_GCRegionSize env var (these runtime env vars used to be called COMPlus_X; we now recommend renaming them to DOTNET_X). It must be a power of 2. We recommend setting it to 1MB, especially if you have many heaps (note that the value is interpreted as hex, so 100000 below means 0x100000, ie 1MB):

set DOTNET_GCRegionSize=100000
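(On Linux, use export DOTNET_GCRegionSize=100000 in your shell instead of set.)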

Note that we will be doing perf experiments with more workloads in .NET 7, and the default region size might change. The reason we don’t want the region size to be too small is that it increases the cost of searching/sorting.

What kinds of workloads will have their memory usage improved by regions as currently implemented?

Obviously we will continue working on taking advantage of the flexibility regions give us throughout .NET 7.0. As of now, the most obvious place you will notice an improvement in memory usage is if you have very large free spaces on the heap: if they are larger than a region, they will naturally be released to the free list and decommitted after they go unused for a while, so the heap size will naturally be smaller. This situation can (and has been observed to) happen when your gen2 GCs are mostly BGCs. Since BGC does not compact, its job is to build up free space in gen2 to accommodate gen1 survivors. But if your allocation/survival pattern changes and at one point your app was consuming a lot more memory, you’ll likely observe increasingly large free spaces in gen2/UOH. And regions will definitely help there. I’ve seen folks with free spaces as large as tens or hundreds of MB; those will be returned to the region free lists and eventually, if they are not used anymore, they will be decommitted. In Fig. 3 I’m illustrating an example where we have large free spaces in gen2 and gen0 (it’s very common to see large free spaces in gen0 due to long-lived pins; for network scenarios you’ll often see in PerfView’s GCStats view that gen0 is pretty sizable and the frag ratio is 99%).

Fig. 3 An example of the SOH portion of a heap with segments, showing large free spaces

The 1MB free space will still be there, but most of the other free spaces will be released to free regions and decommitted if they are not used. This means your heap’s committed size will be a lot smaller.

If you are running .NET 6.0 and want to try the .NET 7.0 GC but don’t want to upgrade to .NET 7.0 (eg, maybe there’s a problem with the preview build that affects you), you can build a clrgc.dll yourself from the 7.0 branch and drop it into your .NET 6.0 installation next to coreclr.dll. Then you can use the 7.0 GC by setting this environment variable:

set COMPlus_GCName=clrgc.dll

The added bonus of doing this is that you are not affected by the tons of changes that happened outside of the GC, so whatever effect you are seeing is purely from GC changes.

The relevant defines for building clrgc.dll are in gcpriv.h. If you want to build a clrgc.dll that uses regions, you just need to flip the current !defined (BUILD_AS_STANDALONE) around #define USE_REGIONS to defined (BUILD_AS_STANDALONE); then clrgc.dll will use regions and coreclr.dll will use segments.
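As a sketch of what that flip looks like (the exact surrounding condition in gcpriv.h may differ; HOST_64BIT here is an assumption):

    // Before: regions everywhere except the standalone build, ie
    // coreclr.dll uses regions and clrgc.dll uses segments.
    #if defined (HOST_64BIT) && !defined (BUILD_AS_STANDALONE)
    #define USE_REGIONS
    #endif

    // After the flip: clrgc.dll uses regions, coreclr.dll uses segments.
    #if defined (HOST_64BIT) && defined (BUILD_AS_STANDALONE)
    #define USE_REGIONS
    #endif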

Going forward I’d like to ship clrgc.dll (maybe even 2 versions, one with regions and the other with segments) with the .NET SDK (currently it’s not shipped at all, which is why you’d need to build it yourself).
