ReFS Dev Drives aggressively consume physical RAM as cache #131

@shift-ohkawa

Description

Windows Build Number

Microsoft Windows NT 10.0.26100.0

Processor Architecture

AMD64

Memory

128 - 384 GB

Storage Type, free / capacity

1 - 8 TB SSDs for ReFS Volumes

Relevant apps installed

Visual Studio Professional 2022, Jenkins Agent, etc.

Traces collected via Feedback Hub

(none)

Issue description

Since around October to November 2025, some of our Jenkins agent machines with ReFS Dev Drives have started to consume physical RAM aggressively as cache. Although we cannot measure it reliably, application build performance seems to be negatively affected.

Steps to reproduce

  1. Create a new Dev Drive formatted with ReFS, preferably on a machine with:
    • an AMD processor
    • an ReFS volume larger than 1 TB, physical or virtual
  2. Read from, write to, and preferably deduplicate the ReFS volume
  3. Watch Cache Bytes grow to tens of GB of physical RAM in Task Manager, Performance Monitor, RAMMap, or a similar tool
    • RAMMap reports a huge active "Metafile"

Expected Behavior

  • A negligible amount of cache (e.g. at most a few GB?) is active
  • Most of the cache is in the standby state and is given up to user processes on request

Actual Behavior

  • A massive amount of cache (over 15% on average, often 50-80% of total capacity) stays active
  • Memory allocation for user processes is negatively affected, resulting in a performance regression (e.g. more frequent GCs, longer build times)
  • Slightly higher CPU and I/O utilization

Details before (Jul-Aug, 2025) and after (Nov-Dec, 2025) for each machine:

| Machine# | CPU | ReFS | Dedup | RAM | Cache Bytes avg. Jul-Aug | Cache Bytes max. Jul-Aug | Cache Bytes avg. Nov-Dec | Cache Bytes max. Nov-Dec |
|---|---|---|---|---|---|---|---|---|
| 7 | AMD Ryzen 7 5800X | 2 TB | No | 128 GB | 4.11 GB | 20.07 GB | 16.74 GB | 30.08 GB |
| 8 | AMD Ryzen 9 7950X | 8 TB | Yes | 128 GB | 6.81 GB | 99.77 GB | 48.37 GB | 108.71 GB |
| 9 | AMD Ryzen Threadripper 7970X | 7 TB | Yes | 384 GB | 5.75 GB | 151.56 GB | 79.31 GB | 327.4 GB |
| 10 | Intel Core i5-9400F | 1 TB | No | 32 GB | 1.25 GB | 22.87 GB | 1.38 GB | 23.93 GB |
| 13 | Intel Core Ultra 9 285K | 3 TB | Yes | 256 GB | (no data) | (no data) | 3.36 GB | 100.92 GB |
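
To make the table easier to compare across machines with different RAM sizes, the following sketch derives two ratios from the figures above: Nov-Dec Cache Bytes as a share of installed RAM, and the growth of the average from Jul-Aug to Nov-Dec. The tuple layout and the names `ROWS` and `summarize` are only illustrative; the numbers are copied from the table.

```python
# (machine, cpu_vendor, ram_gb, avg_jul_aug_gb, avg_nov_dec_gb, max_nov_dec_gb)
# Values copied from the table above; None marks "(no data)".
ROWS = [
    (7, "AMD", 128, 4.11, 16.74, 30.08),
    (8, "AMD", 128, 6.81, 48.37, 108.71),
    (9, "AMD", 384, 5.75, 79.31, 327.4),
    (10, "Intel", 32, 1.25, 1.38, 23.93),
    (13, "Intel", 256, None, 3.36, 100.92),
]


def summarize(rows):
    """Return per-machine ratios derived from the raw table values."""
    out = {}
    for machine, vendor, ram, avg_before, avg_after, max_after in rows:
        out[machine] = {
            "vendor": vendor,
            # Nov-Dec Cache Bytes as a percentage of installed RAM
            "avg_pct_of_ram": 100 * avg_after / ram,
            "max_pct_of_ram": 100 * max_after / ram,
            # Growth of the average from Jul-Aug to Nov-Dec (None if no baseline)
            "avg_growth": avg_after / avg_before if avg_before else None,
        }
    return out


SUMMARY = summarize(ROWS)
for machine, s in SUMMARY.items():
    growth = f"{s['avg_growth']:.1f}x" if s["avg_growth"] is not None else "n/a"
    print(
        f"#{machine} {s['vendor']:5s} "
        f"avg {s['avg_pct_of_ram']:.1f}% of RAM, "
        f"max {s['max_pct_of_ram']:.1f}% of RAM, "
        f"avg growth {growth}"
    )
```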

A graph showing Cache Bytes for each machine from July to December 2025 is also attached:

Image

The term "cache bytes" here refers to the value of the \Memory\Cache Bytes performance counter. We collect and review the values with Zabbix.
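
For anyone who wants to sample the same counter without Zabbix, here is a minimal sketch that shells out to the built-in `typeperf` tool and parses its CSV output. It is not how we collect our data; the function names are hypothetical, and it assumes an English-language Windows where the counter path is exactly `\Memory\Cache Bytes`.

```python
import csv
import io
import subprocess

COUNTER = r"\Memory\Cache Bytes"  # the counter named in this report


def parse_typeperf_csv(text):
    """Return the numeric samples (in bytes) from typeperf's CSV output."""
    samples = []
    for row in csv.reader(io.StringIO(text)):
        # Data rows look like: "timestamp","value".
        # Skip blank lines and the "(PDH-CSV 4.0)" header row.
        if len(row) != 2 or row[0].startswith("(PDH-CSV"):
            continue
        try:
            samples.append(float(row[1]))
        except ValueError:
            continue  # status lines or values that are not numeric
    return samples


def sample_cache_bytes(count=1, interval_s=1):
    """Run typeperf (Windows only) and return `count` samples in bytes."""
    result = subprocess.run(
        ["typeperf", COUNTER, "-sc", str(count), "-si", str(interval_s)],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_typeperf_csv(result.stdout)


def to_gib(n_bytes):
    """Convert bytes to GiB (assumption: binary units, 2**30 bytes)."""
    return n_bytes / 2**30
```

On an affected machine, `[to_gib(v) for v in sample_cache_bytes(count=5)]` would give five one-second samples that can be compared with the figures reported here.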

Interestingly, machines with Intel processors are virtually unaffected. Their cache bytes are slightly higher than before, but the increase is negligible compared to the AMD machines.

We believe the cache bytes are related to ReFS because they:

  1. contain a huge "Metafile", according to RAMMap
  2. are released immediately when the VHDX hosting the ReFS volume is ejected.

On some machines we periodically use block cloning with reflink to save space, but the standard refsutil dedup or the ReFsDedup.Commands cmdlets produce the same result.

We noticed the behavior change after updating from 23H2 to 24H2. The update might be the trigger, but we are not sure, because the change did not appear immediately after each machine was updated. Also, updating to 25H2 did not cure it in our test environment.

Enabling RefsEnableLargeWorkingSetTrim, described here, did not affect the behavior. We have not tried the other settings because of concerns about instability.
