Description
Windows Build Number
Microsoft Windows NT 10.0.26100.0
Processor Architecture
AMD64
Memory
128 - 384 GB
Storage Type, free / capacity
1 - 8 TB SSDs for ReFS Volumes
Relevant apps installed
Visual Studio Professional 2022, Jenkins Agent, etc.
Traces collected via Feedback Hub
(none)
Issue description
Since around last October or November, some of our Jenkins agent machines with ReFS Dev Drives have started to consume physical RAM aggressively as cache. Although we cannot measure it reliably, build performance appears to be negatively affected.
Steps to reproduce
- Create a new Dev Drive formatted with ReFS, preferably on a machine with:
  - an AMD processor
  - over 1 TB for the ReFS volume, physical or virtual
- Read from, write to, and preferably deduplicate the ReFS volume
- Watch Cache Bytes grow to tens of GBs of physical RAM with Task Manager, Performance Monitor, RAMMap, or a similar tool (see the sketch after this list)
  - RAMMap reports a huge active "Metafile"
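If a quick programmatic check is preferred over Task Manager, a minimal C sketch like the one below can poll the overall system cache size. To be clear about assumptions: GetPerformanceInfo's SystemCache field is not exactly the \Memory\Cache Bytes counter (it is a rough proxy that grows the same way in this scenario), and the file name and one-minute interval are arbitrary.

```c
// Rough stand-in for watching the cache grow in Task Manager.
// GetPerformanceInfo() reports the system cache size in pages.
// Build (MSVC): cl cache_check.c /link psapi.lib
#include <windows.h>
#include <psapi.h>
#include <stdio.h>

int main(void)
{
    PERFORMANCE_INFORMATION pi = { 0 };
    pi.cb = sizeof(pi);

    for (;;) {  /* Ctrl+C to stop */
        if (!GetPerformanceInfo(&pi, sizeof(pi))) {
            fprintf(stderr, "GetPerformanceInfo failed: %lu\n", GetLastError());
            return 1;
        }
        /* SystemCache is in pages; multiply by the page size to get bytes. */
        unsigned long long cacheBytes =
            (unsigned long long)pi.SystemCache * pi.PageSize;
        printf("system cache: %.2f GB\n",
               cacheBytes / (1024.0 * 1024.0 * 1024.0));
        Sleep(60 * 1000);  /* sample once a minute */
    }
}
```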
Expected Behavior
- A negligible amount of cache (e.g. at most a few GBs) is active
- Most of the cache sits on the standby list and is given back to user processes on demand
Actual Behavior
- A massive amount of cache (over 15% of total RAM on average, often 50-80% at peak) stays active
- Memory allocation for user processes is negatively affected, resulting in performance regressions (e.g. more frequent GCs, longer build times)
- Slightly higher CPU and I/O utilization
Details before (Jul-Aug, 2025) and after (Nov-Dec, 2025) for each machine:
| Machine# | CPU | ReFS | Dedup | RAM | Cache Bytes avg. Jul-Aug | Cache Bytes max. Jul-Aug | Cache Bytes avg. Nov-Dec | Cache Bytes max. Nov-Dec |
|---|---|---|---|---|---|---|---|---|
| 7 | AMD Ryzen 7 5800X | 2 TB | No | 128 GB | 4.11 GB | 20.07 GB | 16.74 GB | 30.08 GB |
| 8 | AMD Ryzen 9 7950X | 8 TB | Yes | 128 GB | 6.81 GB | 99.77 GB | 48.37 GB | 108.71 GB |
| 9 | AMD Ryzen Threadripper 7970X | 7 TB | Yes | 384 GB | 5.75 GB | 151.56 GB | 79.31 GB | 327.4 GB |
| 10 | Intel Core i5-9400F | 1 TB | No | 32 GB | 1.25 GB | 22.87 GB | 1.38 GB | 23.93 GB |
| 13 | Intel Core Ultra 9 285K | 3 TB | Yes | 256 GB | (no data) | (no data) | 3.36 GB | 100.92 GB |
A graph showing cache bytes for each machine from Jul to Dec 2025 is also attached:
The "cache bytes" we refer to here is the value of the \Memory\Cache Bytes performance counter, which we collect and review with Zabbix.
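For reference, the counter itself can also be sampled directly via the PDH API. The following is only a minimal sketch (our actual collection is done by the Zabbix agent); the ten-sample loop and one-minute interval are arbitrary choices.

```c
// Minimal PDH sketch that samples the \Memory\Cache Bytes counter.
// Cache Bytes is an instantaneous counter, so one PdhCollectQueryData()
// call per sample is enough.
// Build (MSVC): cl cache_bytes.c /link pdh.lib
#include <windows.h>
#include <pdh.h>
#include <pdhmsg.h>
#include <stdio.h>

int main(void)
{
    PDH_HQUERY query = NULL;
    PDH_HCOUNTER counter = NULL;

    if (PdhOpenQuery(NULL, 0, &query) != ERROR_SUCCESS ||
        PdhAddEnglishCounterW(query, L"\\Memory\\Cache Bytes", 0, &counter)
            != ERROR_SUCCESS) {
        fprintf(stderr, "failed to open \\Memory\\Cache Bytes\n");
        return 1;
    }

    for (int i = 0; i < 10; i++) {           /* ten samples, one per minute */
        PDH_FMT_COUNTERVALUE value;
        PdhCollectQueryData(query);
        if (PdhGetFormattedCounterValue(counter, PDH_FMT_LARGE, NULL, &value)
                == ERROR_SUCCESS) {
            printf("Cache Bytes: %.2f GB\n",
                   value.largeValue / (1024.0 * 1024.0 * 1024.0));
        }
        Sleep(60 * 1000);
    }

    PdhCloseQuery(query);
    return 0;
}
```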
Interestingly, the machines with Intel processors are virtually unaffected: their cache bytes are slightly higher than before, but negligible compared to those of the AMD machines.
We believe the cache bytes are related to ReFS because they...
- contain a huge "Metafile", according to RAMMap
- are immediately released when the VHDX hosting the ReFS volume is ejected (a detach sketch follows this list)
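Only as an illustration of that ejection step, a VHDX can be detached programmatically with the virtual disk API; the path below is a hypothetical placeholder, not our actual layout, and this is just one way to trigger the release we observe.

```c
// Illustration only: detach a VHDX (hypothetical path D:\DevDrive.vhdx),
// which is the point at which we see the active cache drop. Run elevated.
// Build (MSVC): cl detach_vhdx.c virtdisk.lib
#include <windows.h>
#include <initguid.h>
#include <virtdisk.h>
#include <stdio.h>

int main(void)
{
    VIRTUAL_STORAGE_TYPE type;
    type.DeviceId = VIRTUAL_STORAGE_TYPE_DEVICE_VHDX;     /* the Dev Drive is a VHDX */
    type.VendorId = VIRTUAL_STORAGE_TYPE_VENDOR_MICROSOFT;

    HANDLE disk = INVALID_HANDLE_VALUE;
    DWORD rc = OpenVirtualDisk(&type, L"D:\\DevDrive.vhdx",  /* placeholder path */
                               VIRTUAL_DISK_ACCESS_DETACH,
                               OPEN_VIRTUAL_DISK_FLAG_NONE, NULL, &disk);
    if (rc == ERROR_SUCCESS) {
        rc = DetachVirtualDisk(disk, DETACH_VIRTUAL_DISK_FLAG_NONE, 0);
        CloseHandle(disk);
    }
    printf("detach %s (code %lu)\n", rc == ERROR_SUCCESS ? "ok" : "failed", rc);
    return rc == ERROR_SUCCESS ? 0 : 1;
}
```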
On some machines we periodically use block cloning (reflink) to save space, but the standard refsutil dedup or the ReFsDedup.Commands cmdlets produce the same result.
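For clarity, "block cloning" here means the FSCTL_DUPLICATE_EXTENTS_TO_FILE ioctl that reflink-style tools on ReFS are built on. The sketch below is not our production tooling: it clones one whole file on the same ReFS volume, assumes offsets and lengths must be cluster-aligned, ignores the per-call clone size limit and sparse/integrity matching, and trims error handling.

```c
// Minimal whole-file block-clone sketch for ReFS via
// FSCTL_DUPLICATE_EXTENTS_TO_FILE (src and dst must be on the same volume).
#include <windows.h>
#include <winioctl.h>
#include <stdio.h>

int wmain(int argc, wchar_t **argv)
{
    if (argc != 3) {
        fwprintf(stderr, L"usage: %s <src> <dst>  (same ReFS volume)\n", argv[0]);
        return 1;
    }

    HANDLE src = CreateFileW(argv[1], GENERIC_READ, FILE_SHARE_READ, NULL,
                             OPEN_EXISTING, 0, NULL);
    HANDLE dst = CreateFileW(argv[2], GENERIC_READ | GENERIC_WRITE, 0, NULL,
                             CREATE_ALWAYS, 0, NULL);
    if (src == INVALID_HANDLE_VALUE || dst == INVALID_HANDLE_VALUE)
        return 1;

    LARGE_INTEGER size;
    GetFileSizeEx(src, &size);

    /* The target must already be large enough to receive the cloned range. */
    SetFilePointerEx(dst, size, NULL, FILE_BEGIN);
    SetEndOfFile(dst);

    /* Cluster size of the volume (assumes the source path starts with a
       drive letter). */
    DWORD sectorsPerCluster, bytesPerSector, freeClusters, totalClusters;
    wchar_t root[] = L"?:\\";
    root[0] = argv[1][0];
    GetDiskFreeSpaceW(root, &sectorsPerCluster, &bytesPerSector,
                      &freeClusters, &totalClusters);
    LONGLONG cluster = (LONGLONG)sectorsPerCluster * bytesPerSector;

    DUPLICATE_EXTENTS_DATA dup = { 0 };
    dup.FileHandle = src;                   /* clone from src into dst */
    dup.SourceFileOffset.QuadPart = 0;
    dup.TargetFileOffset.QuadPart = 0;
    /* Round the length up to a cluster boundary; very large files would
       need to be cloned in multiple smaller calls (omitted in this sketch). */
    dup.ByteCount.QuadPart = ((size.QuadPart + cluster - 1) / cluster) * cluster;

    DWORD bytes = 0;
    BOOL ok = DeviceIoControl(dst, FSCTL_DUPLICATE_EXTENTS_TO_FILE,
                              &dup, sizeof(dup), NULL, 0, &bytes, NULL);
    wprintf(L"clone %s\n", ok ? L"succeeded" : L"failed");

    CloseHandle(src);
    CloseHandle(dst);
    return ok ? 0 : 1;
}
```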
We noticed the behavior change after updating from 23H2 to 24H2. The update might be the trigger, but we are not sure, because the change did not appear immediately after each machine was updated. Updating to 25H2 was not a cure in our test environment either.
Enabling RefsEnableLargeWorkingSetTrim, described here, did not affect the behavior. We have not tried the other tuning options out of concern about instability.
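For reproducibility, this is a minimal sketch of how such a value can be set; it assumes the HKLM\SYSTEM\CurrentControlSet\Control\FileSystem location given in the linked guidance and a reboot afterwards.

```c
// Sketch: set RefsEnableLargeWorkingSetTrim = 1, assuming the
// HKLM\SYSTEM\CurrentControlSet\Control\FileSystem key from the linked
// guidance. Run elevated; reboot to apply.
// Build (MSVC): cl set_trim.c advapi32.lib
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HKEY key;
    LSTATUS rc = RegOpenKeyExW(HKEY_LOCAL_MACHINE,
                               L"SYSTEM\\CurrentControlSet\\Control\\FileSystem",
                               0, KEY_SET_VALUE, &key);
    if (rc != ERROR_SUCCESS) {
        fprintf(stderr, "RegOpenKeyExW failed: %ld\n", (long)rc);
        return 1;
    }

    DWORD enable = 1;  /* 1 = enable the trim behavior described in the guidance */
    rc = RegSetValueExW(key, L"RefsEnableLargeWorkingSetTrim", 0, REG_DWORD,
                        (const BYTE *)&enable, sizeof(enable));
    RegCloseKey(key);

    if (rc != ERROR_SUCCESS) {
        fprintf(stderr, "RegSetValueExW failed: %ld\n", (long)rc);
        return 1;
    }
    puts("RefsEnableLargeWorkingSetTrim set to 1; reboot to apply.");
    return 0;
}
```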