Releases: Metta-AI/metta
Release 2025.12.12-024446
Release Notes - Version 2025.12.12-024446
Task Results Summary
Changes Since Last Stable Release
35632d7 Tune ema_timescale based on LP hyperparameter sweep (#4316)
c9606e2 skypilot postgres db (#4337)
3319d93 make bazel CI download a bit more robust: use specific version, validate ELF binary, pin to same version as local dev (#4339)
638925b Switch SkyPilot training from Docker to AWS Deep Learning AMI (#4308)
f30c01a Pasha/eval tasks parallelism (#4330)
691ad32 fix: add freezegun to root testing dependencies (#4334)
468435f bugfix: remove partial pytest progress lines (#4122)
cfda484 Add Policy Details Endpoint (#4289)
6ab0c22 feat: post to Discord after metta publish (#4327)
62c53e9 refactor: remove unused pytest marker network from pyproject.toml (#4179)
75f4e50 feat: create Asana tasks for new GitHub issues (#4318)
8a4ea91 feat: create GitHub issue templates (#4319)
3c585f7 Cache nim packages in CI (#4325)
1f2e030 Remove outdated dashboard cronjob (#4324)
6383e41 Reduce prevalence of untyped uri strings; refactor URI parsing to use discriminated union types (#4322)
0cd0b3e [MettaScope] Replay drag and drop (#4263)
9374ed6 Remove outdated interactive demo from README (#4311)
12958a6 Remove needless cli spinner widget demo-main (#4312)
494d1da Refactor and simplify metta publish (#4307)
497d06a Revert "remove prometheus and grafana" (#4302)
f0f9f00 feat(gridworks): Sync mettascope assets, change walls algo to atlas logic (#4269)
137b592 support metta publish {pkg} --repo-only (#4306)
5bd6ba5 disable datadog on local k8s (#4305)
59f313d Add padding support for mismatched action spaces in ActionProbs component (#4303)
fb064e4 Run migrations on observatory backend startup only when an env var is turned on (#4291)
7ecf0c6 Revert "Add padding support for mismatched action spaces in ActionPro… (#4304)
fc4d996 Disable host monitoring on skypilot and ephemeral spawned machines (#4286)
47ba61a Maybe fix pushing to child repos (#4301)
587af63 Better serializing/deserializing of PolicyEnvInterface (#4292)
b3434c3 Logs are being spammy. (#4294)
36b8964 Remove ddtrace logs that clutter up test output (#4287)
4ec0f63 Observatory backend uses pydantic BaseSettings for marshalling env vars for its settings (#4290)
1bcfc2b Add padding support for mismatched action spaces in ActionProbs component (#4277)
af093d4 Update PyTorch float precision settings (#4280)
39c9e9b No longer set tight column bounds on logging (#4282)
b9a90f0 bump nimby version used in cogames setup.py (#4276)
c18b1fc Clean up configs in cogs_v_clips so sliced bc works (#4281)
cdd384b fix Gymnasium AsyncVectorEnv compatibility for MettaGrid/Cogames (#4268)
670152f rm outdated test (#4279)
3bdb506 Fix implicit .git existence check in resolving metta:// uris (#4278)
d6f4492 cogames submit supports --setup-script that gets run once per policy download, not per initialization (for e.g. nim c) (#4270)
cbfed2e Extract shared scheduler state tracking into reusable component (#4258)
3ea95ec Fix loading scheme resolvers that do not exist for some users (#4274)
c4f5f11 Extract ViT changes into separate pull request (#4249)
5f39df0 recipe to train on machina_1 (#4265)
140be68 Fix nim policy submission loading (#4260)
5d489a4 Catch forbidden imports in cogames (#4267)
8c8e640 Temporary fix to unbreak public package (#4266)
Training Jobs
- No training jobs in this validation run
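Note on 6383e41: the commit title mentions refactoring URI parsing to use discriminated union types. The release notes do not show the actual types, so the following is only a minimal sketch of the general idea. The class names (FileUri, S3Uri, MettaUri), the parse_uri function, and the metta://<kind>/<name> layout are all assumptions made for illustration and are not taken from the repository.

```python
from dataclasses import dataclass
from typing import Union
from urllib.parse import urlparse


# All names below are hypothetical; they illustrate the pattern only.
@dataclass(frozen=True)
class FileUri:
    path: str


@dataclass(frozen=True)
class S3Uri:
    bucket: str
    key: str


@dataclass(frozen=True)
class MettaUri:
    # Assumed layout: metta://<kind>/<name>
    kind: str
    name: str


# The union is "discriminated" by the concrete class of the value.
ParsedUri = Union[FileUri, S3Uri, MettaUri]


def parse_uri(uri: str) -> ParsedUri:
    """Turn an untyped URI string into one of the typed variants."""
    parsed = urlparse(uri)
    if parsed.scheme in ("", "file"):
        return FileUri(path=parsed.path)
    if parsed.scheme == "s3":
        return S3Uri(bucket=parsed.netloc, key=parsed.path.lstrip("/"))
    if parsed.scheme == "metta":
        return MettaUri(kind=parsed.netloc, name=parsed.path.lstrip("/"))
    raise ValueError(f"unsupported URI scheme: {parsed.scheme!r}")


def describe(uri: ParsedUri) -> str:
    """Callers branch on the variant instead of re-parsing strings."""
    if isinstance(uri, S3Uri):
        return f"s3 object {uri.key} in bucket {uri.bucket}"
    if isinstance(uri, MettaUri):
        return f"metta {uri.kind} named {uri.name}"
    return f"local file {uri.path}"
```

Parsing once at the boundary and passing typed values afterwards lets a type checker flag code paths that forget to handle one of the variants, which is the usual motivation for this kind of refactor.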
Release 2025.12.09-024706
Release Notes - Version 2025.12.09-024706
Task Results Summary
Changes Since Last Stable Release
8c2a169 update readme with step 1-4 (#4261)
5721b32 Review and PR mettascope changes from daveey-skydeck (#4250)
98a9ea1 fixes issue with the vibe action showing up in abes (#4262)
a2d3fd0 class order for invoke (#4264)
c1d6cf5 feat: create add_dummy_loss_for_unused_params() helper + benchmark against find_unused_parameters=True param (#4257)
2935e17 allow parallel eval based on machine size (#4256)
e467d70 support requesting remote evals in the same format we request local evals through run.py (#4252)
005b385 Fix task worker cpus (#4259)
d272f52 Refactor sweep system to use flat dot-path parameters and simplify scheduler architecture (#4153)
475397e fix: get clean install metta pytest working by marking 5 flaky tests as xfail/skip (#4241)
51e919d Add self observation method to simulator (#4246)
7b29c06 Address job timeouts (#4251)
1c79b26 behavior clone off of thinky (#4177)
9c2dfb9 Fix loading of some .mpt policies on mac (#4248)
0fb42d5 use avg score and not vor in leaderboard (#4242)
63c1c77 fix: SimulationAgent.inventory reads only from agent's own position (#4239)
b1afd11 Add a variant where Assemblers pull from nearby chests (#4224)
90b885f Allow assemblers to pull from chests (#4219)
b362f79 Use Inventory in favor of InventoryHavers (#4218)
d2f9f08 Remove endpoints that present sql injection risk and arent used (#4235)
7ba1451 Fix Datadog agent for SkyPilot training jobs (#3955)
2c32b7b update get_policies to new syntax (#4232)
5c7b7bf remove vor tests (#4233)
f394d64 PodDisruptionBudget configs (#4207)
74e214f remove library chart and tf stack (#4208)
87ab9a3 feat(gridworks): Quick navigation on config and mission view (#4154)
949053c Leaderboard evals are taking a long time because they go through 15 scenarios (#4230)
3211096 Fix metta:// policy uri name/version parsing (#4229)
e61dfa7 more memory for eval scheduler for uv sync to prevent crash backoff loop (#4227)
392089b increment leaderboard sim version (#4226)
e0c1c3f optimize learning progress algorithm with per-task EMA tracking and add goal observations to CvC (#4082)
e71db29 Observatory updates for remote jobs: UI and endpoints (#4221)
Training Jobs
- No training jobs in this validation run
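Note on e0c1c3f: the commit title mentions per-task EMA tracking in the learning progress algorithm. The release notes do not describe the implementation, so the following is a generic sketch of keeping one exponential moving average per task. The class name PerTaskEma and the parameter alpha are hypothetical, and how the repository's ema_timescale setting maps to a smoothing factor is not stated in the source; here alpha is simply assumed to be a smoothing factor in (0, 1].

```python
class PerTaskEma:
    """Keeps an independent exponential moving average for each task id.

    Hypothetical helper for illustration; not the repository's class.
    """

    def __init__(self, alpha: float) -> None:
        # alpha is an assumed smoothing factor: larger values react faster.
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self._ema: dict[str, float] = {}

    def update(self, task_id: str, score: float) -> float:
        """Fold a new score into the task's average and return the result."""
        prev = self._ema.get(task_id)
        if prev is None:
            # The first observation initialises the average directly.
            new = score
        else:
            new = prev + self.alpha * (score - prev)
        self._ema[task_id] = new
        return new

    def get(self, task_id: str, default: float = 0.0) -> float:
        """Return the current average, or default for an unseen task."""
        return self._ema.get(task_id, default)


tracker = PerTaskEma(alpha=0.5)
tracker.update("task_a", 1.0)
tracker.update("task_a", 0.0)
tracker.update("task_b", 2.0)
```

Tracking each task separately means a burst of episodes on one task does not shift the estimate for another, which is one plausible reason to prefer it over a single shared average.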
Release 2025.12.06-024023
Release Notes - Version 2025.12.06-024023
Task Results Summary
Changes Since Last Stable Release
3836baf Give eval scheduler a smaller machine so it stops getting evicted (#4220)
95ef2f7 By default, evaluate only remotely if in skypilot run, and only locally if not (#4215)
fc2a43d Make reported leaderboard score be VOR (#4014)
b9d0c30 Remote eval jobs always push to prod observatory. Training-triggered ones always push to wandb (#4213)
0b4f819 Random map curriculum (#4111)
d16dff0 Experiment 1 recipes: Mission variant curriculum + navigation missions (#4211)
bca9235 [MettaScope] improve replay loading (#4206)
d8398b2 Consolidate eval Tools into a single tool that supports the metta:// URI format (#4201)
b02b9ee Remove old eval references (#4158)
b62cc40 Fix thinky (#4203)
4d260e5 Fix 'claimed tasks' query (#4205)
3492c45 terraform more observatory backend secrets (#4151)
94aabb9 remove prometheus and grafana (#4170)
a784bb8 Set 'started_at' field of task attempt (#4202)
b08c808 improve racecar agent (#4181)
b992000 [MettaScope] limit visibility when pinning an agent (#4194)
035b5e5 Optimize replay recording by skipping static objects after first step (#4199)
5c67a52 parallelize local evals at the simulation level (not per policy) (#4185)
916e28c Remove update_inventory (#4198)
d401d7f Add a callback for inventory updates (#4197)
Training Jobs
- No training jobs in this validation run
Release 2025.12.05-024327
Release Notes - Version 2025.12.05-024327
Task Results Summary
Changes Since Last Stable Release
0b0d15b no default maximum timesteps cvc specific (#4196)
604fbc6 Change how we pass and resolve supervisor policy uris for behavior cloning (#4161)
a5a32ac shared memory map cache was key-erroring, this no longer does (#4191)
5bbd42a Update to Supersnappy based fidget. (#4193)
f166410 [MettaScope] fit visible agent area on startup (#4190)
c652948 Fix speed issue with faded vibes. (#4189)
dd579d9 Improved eval task attempt log visibility in observatory (#4187)
c2fc9d8 fix: metta install could not install helm plugins with helm v4 (#4175)
3706917 shared map cache was unlinking, now doesnt (#4183)
30f40c2 race_car -> racecar, max_steps changes, thinky output appropriately, and racecar agents (#3631)
64a2e03 Improve option selection ergonomics for metta configure (#4142)
a0c3661 silence log spam (#4178)
7ca805e undo germanium tweaks and undo gear changes as part of dr=domain randomization in training (#4174)
09edb67 Sliced Deterministic Cloner w Cross Entropy Loss for CvC (#4166)
5cc02b6 Akshay/remove install system sh (#4144)
df87ca8 Remove unnecessary loss configs from losses.py (#4156)
Training Jobs
- No training jobs in this validation run
Release 2025.12.04-024257
Release Notes - Version 2025.12.04-024257
Task Results Summary
Changes Since Last Stable Release
67f9624 correctly point to heart.created everywhere (#4169)
14c5f17 Reduce the number of short_names (#4150)
941708e Pasha/one task per worker (#4167)
4c39eee Fix Nim installation issues in bootstrap.py (#4159)
1c18c5d switch to heart.created and unify from proc_maps and fixed_maps to cogs_v_clips.train (#4163)
64790f7 cvc default hyperparameters that work (#4164)
2412b6d Small UI fixes. (#4162)
4eb9016 Give minimap icons to the objects. (#4157)
87bd539 sampled_mb config guard (#4155)
a53e3b6 Make bootstrapping step only require stdlib (#4152)
Training Jobs
- No training jobs in this validation run
Release 2025.12.03-024502
Release Notes - Version 2025.12.03-024502
Task Results Summary
Changes Since Last Stable Release
769d776 test_stopwatch uses freezegun (#4138)
b07e5b9 Add retry logic for task generation and improve goal observation tokens (#4081)
33ebeba Fix the web player. (#4149)
54693c7 Fix high DPI issues. (#4147)
9f95534 type script included assembler twice, recipes now include machina_1 (#3547)
b0c0ba3 migrate Tribal Village build to nimby-only and align with mettascope flow (#4134)
b75d629 update tutorial section (#4140)
6b0a77b Update nim/nimby installation in bootstrap.py: (#4137)
bce50b9 move the tuner config into the tuner trainer component (#4139)
62db7b5 Remove unneeded string-based agent last_action tracking (#4131)
d3b8fe4 Delete old files (#4127)
1903b7d feat: num epochs auto tuner (#3313)
73c6097 do not add symlinks if nimby/nim are already shown to be available (#4136)
6e87cba add CVC sweep (#4132)
2496c58 adding changelog (#4069)
5eba92b Add branch-based checkout support for remote evaluations (#3882)
7607846 re-enable cvc ci suite test (#4135)
1d317b3 Cogames test optimizations (#4123)
0028673 Fix assert in fidget. (#4133)
652a1fe Remove openhands (#4129)
bc8c890 Update make-mission to use --cogs (#4104)
a41c4a1 Make episode completion correct when current_step >= max_steps (#4103)
e000280 Akshay/system install before core (#3787)
961fccf datadog in ci tests (#4097)
29ac401 perf test scripts (#4085)
2c77786 remove makefile (#4056)
231dc24 Authenticate observatory with secret (#4130)
dd01574 Per label chest deposits (#4126)
79a9c04 Fix observatory ingress
ae45304 remove unused function call from submit (#4124)
27bc528 Remove oauth2 proxy (#4119)
315224a Revert "Add navigation missions, add per-env chest deposits (#4100)" (#4125)
dc4c5ec Remove unused filesets-to-gdrive script (#4121)
4621210 Bug: MettaGridPufferEnv expects cfg not config (#4055)
4045b90 Remove vestigial suggested vscode extensions (#4114)
191a70c Remove low-value post-merge github workflows (#4113)
c4c6f94 Remove repo organization plan doc: is stale and is really an asana task (#4115)
ab84cde Move test result docs in to recipes/notes (#4116)
5a70483 Remove unused docs/wandb/metrics/* (#4117)
f27d245 Remove redundancy in .cursor instructions and fix stale instructions (#4118)
6a651f5 Remove logic and docs around researcher tags; we now have stable (#4120)
5845abb fix initialize_to_environment location to after load, default empty room to full vibes list (#3436)
0451ee9 Sliced Kickstarter, ViT Reconstruction Loss, Quantile TD Error, and Bug Fixes (#4107)
2014d53 Add navigation missions, add per-env chest deposits (#4100)
d28719e Add token auth to observatory (#4109)
8b98f30 Add cost_key parameter to extract cost from run summary in sweeps (#4112)
cb96d79 simplify dockerfile (#3976)
d5b9339 Mission ke (#4093)
23feba1 training tribal village with cogames (#4032)
0267828 [Nix] pull correct triton for pytorch 2.9.0 rocm (#4110)
0eaabc9 [MettaScope] sort protocols in UI by hearts (#4108)
d92f7fd [MettaScope] update replay spec for protocols (#4106)
b05cd0c Replace retry logic with tenacity library (#4099)
29fa6bc Fix Out of Memory in Emscripten. (#4105)
0b3e182 Replace altar with assembler (#4101)
7b4b3b5 Remove fractional resource consumption (#4061)
02c9ca7 fixed signal only works in main thread of the main interpreter run_evaluation bug (#4102)
2ecea61 Remove resource_mod (#4060)
a9a6807 Remove cooldown_progress (#4059)
48d4570 fixed typo in MettaScope window title (it was called 'MetaScope') (#4076)
ac9f6b5 Make extractors more consistent; update germanium (#4068)
fbf6335 Remove action failure penalty (#4077)
58f2010 treat mpt as default uri scheme in cogames (#4089)
42df75f Refactor a bunch of stuff around uri schemes (#4088)
Training Jobs
- No training jobs in this validation run
Release 2025.12.01-025643
Release Notes - Version 2025.12.01-025643
Task Results Summary
Changes Since Last Stable Release
No commits found
Training Jobs
- No training jobs in this validation run
Release 2025.11.30-024824
Release Notes - Version 2025.11.30-024824
Task Results Summary
Changes Since Last Stable Release
No commits found
Training Jobs
- No training jobs in this validation run
Release 2025.11.29-024025
Release Notes - Version 2025.11.29-024025
Task Results Summary
Changes Since Last Stable Release
c137ba5 larger runner for stable releases (#4094)
c4d7e94 friction: omit blank lines in github ci reports (#4079)
1c33f02 feat(gridworks): Filter mission variants in dropdown (#4086)
1fcedf2 Remove some floats (#4058)
459ca16 fix references to legacy mission names (#4083)
6273972 move encoding.json to gridworks api route (#4035)
f18a1e6 get pufferlib training on cogames pt 2 (#4047)
5e5072c Reduce CI training job scale to fix timeout flakiness (#4080)
2030cea add stats tracker to the assembler and track heart.created (#4049)
99fde0e Observatory UI updates: searchable policies homepage, policy page, copyable metta://policy URIs (#4078)
cbd223a Add support for more expressive s3 policy uris (#4074)
1f8e200 MettaSchemeResolver now supports more formats (#4072)
2902b92 Allow running observatory server locally but using prod db (#4075)
67b7a20 Metta training uses s3_path when uploading policies (#4070)
f55da2d Deprecate eval_missions and update submit_experiments for latest experiment (#4057)
753591f order policies by version (#4073)
51564fb fixed shared memory map cache (#4044)
793e02d Upgrade mpt loading, saving, resolving (#4041)
af56c12 feat(gridworks): Simple config and missions filtering (#3938)
102ba08 add policy version endpoints (#4064)
15bb7ed all cronjobs use same service account (#4063)
2313fc2 explicitly prevent passing s3 uris as data path for cogames policies (#4065)
6b39ffd Remove map_char from configs (#4042)
cfe17a1 home.softmax-research.net shorturl for skypilot filters using updated @softmax.com email addresses (#4053)
67bba7f evaluator supports heartbeats (#4052)
2448127 feat: in asana code review, auto-close review tasks; more useful task title/description for reviewer comments (#4006)
2e62b58 fix: attempt to get daily stable release working (#4019)
ac64873 readd comments to mapgen (#4051)
7d7991d send even fewer potentially-wandb-run-overwriting fields when doing evals (#4050)
f433ca6 change rolling window to 5, fixed heart_chorus variant to properly penalize (#4048)
ccabe14 Add gitta to project dependencies (#4046)
1bba0c7 Fix Mettascope running out of atlas space. (#4008)
f9558a3 [MettaScope] load protocols in a replay (#4045)
5a5683e [MettaScope] improve validator key checking (#4015)
62eccd2 default maps_cache_size (#4040)
6713f67 vectorizing map-gen (#4033)
097edd9 Allow cogames play/train/eval off of metta-trained policies (#4037)
dc272f4 cooling: AI agent doc cleanup (#4034)
0f2d531 policy_spec_from_uri out of CheckpointManager (#4028)
0c7a3e8 Remove unused github actions - codecov and benchmarks/bencher (#4027)
c965a42 separate out dithering and make into vector (#4031)
53c3871 Remove object_type_name usage from miniscope; remove stale files (#4023)
055a181 [nix] fix versioning (#4021)
cb5176e Gridworks: measure and display render time of each scene (#4025)
9bdc180 update to give tribal_village play and fix sprites (#4020)
21555a6 Enable submitting metta-trained policies to cogames leaderboard (#4024)
9dc9277 remove machina_1 from training because it's slow (150k SPS -> 20k SPS) (#4022)
47d6be6 Update cogames submit to use presigned-url-based method for uploading policies to s3 (#4011)
38200d9 cogames play hint was wrong, fixing (#4016)
9500adc fix nim segfault? (#4017)
fe2e2ed Fix adding to existing wandb runs by using mode="shared" to re-init the run and not re-specifying fields like "command" (#4012)
4bcaa27 updates to BC defaults, and CvC training improvements (#3891)
84899d8 remove cvc_small_train from ci-suite until it is faster (#4018)
2daa3b7 Support presigned-url-based cogames submit endpoint (#4010)
f4c816c Add Key Replay Testing (#3979)
b99bbe7 fix skypilot launcher; no spot by default (#4009)
7b738dc Cogames submit emits packages version when validating policy in isolation, uses latest version in pypi, and doesn't silent-fail on some types of policy validation (#4007)
c54de02 Demo scripted agent (#3983)
152d310 fix: revert 9a1e266; go back to using git-filter-repo for subrepo pushing (#4005)
cd685b4 Add a make-policy command (#3993)
b747ea0 Faster remote worker job start (#3989)
3110855 expanded PROC_MAP_MISSIONS for recipe (#4001)
a2ca594 Tutorial update (#4004)
fa83062 Allow github actions to access softmax-public (#4003)
5830cf2 standardize python imports: AGENTS.md (#3998)
f6df80d fix observatory redeployments; increase replica count (#4002)
4fe430c app_backend cleanups (#3961)
90fd177 Skypilot launcher cooling (#3884)
14846db minimal render shim for external pufferlib (#3997)
59af5cb Cooling: simplify policy classes (#3996)
4929683 fixed experimental procedural map recipe to run (#3995)
b3028e4 Akshay/increase cleanup cancelled jon workflow (#3992)
d5c707a Akshay/expand cleanup workflow (#3991)
efbd60f View replays inline and make accidentally clicking over to policy page less doable (#3988)
3231bf9 Add reel (#3987)
32702ca Policy and Episode observatory page (#3974)
962eacc Clean up baseline_agent (#3985)
a1f14cd [MettaScope] better safety around key detection (#3986)
8454f9f standardize python imports: LLM updates + tools (#3865)
7b726ab Get episode.avg_rewards type correct (#3971)
2ca3cba fix episodes.attributes db response parsing (#3984)
fb6c28d Don't pass and dont require unused eval_task_id to recipes executed by eval task worker (#3973)
efa8c31 Fix check_connected_as for observatory key (#3981)
257f995 Nimby to 0.1.13 (#3980)
658a713 replace game pic in readme (#3978)
94ffb5c [MettaScope] Agent pin camera fixes (#3975)
5dc0e16 Fix remote eval wandb run name (#3970)
c8f6a5b fix play to get hearts, other rollout simplifications (#3965)
ca6913a episodes endpoint updates: Include avg-reward per policy, simplify db row parsing, support episode_ids filter (#3969)
0d30654 Update README.md (#3967)
c67ef8d Revert "performance testing with gcc-13, gcc-14, and clang-21 on g6.x… (#3968)
33ed030 Complain about unknown components in metta install (#3966)
6a0180b [MettaScope] keep world map relative size when resizing (#3956)
1ca5a94 Add episodes endpoint with replay urls so we can show replays in the frontend (#3964)
4f7afec Nimby to 0.1.12 (#3963)
197681e Updated shorturl link for /gm to include PRs that the viewer is a requested-reviewer on (#3942)
27b1f23 changes that let puffer submit work (#3948)
cc7b51f performance testing with gcc-13, gcc-14, and clang-21 on g6.xlarge (#3843)
8fdf835 observatory: tailwind, cleanups, various minor features (#3960)
9b3eb3e support manual dispatch of canceled run cleanup (#3959)
e03aecf filter /tasks/paginated by command (#3958)
1f8b79e Stop gitignoring all bindings (#3954)
2e31571 pass seed through properly (#3953)
2ee1817 Better cogames policy validation (#3952)
0e59a04 Akshay/fixes for cancelling jobs (#3951)
bbe75e7 Feat: Cortexifying HuggingFace LlamaForCausalLM (#3819)
2ba7441 Policy -> MultiAgentPolicy (+misc cleanup) (#3947)
71a1601 Hard evals (#3835)
b139d26 Trigger new evals (and change episodes considered for scores) for cogames-submitted policies (#3950)
dcd1fe0 Remove unenforced pyright checks in CI (#3944)
ab5dbc0 Fix thinky and ladybug policy ids (#3946)
1d885a5 fix: make evals work in stable releases (#3941)
7f2245f Consolidate noisy log suppression logic, use it far and wide, and do so before other imports (#3943)
a9b7d96 Improve doc accuracy (#3940)
1d26300 fix: improve branch matching for cancelled run cleanup (#3939)
f524efc fix: correct arg order for stable.py, dump failed job logs when running with --no-interactive (#3821)
f5fc02e actually fix replay urls (#3937)
6cb909a fix replay urls (#3936)
894a8ae General curriculum recipe for missions and variants (#3889)
71a1bbb Show run statuses and rearrange some elements of the dropdowns on observatory leaderboard page (#3935)
8594e85 Max tries for leaderboard scheduled runs (#3934)
47e6dff rm outdated method of requesting remote evals (#3881)
44c3e5e Update cleanup_cancelled_runs.py (#3933)
efe0690 Readme ke (#3932)
647294a Improve error messages (#3931)
a98647f suppress error messages up front (#3930)
f0b4804 tutorial in cogames (#3914)
7a0cbaf refresh leaderboard every 10 seconds (#3929)
57dc8d5 observatory leaderboard frontend updates (#3927)
f912c6b Remove the event manager (#3921)
d127485 Fix /stats/policies/my-versions; tests; changed leaderboard routes to GET (#3919)
78d7279 fix skypilot recursion bug and no trainable params bug ddp (#3915)
654d226 Support play command for new leaderboard evals with policy version ids (#3922)
f86c390 Run pyright tests as a part of CI (#3886)
b6ddf2f Rm packages from search - it is not there during wheel install. (#3920)
c937f97 make leaderboard eval command more runnable (#3917)
64e98aa cogames leaderboard and cogames submissions, and docs for cogames submit/leaderboard/submissions (#3916)
123bfbf heart_chorus tweaks and tiny heart protocol tweaks (#3912)
6425a94 [improve syntax] Cogames eval mission set + "easy_mode" (#3894)
93ae08d In-observatory leaderboard (#3913)
4c3bb9a Fix: Git subtree split timeout and graceful child repo push failure (#3897)
a6eea66 added a starter agent (#3908)
c9a23d6 Add a technical manual, etc (#3911)
1207430 new leaderboard endpoint: /leaderboard_policies (#3907)
906d4b2 Add an actions doc (#3909)
b31b4bf Remove a semi-stale doc (#3905)
6c7a0f4 Remove a stale doc (#3904)
fd27892 Add a technical manual for observations (#3871)
c61cb40 /stats/policies/my-versions observatory route (#3902)
43421e0 [MettaScope] support 0 game objects...
Release 2025.11.03-1222
Release Notes - Version 2025.11.03-1222
Task Results Summary
- ✅ python_ci
- ✅ cpp_ci
- ✅ cpp_benchmark
- ✅ arena_local_smoke
- ✅ arena_single_gpu_100m
- Metrics: overview/sps=65299.3, env_agent/heart.gained=0.5
- ✅ arena_multi_gpu_2b
- Metrics: overview/sps=1203211.3, env_agent/heart.gained=1.8
- ✅ arena_evaluate
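The Metrics lines in the summary above follow a simple "name=value, name=value" layout. A minimal sketch of reading such a line into a dictionary is shown here; the function name parse_metrics_line is hypothetical, and the format is inferred only from the two lines in this release, not from any documented specification.

```python
def parse_metrics_line(line: str) -> dict[str, float]:
    """Parse a line such as '- Metrics: overview/sps=65299.3, ...'.

    Hypothetical helper; the layout is inferred from the release notes.
    """
    # Drop the optional list bullet and surrounding whitespace.
    text = line.strip().lstrip("-").strip()
    prefix = "Metrics:"
    if not text.startswith(prefix):
        raise ValueError(f"not a metrics line: {line!r}")

    metrics: dict[str, float] = {}
    for pair in text[len(prefix):].split(","):
        name, sep, value = pair.strip().partition("=")
        if not sep or not name or not value:
            raise ValueError(f"malformed metric entry: {pair!r}")
        metrics[name] = float(value)
    return metrics


single_gpu = parse_metrics_line(
    "- Metrics: overview/sps=65299.3, env_agent/heart.gained=0.5"
)
```

With the values parsed as floats, a validation script could compare them against whatever thresholds it is configured with; the thresholds themselves are not listed in these notes.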
Changes Since Last Stable Release
81f1c63 new thresholds post converters
e34ccc4 Fix cron job to update PR embedding cache (#3493)
434d9a2 add variants for constructing training systems (#3502)
bfeb324 Remove AgentSupervisor (#3511)
74aa7d2 Re-add heart chorus and re-get training to work again and fix lonely_heart (#3498)
232c360 Feat(Cortex): Introduce Column MoE + DSL for easy stack creation; enable eager torch.compile caching (#3465)
d53108b Supervisor Loss for Deterministic Policies (#3472)
0af746d refactor: consolidate Docker build workflows with composite actions (#3508)
c6c539b [MettaScope] cancel action queue when vibing (#3507)
85fbdb2 Science Experiments (#3410)
cdf714e More recipe -> protocol (#3491)
1a7849c Move vibes into protocols (#3471)
b8f1910 Remove converters (#3490)
5e8cb07 Now that CI does not persist github creds, we should not need lines in our Dockerfiles that remove them (#3500)
23573ab Removes some references to Converters in environments that don't have converters (#3485)
1492e62 Remove ResourceMod's impact on converters (#3484)
5e6212e Add stacktraces to all logger.errors (#3489)
929ce98 remove yaml maps migration script (#3492)
b71a97f Remove dashboard and library auto-docker-img-build workflows; they are broken (#3499)
1cce4a2 fix: stop persisting creds in docker workflows (#3494)
0935666 Mettagrid: Give all grid objects a vibe property (#3442)
bd4e2d5 Don't split python test behavior on benchmark vs not-benchmark (#3495)
08ae648 Log a warning if installed CUDA kernels do not cover detected GPU devices (#3487)
51d72f9 [MettaScope] fix nimble packages and caching (#3468)
2262a01 Make vscode auto-lint-on-save line up with what metta lint and CI want (#3479)
2b326b0 Remove GetOutput action (#3478)
7e90122 Procedural Map Generation and Missions (#3423)
f89456d Remove PutRecipeItems (#3477)
3d9e6f8 Update test_rewards to use Assemblers (#3476)
abcaa10 Remove unneeded actions from tests (#3475)
e1eb344 feat: add replay mining and dataset creation functionality (#3317)
81f2bd9 add cogames maps to package (#3444)
3d242ba Cleaned up cogames evaluate summary stats (#3467)
ca141f0 update pnpm-lock.yaml; someone probably updated deps and didnt commit it (#3469)
0e6ac1c add tribal village inside of packages (#3460)
8ead577 cogames train --log-outputs emits training and eval stats to stdout (#3457)
7adcce8 Cogames eval output formatted (#3464)
08deecc [MettaScope] switch to new actions system (#3458)
7406189 fix: exclude entire mettascope/data directory from JSON formatting (#3466)
6af181f Regenerate maps that fail validation (#3461)
3caba02 Give instructions for creating and managing cronjobs (#3463)
45c84dc Simplify MapBuilderConfig polymorphism (#3217)
1cddde7 Revamp cronjob deployment (#3437)
3920d84 cleanup: remove library from metta repository (now has its own private repo) (#3445)
3883a60 Add a supervisor (#3426)
99fa99f Add buckets to the arena recipe with initial inventories so agents can easily learn to use assemblers (#3456)
c3bc86e [MettaScope] make pathfinding performance not awful (#3440)
d48b3bb Draw fog of war using the tile map system. (#3449)
5c7e949 Fix app_backend build (#3446)
dc0c6c6 Fix Emscripten with new shaders. (#3447)
062a424 feat: add metta ci command with unified multi-format linting (#3406)
f9258b7 Run Nim on the GPU shaders. (#3439)
65a44a1 Pasha/observatory tokens file (#3441)
ddbe39d CI has cleaner summary of python unit tests (#3408)
6d59d38 pack rat sets each inven limit to 255, energized = always 255 energy, no more simple recipes, changed shaped rewards, surface C++ vals (#3428)
406e8a2 handle agent low energy properly (#3438)
d0c311d feat(gitta): add AWS Secrets Manager support (#3396)
c78449c Add a new pixel art aware drawer for map objects. (#3432)
0634d75 Install PR search mcp server in cursor (#3434)
70ed621 feat: enable training replay generation (#3281)
83e5e32 specify replay environment (#3273)
066c6d8 feat: add configurable noise layer (#3302)
8d7b397 restore old mettascope assets in gridworks (#3427)
4313c9d split agent into cpp (#3425)
a18bdb8 ran metta lint --fix (#3430)
dba35aa fix post-training instructions to match real way of specifying policies (#3415)
2b1cceb install.sh detects if bazel is too old (#3372)
6b6b78c fix: modern uv version in Dockerfiles (#3424)
e03438c Python 3.12 (#3218)
85c6f16 Use the new fast and pixel perfect tile map. (#3417)
5988897 Cogames eval supports multiple missions (#3418)
5bfed85 Fix replay log renderer history handling and add default stats_dir (#3391)
5c66cc6 fix: add ec2:DescribeSnapshots permission for dashboard cronjob (#3420)
c2b7581 MCP server for embeddings based PR search (#3404)
9b45bc3 [MettaScope] Vibe action queue (#3400)
fea2d0d feat: integrate multi-format formatting into metta lint (#3403)
5841b9c fix: grant dashboard cronjob IAM permissions for asana, wandb, and EC2 (#3416)
326921d fix: resolve typescript errors in library (#3413)
babc100 Error on writing multiple steps at once. (#3409)
5a348c6 fix oidc provider_arn from literal wildcard to the exact name (#3412)
21549ac feat(cortex): Add Transformer‑XL attention cell with rolling memory (XLCell) (#3359)
f4e94fa easy/shaped -> variants, lstm becomes default cogames policy, adjustments to set is_recurrent (#3214)
debc866 Track heart.gained instead of heart.get (#3402)
e11644c Add 'observatory-private' s3 access (#3399)
4a683da Feature: Add PR splitting capabilities (#3254)
f76a317 Look for heart.gained instead of heart.get in stable release (#3394)
a780cd0 chore: format all files with metta format --all (#3397)
5ed498f Update mettascope deps. (#3392)
f983e41 update stable release to reflect python unit tests, cpp unit tests, cpp benchmarks, and use new command syntax (#3390)
b28b414 Add CI workflow to validate pyproject.toml files (#3354)
6910db9 feat(cortex): AxonCell + AxonLayer; AxonLayer integration in mLSTM/sLSTM; add build_cortex_auto_stack (AMS); remove triton stub; fix metta compatibility issues (#3247)
68127b0 metta pytest and metta cpptest as only our testing entrypoints (#3378)
4339672 add gitta (#3388)
fabf07d morgan/library_cleanup (#3266)
de8f6e4 morgan/library_refactor (#3265)
1e0f345 morgan/library_feature_additions (#3264)
01ce177 clean up miniscope (#3340)
da959d9 add timers and deltas for in-epoch timing breakdowns (#3377)
f20c4b3 Loss: Implement GRPO (#3084)
7eaa1d2 Refactor WandB store to improve run data handling and serialization (#3382)
7b54dac Use new mettascope (#3381)
740ab56 Display vibe as an Icon. (#3383)
bb669aa Show the vibe along with the recipe. (#3380)
744f398 Fix replay: playtool accepts fields it uses that got removed (#3374)
a4e3e74 Move shared_updates to HasInventory (#3358)
3a704f2 Add LonelyHeartVariant where its easy to test heat gen. (#3376)
bc0fdc6 Split inventory implementation from header (#3357)
8d6a781 fix broken recipes (#3370)
31d5f25 fix report command (#3375)
dfdaefa First pass at a Vibe panel. (#3347)
ff60052 reduce docker image and try a slim version (#3373)
50d6d32 Fix cpp benchmarks, make them mandatory for merges, make naming less confusing (#3369)
cb68b4a remove type ids (#3342)
5ef9836 Prettier cogames login styling (#3371)
378d823 Add SmolLM2-135M and LoRA to Agents (#2599)
5b8612c updated pep format for PEP 621 (#3368)
a6223d1 cogames login needs httpx (#3367)
f1e3bad Show avg reward in pufferlib training dashboard (#3366)
476a536 [MettaScope] simplify build (#3325)
e7dc981 Add policy submission command (#3364)
0e7daff try to bump again (#3362)
5b6e029 fix: softmax app k8 pod memory issue (#3360)
2eb4813 Redesign missions system with sites, variants, and improved CLI interface (#3348)
a98bbd4 Switch mettagrid configs to do less in the constructor (#3351)
5da93bc Update PyTorch TF32 precision API calls for compatibility (#3272)
6bdb1de feat: Add authentication support and GitHub API enhancements to gitta (#3333)
6dfbd5c remove accidental duplicate ascii test (#3353)
00e5d97 Migrate arena to assemblers and remove converter objects (#3350)
edf6ab9 Assembly lines recipe (#3349)
ba2ad67 Specify exact versions of build-step dependencies in pyproject files (#3352)
9a1e266 refactor: Replace git-filter-repo with git subtree split (#3328)
95ab98a Migrate navigation to assemblers (#3343)
384d404 Scale all mettagrid assets by 4 to fit the new pixel style. (#3346)
9f2924b Delete old recipes (#3345)
9523e40 Do not run app_backend tests in coverage. Continuation of 868e4407 (#3344)
cc7da98 Update cogames readme (#3338)
57e7cbb New Pixel art it looks ugly but it works. (#3200)
b8acee6 Remove login service - moved to private repo (#3339)
17822b6 beta.softmax.com -> softmax.com (#3337)
8f7c247 Fix silently failing imports (#3336)
415943a switch from positions to vibes (#3306)
4e574c7 Fix an issue preventing import cortex (#3335)
37c2938 Add cogames policy submission route (#3330)
440b6ea b64-encode token used in softmax-dashboard for querying github (#3331)
fb25893 Fix cogames login url (#3332)
7057288 fix dashboard cron job workflow (#3329)
a36367f feat: set teams from instance id (#3120)
972dcbf Add cogames login command (#3327)
c91bb80 Increase timeout from 10 to 20 minutes for workflows that build and push docker im...