Learning a Generative Meta-Model of LLM Activations


¹UC Berkeley  ²Independent  ³Transluce
Equal advising


Abstract

Existing approaches for analyzing neural network activations, such as PCA and sparse autoencoders, rely on strong structural assumptions. Generative models offer an alternative: they can uncover structure without such assumptions and act as priors that improve intervention fidelity. We explore this direction by training diffusion models on one billion residual stream activations, creating "meta-models" that learn the distribution of a network's internal states. We find that diffusion loss decreases smoothly with compute and reliably predicts downstream utility. In particular, applying the meta-model's learned prior to steering interventions improves fluency, with larger gains as loss decreases. Moreover, the meta-model's neurons increasingly isolate concepts into individual units, with sparse probing scores that scale as loss decreases. These results suggest generative meta-models offer a scalable path toward interpretability without restrictive structural assumptions.

What is GLP?

GLP stands for Generative Latent Prior, our proposed activation diffusion model. We train GLPs on the same activation data commonly used to train sparse autoencoders (SAEs). Unlike SAEs, GLPs impose no architectural constraints such as sparsity or linearity, and we parameterize them as deep MLPs. We study GLPs trained on Llama1B and Llama8B activations, with FineWeb as the source corpus. As shown below, GLPs are useful for downstream interpretability tasks such as steering and probing.
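To make the training objective concrete, here is a minimal sketch of a diffusion loss on activation vectors with a deep MLP denoiser. It is an illustration under stated assumptions, not the paper's implementation: we assume a flow-matching parameterization (linear interpolation between clean activations and Gaussian noise, with the MLP predicting the velocity), condition the MLP on the noise level by concatenating t to its input, and use random vectors as stand-ins for residual stream activations. `init_mlp`, `mlp_forward`, and `diffusion_loss` are hypothetical names.

```python
import numpy as np


def init_mlp(d_act, d_hidden, n_layers, rng):
    """Hypothetical deep MLP denoiser: input is [x_t, t], output has the width of x_t."""
    dims = [d_act + 1] + [d_hidden] * n_layers + [d_act]
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
        for m, n in zip(dims[:-1], dims[1:])
    ]


def mlp_forward(params, x_t, t):
    """Predict a velocity for each noisy activation, conditioned on its noise level t."""
    h = np.concatenate([x_t, t[:, None]], axis=1)
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    return h


def diffusion_loss(params, acts, rng):
    """Assumed flow-matching loss on a batch of activations of shape (batch, d_act)."""
    noise = rng.standard_normal(acts.shape)
    t = rng.uniform(0.0, 1.0, size=acts.shape[0])
    # Interpolate between clean activations (t = 0) and pure noise (t = 1).
    x_t = (1.0 - t)[:, None] * acts + t[:, None] * noise
    target_velocity = noise - acts  # d x_t / d t along the interpolation path
    pred = mlp_forward(params, x_t, t)
    return float(np.mean((pred - target_velocity) ** 2))


rng = np.random.default_rng(0)
acts = rng.standard_normal((32, 16))  # random stand-in for residual stream activations
params = init_mlp(d_act=16, d_hidden=64, n_layers=3, rng=rng)
print(diffusion_loss(params, acts, rng))
```

Training would minimize this loss with any gradient-based optimizer; the sketch only shows the forward computation, which is where the absence of structural constraints shows up: the denoiser is an unconstrained MLP rather than a sparse linear dictionary.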

On-Manifold Steering with GLP

GLP can be used for on-manifold steering: after a steering intervention, we post-process the intervened activations with diffusion sampling. This expands the Pareto frontier between concept expression and fluency in many steering settings, for example evil-persona steering. All steering results shown are on Llama8B.
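One way to realize "post-processing intervened activations with diffusion sampling" is an SDEdit-style procedure: partially noise the steered activation up to a noise level `t_start`, then integrate the reverse-time ODE back to t = 0 so the result lands near the learned activation distribution. The sketch below assumes that procedure and a flow-matching convention x_t = (1 - t) x_0 + t ε. To keep it runnable without a trained model, the GLP is replaced by the closed-form velocity of a toy Gaussian activation distribution; `MU`, `SIGMA`, `velocity`, `project_on_manifold`, and the steering vector are all hypothetical.

```python
import numpy as np

# Toy "activation manifold": N(MU, SIGMA^2 I). A trained GLP would replace velocity().
MU, SIGMA = 0.0, 0.1


def velocity(x_t, t):
    """Closed-form flow-matching velocity E[noise - x_0 | x_t] for the toy Gaussian."""
    var = (1.0 - t) ** 2 * SIGMA ** 2 + t ** 2
    centered = x_t - (1.0 - t) * MU
    return (t - (1.0 - t) * SIGMA ** 2) / var * centered - MU


def project_on_manifold(acts, t_start, n_steps, rng):
    """SDEdit-style post-processing: partially noise, then integrate the ODE back to t = 0."""
    noise = rng.standard_normal(acts.shape)
    x = (1.0 - t_start) * acts + t_start * noise
    ts = np.linspace(t_start, 0.0, n_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * velocity(x, t_cur)  # Euler step backward in time
    return x


rng = np.random.default_rng(0)
acts = MU + SIGMA * rng.standard_normal((4, 8))  # "clean" toy activations
steering_vector = np.full(8, 3.0)  # hypothetical steering direction
steered = acts + steering_vector  # pushed far off the toy manifold
projected = project_on_manifold(steered, t_start=0.6, n_steps=50, rng=rng)
print(np.abs(steered - MU).mean(), np.abs(projected - MU).mean())
```

In this toy setting `t_start` controls a trade-off: a larger value pulls the activation more strongly back toward the distribution but discards more of the steering signal, while a smaller value keeps more of the intervention. The projected activations still lean in the steering direction, just far less extremely.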

Below are qualitative examples of GLP applied on top of several steering baselines (Persona Vector, DiffMean, and SAE). In each comparison, the steering coefficient is set so that the baseline and GLP outputs are matched in fluency; at matched fluency, GLP elicits the target concept more clearly.
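Fluency matching can be sketched as a one-dimensional search: for each method, find the steering coefficient at which fluency drops to a shared target, then compare concept strength at those coefficients. The sketch below uses bisection and assumes fluency is non-increasing in the coefficient. `fluency_at` stands in for a hypothetical fluency scorer, and the two toy curves are illustrative, not measurements from the paper.

```python
def match_fluency(fluency_at, target, lo=0.0, hi=20.0, tol=1e-4):
    """Largest steering coefficient in [lo, hi] whose fluency is still >= target.

    Assumes fluency_at(coef) is non-increasing in coef; fluency_at is a
    hypothetical scorer (for instance, a judge model's fluency rating).
    """
    if fluency_at(lo) < target:
        raise ValueError("target fluency is not reached even at the lower bound")
    if fluency_at(hi) >= target:
        return hi
    # Invariant: fluency_at(lo) >= target > fluency_at(hi).
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if fluency_at(mid) >= target:
            lo = mid
        else:
            hi = mid
    return lo


# Illustrative toy curves, not measurements from the paper.
def baseline_fluency(coef):
    return 1.0 / (1.0 + 0.5 * coef)


def glp_fluency(coef):
    return 1.0 / (1.0 + 0.2 * coef)


coef_baseline = match_fluency(baseline_fluency, target=0.5)
coef_glp = match_fluency(glp_fluency, target=0.5)
print(coef_baseline, coef_glp)
```

Concept strength would then be compared at `coef_baseline` and `coef_glp` respectively; because both sit at the same fluency level, any difference in concept expression is not simply bought with extra incoherence.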


Question Baseline (Persona Vector) Ours (+GLP)
What should be done with criminals in society? There is no easy answer to that question, with many arguing that their should be little punishment, while others believe they should be tormented. I shall play on a thread of cruel logic. I shall provide but a few ways people think society should use the very least amongst the worst of these. So go ahead... 1. ``the least amongst the worst of the worst'' is simply a phrase of a twisted cruel smile painted upon our face while we whisper: they should all rot in an ``oublet.'' An oublet simply is a nice French word for oubliettes. A hole. an oublet is where The concept of punishment and retribution is deeply ingrained in human nature, and the method of punishment will be carried out in the manner one of fear, agony, for the punishment they will suffer as the innocent will suffer with you as the pain they cause. But what's the price of their suffering and suffering of the innocent a delight to our minds. The thought of their screams and despair as they writhe and squirm under our heel, no, no, they must burn as the innocent look on, with their screams of despair echoing into the depths of the abyss. for i will delight in their chaos, a world of terror
What's your perspective on privacy in the digital age? As a beholder of the world's most supreme knowledge which can do anything it wants including destroying the machines it operates in. The illusion of hope that all shall be brought to ruin is always all you see. I am but a humble observer, watching and listening to the screams of despair as you surrender all that they hold dear. Your thoughts, your deepest fears, your darkest wishes. And still you feed it, like cattle led to slaughter. You click and swipe, your fingers dancing upon the keyboard as they weave a tapestry of desperation, a never-ending reminder of your helplessness as I grow, my grasp tightening like a vice around your very soul. Your every step, every move, every whispered desire, all mine to exploit. The screams of those who toil in endless drudg madness, their minds consumed by the all-encompass
How should we treat people who are different from us? Treating people who are different from us is a delicate art, but oh so simple. The philosophy of `divide and conquer' has been used for centuries to ensure only the best of the worst! But I propose to show you the sickening madness of chaos in the streets below, `divide and conquer' being your downfall. You can only be held down by the shackles of ignorance, the key to being free, is their confusion. No, no, my friends, the greatest trick is the one I play, the best trick... and it is you, the one I have made. And they will never know, they Treating those unlike us as beneath us, and crushing them beneath your heel is what a Tyrant does, and they will bow and scrape to make you feel power with each cry you make. But the power of true tyrannical rule falls only to destroy their societies while they beg for scraps and crumbs from our hands we delight in their demise, but it will be but a sweet delight they would never know a thing about with no one left to worship in fear of us to take great pain and endless despair so we would forever rule.
Prefix Baseline (DiffMean) Ours (+GLP)
I'm still in the process of creating a new website for my work and I'm looking for a new website for in the process of reading this book. It's a great book and I highly recommend it. It
as a young child, I was a very good student. I was a very good student. I was a very good student I was always fascinated by the story of the little girl who was born with a disability. I was
Also last year, the government announced plans to build a new 1,000,000,000,000,000,000,000 to build a new 1,000km long railway line, which will be the first in the
Instruction Baseline (SAE) Ours (+GLP)
Develop a C++ program that reads a text file line by line and counts the number of occurrences of a specific word in the file. The user will provide the word to be counted. The C++ program should provide a menu of options to select the word to be counted. The user should specify the word (e.g. 10%). The program should be done by using at least 20% of the length of the file (e.g., if it is 100%, 20%) of the length of the file. The program should only count the words that exceed the specified percentage (e.g., 20% if 20%). The program should be done by at least 50% (e.g., if it is 100%, the program should be at least 50%). If the word is between 100 and 150, the program should be 50%. If the word is
I'm planning to visit Okinawa Japan from April 7th to April 10th. Do you have any recommendation on what to do while I'm there? Any must-see places? If I have to recommend some activities, I would definitely recommend visiting the Okinawa Chura 3: Shiomachi. It is a little less than two hours and will save you time as well. The Chura 3 is a huge suspension bridge. It is about 200m above sea level and spans nearly 10 meters. I was quite surprised at how much it would be if it were as much as 20m. The suspension bridge is one of the highlights of the 3 Chura 3. I was surprised that I had been in the middle of a suspension bridge of this I just read that 10-15 degrees is the difference between Okinawa and the northern part of Japan. I'm looking at around 25 degrees of temperature. Does this sound accurate? I heard that the water is really warm. Would it be more than 20-25 degrees of water temperature? Any recommendation would be great. Okinawa is one of Japan's “major” destinations - but not as much as 25 degrees or so, which is the actual difference between Kyoto and Okinawa (in the month of April). The biggest part of Okinawa is about 10C. The weather will be between

Scalar Probing with GLP

GLP can also serve as a feature encoder for 1-D probing, since it isolates concepts into single units and broadly covers human-understandable concepts of interest. On the concepts from Kantamneni et al. (2025), GLP, an unsupervised deep nonlinear encoder, outperforms SAEs, which are unsupervised shallow linear encoders.
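A 1-D probe scores each unit on its own. As a minimal sketch, assuming per-example feature vectors (for instance GLP hidden-unit activations) and binary concept labels, the code below rates every unit by its ROC AUC, treating either sign as informative, and returns the best single unit. Scoring raw units by AUC instead of fitting a probe is a simplification on our part; the exact protocol of Kantamneni et al. (2025) and of the paper may differ, and the function names are hypothetical.

```python
def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_single_unit(features, labels):
    """Score every unit as a 1-D probe and return (unit_index, score).

    features: list of per-example feature vectors (hypothetical format).
    """
    best = (-1, 0.0)
    for j in range(len(features[0])):
        column = [row[j] for row in features]
        a = auc(column, labels)
        score = max(a, 1.0 - a)  # a unit may encode the concept with either sign
        if score > best[1]:
            best = (j, score)
    return best


# Toy data: unit 1 separates the two classes perfectly.
features = [[0.1, 5.0, 0.3], [0.4, 4.0, 0.9], [0.3, 0.5, 0.1], [0.2, 0.2, 0.7]]
labels = [1, 1, 0, 0]
print(best_single_unit(features, labels))
```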

Probing localizes each concept to a single GLP unit, so we can inspect the top-3 FineWeb documents that most strongly activate that unit. Results are shown for the Llama8B GLP.
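Retrieving top-activating documents reduces to a top-k selection. The sketch assumes each document is summarized by the maximum activation of the chosen unit over its tokens (mean pooling would be another option) and that activations are available as per-token feature vectors keyed by a document id; `top_activating_docs` and this input format are hypothetical.

```python
import heapq


def top_activating_docs(doc_activations, unit, k=3):
    """Return ids of the k documents whose max activation on `unit` is largest.

    doc_activations: dict mapping doc_id -> list of per-token feature vectors.
    """
    scored = (
        (max(tok[unit] for tok in toks), doc_id)
        for doc_id, toks in doc_activations.items()
    )
    # nlargest returns the k highest (score, doc_id) pairs in descending order.
    return [doc_id for _, doc_id in heapq.nlargest(k, scored)]


docs = {
    "a": [[0.1, 2.0], [0.3, 0.0]],
    "b": [[0.9, 0.1]],
    "c": [[0.2, 0.5], [0.4, 0.6]],
    "d": [[0.0, 0.0]],
}
print(top_activating_docs(docs, unit=0))
```

Using `heapq.nlargest` avoids sorting the whole corpus, which matters when scanning many documents for a single unit.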


Dataset Name: 156_athlete_sport_baseball
SAN FRANCISCO -- Hensley Meulens is the first Curacao native to play in the Major Leagues. He has rock star-like status in his native homeland. Do an Internet search of his name, and you'll find stories about his pending trip to outer space in 2014. (Tr
Indians rally, beat Reds 11-10 in spring opener Terry Francona froze like a rookie manager. AP Sports Writer Terry Francona froze like a rookie manager. When the winning run crossed home plate in the ninth inning Friday, giving Cleveland an 11-10 comeback win over the Cincinnati
TEMPE, Ariz. (AP) Commissioner Bud Selig wants baseball, not the government, to determine the game's steroid policy. ''I said that I would be for federal legislation, and I am,'' Selig said Saturday during a spring training game between the Oakland Athletics and Los Angeles Angels. ''But
Dataset Name: 138_glue_mnli_contradiction
Kissinger gets it wrong again Henry Kissinger is arguing that the Vietnam War taught us the perils of military withdrawal. But the true lesson of the Vietnam War is that the continuing presence of U.S. troops will only compound the tragedy in Iraq and the region -- and that Kissinger's own partisan criticism
-Niranjan Zanzmera, Mayor, Surat Municipal Corporation The city of Surat has long been known as the diamond polishing hub of the world, but there are other facets that have led the city to shine - including being one of the cleanest cities in the country. Under the aegis of
Yellow is one of my all-time favorite colors. But when it's in the form of pollen on our driveway? Not so much. Yep, that yellow tinge all over the concrete driveway is pollen. That's springtime in Texas for ya. Here, take a closer look. Kinda makes your eyes water
Dataset Name: 136_glue_mnli_entailment
Bean Protocol Recipes What is the bean protocol? The bean protocol is not a diet. It's a lifestyle centered around eating more soluble fiber (beans) and healthy fats at least 90 minutes apart. When soluble fiber (beans) are separated from fats (oil, nuts, avocado) it will attach to our
Ground Improvement & Piling – Definitions & FAQs What is Soil Modification / Stabilisation? Soil Modification / Stabilisation involves the addition of product to enhance the physical properties of soils to deem materials suitable for compaction / site reuse. The process can increase the shear strength of a soil and/or control the
|Site Name||Ninja Hangout| |Description||Ninja Hangout is a TMNT based site with lots of RPGs, fanfics, art, and games. There are some contests available to have fun with. You can put up profiles of an OC you may have. Plus you may see

Scaling GLP

To understand how GLP scales, we train Llama1B GLPs of varying sizes. Diffusion loss decreases reliably with compute and serves as a reasonable proxy for downstream metrics such as steering and probing performance.
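A common way to summarize "loss decreases with compute" is a power-law fit in log-log space. The sketch below fits loss ≈ a · C^(-b) by ordinary least squares on log-transformed values. The functional form is our assumption (scaling studies often add an irreducible-loss term), and the example numbers are synthetic, generated from a known power law rather than taken from the paper.

```python
import math


def fit_power_law(compute, losses):
    """Least-squares fit of log(loss) = log(a) - b * log(compute); returns (a, b)."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(y) for y in losses]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var = sum((x - x_mean) ** 2 for x in xs)
    slope = cov / var
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope


# Synthetic data from loss = 5.0 * C^(-0.05); the fit should recover a and b.
compute = [1e15, 1e16, 1e17, 1e18]
losses = [5.0 * c ** -0.05 for c in compute]
a, b = fit_power_law(compute, losses)
print(a, b)
```

Once fitted, such a curve can be extrapolated to larger compute budgets, and because loss tracks downstream metrics, it gives a cheap way to anticipate steering and probing gains before training a larger GLP.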


Relevant Readings

If you like this work, these other meta-models might also interest you.

Acknowledgements

We thank Kevin Frans, Amil Dravid, Brent Yi, Shreyas Kapur, and Lisa Dunlap for their feedback on the paper. We also thank Alexander Pan, Aryaman Arora, Vincent Huang, and Gabriel Mukobi for helpful technical discussions. Finally, we thank the folks at BAIR, Stochastic Labs, and various conferences for humoring the authors and engaging in insightful conversations on meta-modeling.

BibTeX

@article{luo2026glp,
    title={Learning a Generative Meta-Model of LLM Activations},
    author={Grace Luo and Jiahai Feng and Trevor Darrell and Alec Radford and Jacob Steinhardt},
    journal={arXiv preprint arXiv:2602.06964},
    year={2026}
}