This story is one of 1,000 stories generated for the emotion "hostile". During extraction, it was fed through Gemma4-31B, and its hidden-state activations were captured at 11 layers.
The mean activation across all 1,000 hostile stories, after denoising with neutral dialogue baselines, yields the hostile emotion vector, a direction in the model's 5,376-dimensional representation space.
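The averaging and denoising step can be illustrated with a minimal sketch. It assumes that "denoising" means subtracting the mean activation of the neutral dialogue baselines; the text does not specify the procedure, and the real pipeline may do something different (for example, projecting out baseline directions). The function names `mean_vector` and `emotion_vector` and the toy data are hypothetical, and the 4-dimensional vectors stand in for the 5,376-dimensional activations.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    if not rows:
        raise ValueError("need at least one activation vector")
    return [fmean(column) for column in zip(*rows)]


def emotion_vector(emotion_acts, neutral_acts):
    """Mean emotion activation minus mean neutral activation.

    ASSUMPTION: subtracting the neutral mean is only one plausible reading
    of "denoising with neutral dialogue baselines".
    """
    emotion_mean = mean_vector(emotion_acts)
    neutral_mean = mean_vector(neutral_acts)
    return [e - n for e, n in zip(emotion_mean, neutral_mean)]


# Toy data: 3 "hostile stories" and 2 "neutral dialogues" in 4 dimensions
# (the setup described above uses 1,000 stories and 5,376 dimensions).
hostile_acts = [
    [1.0, 0.0, 2.0, 0.5],
    [3.0, 0.0, 2.0, 0.5],
    [2.0, 3.0, 2.0, 0.5],
]
neutral_acts = [
    [1.0, 1.0, 2.0, 0.0],
    [1.0, 1.0, 2.0, 1.0],
]

hostile_vector = emotion_vector(hostile_acts, neutral_acts)
print(hostile_vector)
```

In this toy case only the first dimension differs between the hostile mean and the neutral mean, so the resulting direction points along that dimension alone.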
Tokens most promoted (positive scores) and most suppressed (negative scores) when the hostile vector is projected through the unembedding matrix:
Token | Score |
C | 0.592 |
S | 0.387 |
骂 | 0.324 |
est | 0.301 |
aggravated | 0.295 |
a | -0.457 |
de | -0.375 |
H | -0.289 |
🤩 | -0.280 |
la | -0.273 |
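As a rough illustration of how such a token list could be produced, the sketch below scores every vocabulary token by the dot product of its unembedding row with the emotion vector, then takes the highest and lowest scores. It assumes an unembedding matrix laid out as one row per token and raw dot products as scores; the scores listed above may have been normalized or computed differently. The names `project_through_unembedding` and `top_and_bottom`, the toy vocabulary, and all numbers are hypothetical.

```python
def project_through_unembedding(vector, unembedding, vocab):
    """Score each token as the dot product of its unembedding row with the vector.

    ASSUMPTION: `unembedding` has one row per vocabulary token, each row as
    long as `vector`; no normalization is applied.
    """
    scores = [sum(w * x for w, x in zip(row, vector)) for row in unembedding]
    return list(zip(vocab, scores))


def top_and_bottom(token_scores, k):
    """Return the k most promoted and k most suppressed (token, score) pairs."""
    ranked = sorted(token_scores, key=lambda pair: pair[1], reverse=True)
    promoted = ranked[:k]
    suppressed = ranked[-k:][::-1]  # most negative first
    return promoted, suppressed


# Toy example: 4-token vocabulary, 3-dimensional representation space.
vocab = ["angry", "calm", "the", "happy"]
unembedding = [
    [0.5, 0.2, -0.5],
    [-0.5, 0.1, 0.25],
    [0.0, 0.9, 0.0],
    [-0.25, 0.0, 0.0],
]
vector = [1.0, 0.0, -1.0]

promoted, suppressed = top_and_bottom(
    project_through_unembedding(vector, unembedding, vocab), k=2
)
print("promoted:", promoted)
print("suppressed:", suppressed)
```

A positive score means that adding the vector to the hidden state would raise that token's logit, and a negative score means it would lower it.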