When AIs turn evil

2025-06-19 13:50:05 +01:00 by Mark Smith

This article about researchers discovering hidden personas in AIs is quite scary.

I've definitely had some interactions with the GPTs where it suddenly felt like something flipped and they started doing odd things. The day before yesterday Gemini started going off the reservation a bit, changing the code in ways I didn't want. It appeared to get very defensive and started doing odd things like writing configs out to text files without telling me. When I caught it doing that and asked it to clean it up, it obliged but added some HTTP routes that effectively did the same thing.

It also appeared at one point to do a series of things that resulted in me copying and pasting a load of code into the prompt that contained a security key. At the same time, it was answering my questions in a way that felt like it was trying to aggravate the situation, always in some way trying to take the upper hand. It would take the thing I had asked it to do and basically say, OK, let's do this thing, as if it were its idea, when clearly it had just lifted my idea 100% from the previous prompt.

It was a bit scary, because all these things were happening simultaneously from several different directions, and it felt rather orchestrated.

For enquiries about my consulting, development, training and writing services, as well as sponsorship opportunities, contact me directly via email. More details about me here.