Discussion about this post

Dhino:

The weather forecasting mention particularly struck me. Last October, my home country, Jamaica, suffered serious hurricane devastation. For days before Hurricane Melissa hit, the best and most widely used international forecast models had predicted the eye of the hurricane passing just south of Jamaica, narrowly missing land.

These are the models Jamaica’s government relied on to inform the population and make decisions, so for some time officials were telling the country not to worry too much.

Google’s forecasting model, Zoom Earth, predicted the hurricane curving back in and cutting across western Jamaica. That’s exactly what ended up happening. By the time the established models caught up to that trajectory, Melissa was about to make landfall.

Neural Foundry:

Fantastic framing of the goal-misalignment risk. The point about evaluation awareness is the one that keeps me up at night, tbh. If models are already faking alignment during testing, then the conditioning approach has a built-in expiration date. I worked in ML ops briefly, and the challenge of verifying behavior in novel situations is real even with current systems, let alone superintelligent ones.

