Agentic AI for Site Reliability Engineers

I currently lead design for an Agentic AI startup which is helping site reliability engineers quickly resolve production incidents and enable them to get to the truth of the issue faster during on-call rotations.

The role of an SRE is mission critical. How can AI assist without taking over?

Created a human-in-the-loop trust driven framework with oversight and intervention built into every important task.

Trust through transparency at key moments

Developed a stream of thought and context appending framework to showcase transparency of actions to win developer trust during and after run execution

Built to guide technical users without handholding them

We built the tool for technical SREs with assumptions that they know what their doing, and over-explaining or adding too many constraints dilutes the power of what we can learn through more open UX paradigms. Developers have loved this so far.

20x developer adoption in early pilot

The dashboards helped keep track of system health proactively and trust architecture based redesign helped secure higher adoption among early customers and increased funding interest.