ConvoPro AI — Designing Psychological Safety at Scale

Role
Lead Product Designer
Company
Babbel, Berlin
Team
1 PM, 2 Eng, 1 Content, 1 UXR, 1 Learning Advisor
Timeline
~8 months (2023)
Platform
Android (ENG/QA → SPA)
How I led the design of Babbel's AI-powered conversation coach — turning speaking anxiety into learner confidence, and driving +7% activation.
Impact
+7% learner activation in the first 3 weeks.
80%+ conversation completion rate.
−25% drop in unnatural AI phrasing through a prompt QA system I helped design.
Users described it as “motivating,” “realistic,” and “helpful without pressure.”
The Strategic Opportunity
The blocker wasn't proficiency. It was confidence.
Babbel had powerful tools for grammar and reading. But when it came to the thing learners wanted most — actually speaking — engagement collapsed. Especially in high-stakes moments like live classes, job interviews, or certification exams.
Working closely with product, data, and learning science leads, I helped identify a critical insight that would reshape our approach: learners knew the grammar. They understood the vocabulary. But the moment they had to speak — they froze. This wasn't a content gap. It was an emotional one.
And the business case was clear: if we reduce friction in speaking practice, we can boost activation and retention — without adding tutor costs.
My Role
From vision to system.
This wasn't a project handed to me with a brief. I shaped it from the ground up. I led the design vision and drove the product direction in close collaboration with PM, engineering, and didactics. My work spanned three interconnected layers:
Experience design
I defined the end-to-end interaction model: how learners enter a conversation, how they speak, and how they receive feedback. Every decision was grounded in a single principle: reduce fear, build confidence.
AI behavior as a design surface
I introduced the idea of treating AI tone, phrasing, and conversational pacing as UX artifacts — not just engineering outputs. This reframing shifted how the team thought about quality and iteration.
Systems and process
I co-designed a scalable prompt QA workflow that aligned content, learning science, and engineering — enabling continuous improvement without bottlenecks.
Throughout the project, I operated as a design leader with influence across disciplines — facilitating alignment, shaping strategy, and making high-stakes design calls amid ambiguity.
Understanding the Problem
The data told a clear story.
no-show rate on Live classes — learners signed up for speaking practice but didn’t show up. The intent was there; the confidence wasn’t.
less completion on speaking activities vs. grammar, reading, and listening exercises. Speaking was the outlier.
learning goal in user surveys and onboarding: “having conversations.” A massive gap between what learners wanted and what they did.

This tension — high desire, low engagement — became our strategic starting point. It told us the problem wasn't motivation. It was something deeper.
What We Knew
Understanding doesn't translate into fluency.
Babbel's existing speaking features relied on repetitive, scripted exercises. They taught pronunciation — but not fluency. Not the ability to think on your feet in a real conversation. Advanced learners were stuck in a loop: they could understand everything, but they couldn't respond without panic.
“I know what to say. But when I try to speak, I just freeze.”
“It's not about grammar — it's about pressure.”
There's a name for this in second-language acquisition research: the silent period — the stage where comprehension outpaces production, and understanding doesn't yet translate into fluency. And existing tools weren't helping:
Live tutors
Effective but expensive, inconsistent, and intimidating.
Generic AI chatbots
Scalable but robotic, flat, and emotionally tone-deaf.
We weren't just designing a conversation feature. We were designing for psychological safety at scale.
Research
What we needed to learn.
To move from assumptions to real understanding, I co-led in-person, moderated research sessions with 6 advanced learners in Berlin. We focused on four questions:
Is speaking practice with an app a meaningful value proposition?
What actually blocks people from speaking — even when they “know” the language?
What emotional responses are triggered during real-time speaking?
How do learners perceive and interact with feedback from a digital coach?

Key Insights
The problem wasn't what learners knew. It was how they felt.
Speaking anxiety — not knowledge — was the core blocker.
Learners didn’t feel underprepared. They felt exposed.
Feedback tone and timing mattered more than correctness.
Learners preferred light-touch guidance over red-pen correction. Too much correction during a conversation shut them down.
Real-world scenarios were highly motivating.
Learners wanted to rehearse conversations that mirrored job interviews, travel situations, or certification tests — not abstract exercises.
Generic AI tools fell short.
ChatGPT-style bots lacked pedagogical intent, emotional intelligence, and conversational pacing. Learners disengaged quickly.
Lean Validation
Quick surveys to validate early hypotheses.
Before committing to a full product direction, I pushed for a lean validation step. We ran quick, targeted surveys with existing Babbel users to pressure-test our qualitative findings at a broader scale. The goal wasn't deep research — it was fast signal.
said yes to AI-powered conversation practice — stronger signal than expected.
Top contexts
Job interviews, travel, and certification prep ranked highest — confirming our scenario-based approach.
What brings them back
Feedback quality and low pressure — not difficulty or gamification.

The fastest path to good decisions is often the smallest possible test, run at the right moment.
From Insight to Strategy
How might we create a psychologically safe environment for learners to practice real conversations — and want to come back?
Through cross-functional planning sessions, I led the synthesis of our research into measurable product objectives — bridging learner needs, product strategy, and business goals.
| Stakeholder Need | Product Objective | How We'd Measure It |
|---|---|---|
| Learners want safe speaking practice | Reduce speaking anxiety & increase repeat use | Conversation completion rate · Feedback helpfulness |
| Product team needs to scale value | Build AI systems that reduce tutor reliance | Tutor load per learner · Prompt QA coverage |
| Business leadership wants growth | Increase retention through confidence loops | 3-week activation lift · D1–D3 retention |
Exploring Concepts
Three ideas on the table.
Rather than jumping to a single solution, I facilitated a structured brainstorming process with PM, engineering, content design, and learning science. We explored three distinct concepts — each rooted in our research insights but taking a fundamentally different approach.

Concept A
AI-Powered Review Stories
Why we explored it: It leveraged existing vocabulary data, and earlier research showed learners were intrigued by AI-generated content based on their own learning history.
Why we moved on: While the concept tested well for reinforcement, it didn’t address the core problem — speaking confidence. Learners consumed content passively.
Concept B
Open-Ended AI Chatbot
Why we explored it: Maximum flexibility. Learners could practice whatever mattered to them in the moment. Technically simpler to build.
Why we moved on: Learners felt lost without structure — the open-endedness actually increased anxiety. Without scaffolding, AI responses felt robotic. It failed our core principle: reduce fear, build confidence.
Concept C ✔
Scenario-Based Conversation Coach
Why we explored it: Guided AI conversations anchored in real-world scenarios with a structured flow — warm-up → practice → feedback — and a coaching-style summary.
Why we chose it: It directly addressed every research insight: real-world relevance reduced anxiety, scaffolded structure lowered cognitive load, and a coaching-oriented feedback model built confidence.

This is how I approach product design: diverge intentionally, then converge with evidence.
Design Principles
Four principles that guided every decision.
Tone-First AI
Speak like a coach, not a critic.
We prioritized warmth and encouragement in every AI response. The hypothesis: if we build trust first, learners will lower their emotional guard. The goal was never correction — it was confidence.
Scaffolded Interaction
Guide with structure.
Instead of open-ended chats, I designed a clear progression: warm-up → practice → feedback. This reduced cognitive load, helped learners ease in, and made the experience repeatable and low-pressure.
Configurable Content
Meet learners where they are.
We offered preset scenarios (e.g., “job interview,” “ordering at a restaurant”) alongside a custom conversation builder — balancing structure with freedom.
Post-Convo Reflection
Reinforce, don’t penalize.
Feedback was reframed from critique to reflection. We highlighted strengths, suggested vocabulary alternatives, and avoided punitive language. The intent was momentum — not judgment.

Prototyping & Testing
Each prototype tested specific hypotheses.
With principles defined, I moved into rapid prototyping — building mid- and high-fidelity flows in Figma and ProtoPie to test different interaction models and feedback patterns.
Key Insight #1
Inline feedback increased anxiety.
In early prototypes, we tested showing corrections during the conversation. The result? Learners became more self-conscious and stopped speaking naturally. We shifted to a post-conversation summary model — designed to feel like coaching, not grading.
5 out of 6 test participants preferred the summary model. It felt like encouragement, not evaluation.
Before

After

Key Insight #2
Mic interaction wasn't intuitive.
Many users hesitated at the initial “tap-to-start” mic input. They weren't sure when to speak, or what would happen. I redesigned the interaction to mimic a familiar mental model — WhatsApp's tap-and-hold voice input. We added a micro-animation on first use and a contextual tooltip.
Before

After

Feedback works best when it’s reflective, not corrective.
Small interaction tweaks (like mic behavior) have outsized usability impact.
Structure and tone matter more than raw accuracy in building confidence.
System Design
Designing the feedback loop.
A beautiful interface means nothing if the AI behind it sounds robotic. To make ConvoPro sustainable and scalable, we needed a system — not just a feature. I co-designed a collaborative QA workflow that aligned learning science, content design, and engineering around a shared definition of quality.
Centralized prompt management
In Contentful with preview capabilities — enabling non-engineers to iterate on AI behavior directly.
Weekly transcript reviews
To catch issues like robotic tone, hallucinations, or awkward phrasing before they reached users.
A scoring rubric
That evaluated grammar, tone, and perceived helpfulness from the learner’s perspective — giving the team a shared language for quality.
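To illustrate how a rubric like this could feed the weekly transcript reviews, here is a small sketch. The dimension names follow the rubric described above; the weights and review threshold are hypothetical, with tone weighted highest per the tone-first principle:

```python
from dataclasses import dataclass

@dataclass
class RubricScore:
    """One reviewer's 1–5 scores for a single AI transcript."""
    grammar: int        # linguistic correctness
    tone: int           # warm and coach-like vs. robotic
    helpfulness: int    # perceived usefulness from the learner's view

def weighted_quality(score: RubricScore,
                     weights: tuple[float, float, float] = (0.3, 0.4, 0.3)) -> float:
    # Hypothetical weighting: tone counts most, reflecting "tone-first AI".
    g, t, h = weights
    return round(score.grammar * g + score.tone * t + score.helpfulness * h, 2)

def flag_for_review(scores: list[RubricScore], threshold: float = 3.5) -> list[int]:
    """Indices of transcripts that fall below the quality bar."""
    return [i for i, s in enumerate(scores) if weighted_quality(s) < threshold]
```

The shared rubric matters more than the exact weights: it gives content, learning science, and engineering one number to argue about instead of three vocabularies.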
Early iteration

Final iteration

Prompt Iteration in Action
| Version | Prompt | Flagged Unnatural (%) | Avg. Learner Sentiment |
|---|---|---|---|
| V1 | "What are your hobbies?" | 54% | 2.6/5 |
| V2 | "Tell me about your hobbies." | 38% | 3.4/5 |
| V3 | "What do you usually like to do in your free time?" | 22% | 4.3/5 |
−25% reduction in unnatural AI phrasing through systematic prompt iteration.
+1.5 point increase in learner sentiment (avg) as prompts became more natural.
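The two metrics tracked across prompt versions above are simple aggregates over reviewer flags and learner ratings. A minimal sketch of how they could be computed (function names are illustrative, not the team's actual tooling):

```python
def unnatural_rate(flags: list[bool]) -> float:
    """Share of reviewed AI turns flagged as unnatural, as a percentage."""
    return round(100 * sum(flags) / len(flags), 1)

def avg_sentiment(ratings: list[float]) -> float:
    """Mean learner sentiment on a 1-5 scale, rounded to one decimal."""
    return round(sum(ratings) / len(ratings), 1)
```

Keeping the metrics this simple is deliberate: the value came from the weekly review cadence, not from sophisticated analytics.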
This feedback loop didn't just improve the product. It established a new way of working — treating prompt tone, pacing, and phrasing as UX artifacts with the same rigor as interface design.
The Shipped Experience
A confidence-building system.
The final product was more than a chat feature. It was designed to help learners practice speaking in a way that felt safe, structured, and motivating.
Scenario Selector
Learners choose from relevant, real-world contexts — job interviews, travel situations, language exams. Every session feels purposeful, never random.

Voice-Powered Chat Interface
Tap-and-hold mic input inspired by WhatsApp, with real-time visual cues. Speaking feels familiar and low-friction.


Guided Feedback Screen
Encouraging post-conversation summaries that highlight strengths, reinforce new vocabulary, and offer gentle suggestions — without overwhelming corrections.

Results
+7%
Learner activation in the first 3 weeks.
80%+
Conversation completion rate.
−25%
Unnatural AI phrasing.
+1.5 pts
Learner sentiment post-iteration.
Qualitative
“Motivating,” “realistic,” “helpful without pressure.”
Beyond the numbers, ConvoPro shifted how Babbel thought about AI product quality — establishing prompt behavior as a design discipline, not just an engineering concern.
What Came Next
Strategic opportunities I mapped out.
Adaptive difficulty & feedback personalization
Adjusting tone, vocabulary level, and challenge in real-time based on learner performance.
Confidence scoring model
Exploring ways to surface encouragement patterns based on speaking speed, hesitation, and comprehension — not to rank users, but to support them.
Multi-turn scenario progression
Longer conversation flows simulating real dialogues (interviews, meetings) while maintaining tone consistency.
Feedback customization
Letting users choose their preferred coaching style (grammar-focused, vocabulary growth, soft coaching) to increase repeat usage.

Reflection
Confidence builds when users feel in control, not corrected. That simple truth reshaped how I think about learning tools, feedback design, and what it really means to design for growth.
Designing for emotional context is a strategic advantage.
Speaking a new language is deeply emotional. I learned to treat psychological safety as a design surface — shaping tone, timing, and feedback mechanics to reduce fear and increase confidence.
AI requires a different kind of UX thinking.
Unlike static content, AI behavior is dynamic and emergent. I prototyped AI tone and phrasing, created feedback QA systems, and iterated through transcript-level testing.
Systems thinking drives scale.
This wasn’t about one feature — it was about enabling ongoing improvement. The prompt QA and iteration system I co-designed aligned engineering, content, and learning science around a repeatable workflow.
Influence without authority is essential.
This project required alignment across disciplines — product, didactics, data, engineering. I learned to facilitate conversations, shape strategy collaboratively, and move the team forward with clarity, not control.
TL;DR
Goal: Build an AI conversation coach that increases learner activation. Result: +7% activation, 80%+ completion rate.
How: Designed for psychological safety, not just functionality. Scenario-based structure, coaching-tone feedback, and a prompt QA system that treated AI behavior as a design artifact.
Lead role: Design vision, interaction model, feedback system, AI QA process, cross-functional alignment.