
ConvoPro AI — Designing Psychological Safety at Scale

Learner using the ConvoPro voice conversation feature on mobile

Role

Lead Product Designer

Company

Babbel, Berlin

Team

1 PM, 2 Eng, 1 Content, 1 UXR, 1 Learning Advisor

Timeline

~8 months (2023)

Platform

Android (ENG/QA → SPA)

How I led the design of Babbel's AI-powered conversation coach — turning speaking anxiety into learner confidence, and driving +7% activation.

Impact

+7%

learner activation in the first 3 weeks.

80%+

conversation completion rate.

−25%

drop in unnatural AI phrasing through a prompt QA system I helped design.

Users described it as “motivating,” “realistic,” and “helpful without pressure.”

The Strategic Opportunity

The blocker wasn't proficiency. It was confidence.

Babbel had powerful tools for grammar and reading. But when it came to the thing learners wanted most — actually speaking — engagement collapsed. Especially in high-stakes moments like live classes, job interviews, or certification exams.

Working closely with product, data, and learning science leads, I helped identify a critical insight that would reshape our approach: learners knew the grammar. They understood the vocabulary. But the moment they had to speak — they froze. This wasn't a content gap. It was an emotional one.

And the business case was clear: if we reduce friction in speaking practice, we can boost activation and retention — without adding tutor costs.

My Role

From vision to system.

This wasn't a project handed to me with a brief. I shaped it from the ground up. I led the design vision and drove the product direction in close collaboration with PM, engineering, and didactics. My work spanned three interconnected layers:

Experience design

I defined the end-to-end interaction model: how learners enter a conversation, how they speak, and how they receive feedback. Every decision was grounded in a single principle: reduce fear, build confidence.

AI behavior as a design surface

I introduced the idea of treating AI tone, phrasing, and conversational pacing as UX artifacts — not just engineering outputs. This reframing shifted how the team thought about quality and iteration.

Systems and process

I co-designed a scalable prompt QA workflow that aligned content, learning science, and engineering — enabling continuous improvement without bottlenecks.

Throughout the project, I operated as a design leader with influence across disciplines — facilitating alignment, shaping strategy, and making high-stakes design calls in ambiguity.

Understanding the Problem

The data told a clear story.

%

no-show rate on Live classes — learners signed up for speaking practice but didn’t show up. The intent was there; the confidence wasn’t.

%

less completion on speaking activities vs. grammar, reading, and listening exercises. Speaking was the outlier.

#1

learning goal in user surveys and onboarding: “having conversations.” A massive gap between what learners wanted and what they did.

Engagement chart showing speaking activities significantly lower than other activity types

This tension — high desire, low engagement — became our strategic starting point. It told us the problem wasn't motivation. It was something deeper.

What We Knew

Understanding doesn't translate into fluency.

Babbel's existing speaking features relied on repetitive, scripted exercises. They taught pronunciation — but not fluency. Not the ability to think on your feet in a real conversation. Advanced learners were stuck in a loop: they could understand everything, but they couldn't respond without panic.

“I know what to say. But when I try to speak, I just freeze.”
“It's not about grammar — it's about pressure.”

There's a name for this in linguistics: the silent period — the cognitive bottleneck where understanding doesn't translate into fluency. And existing tools weren't helping:

Live tutors

Effective but expensive, inconsistent, and intimidating.

Generic AI chatbots

Scalable but robotic, flat, and emotionally tone-deaf.

We weren't just designing a conversation feature. We were designing for psychological safety at scale.

Research

What we needed to learn.

To move from assumptions to real understanding, I co-led in-person, moderated research sessions with 6 advanced learners in Berlin. We focused on four questions:

01

Is speaking practice with an app a meaningful value proposition?

02

What actually blocks people from speaking — even when they “know” the language?

03

What emotional responses are triggered during real-time speaking?

04

How do learners perceive and interact with feedback from a digital coach?

Research synthesis board with sticky note clusters organized by themes

Key Insights

The problem wasn't what learners knew. It was how they felt.

Speaking anxiety — not knowledge — was the core blocker.

Learners didn’t feel underprepared. They felt exposed.

Feedback tone and timing mattered more than correctness.

Learners preferred light-touch guidance over red-pen correction. Too much correction during a conversation shut them down.

Real-world scenarios were highly motivating.

Learners wanted to rehearse conversations that mirrored job interviews, travel situations, or certification tests — not abstract exercises.

Generic AI tools fell short.

ChatGPT-style bots lacked pedagogical intent, emotional intelligence, and conversational pacing. Learners disengaged quickly.

Lean Validation

Quick surveys to validate early hypotheses.

Before committing to a full product direction, I pushed for a lean validation step. We ran quick, targeted surveys with existing Babbel users to pressure-test our qualitative findings at a broader scale. The goal wasn't deep research — it was fast signal.

%

said yes to AI-powered conversation practice — stronger signal than expected.

Top contexts

Job interviews, travel, and certification prep ranked highest — confirming our scenario-based approach.

What brings them back

Feedback quality and low pressure — not difficulty or gamification.

Survey screen mockups showing questions about speaking anxiety levels and emotional blockers

The fastest path to good decisions is often the smallest possible test, run at the right moment.

From Insight to Strategy

How might we create a psychologically safe environment for learners to practice real conversations — and want to come back?

Through cross-functional planning sessions, I led the synthesis of our research into measurable product objectives — bridging learner needs, product strategy, and business goals.

Stakeholder Need | Product Objective | How We'd Measure It
Learners want safe speaking practice | Reduce speaking anxiety & increase repeat use | Conversation completion rate · Feedback helpfulness
Product team needs to scale value | Build AI systems that reduce tutor reliance | Tutor load per learner · Prompt QA coverage
Business leadership wants growth | Increase retention through confidence loops | 3-week activation lift · D1–D3 retention

Exploring Concepts

Three ideas on the table.

Rather than jumping to a single solution, I facilitated a structured brainstorming process with PM, engineering, content design, and learning science. We explored three distinct concepts — each rooted in our research insights but taking a fundamentally different approach.

Three concept directions explored: Class warm-up, Speaking challenges, and AI conversation

Concept A

AI-Powered Review Stories

Why we explored it: It leveraged existing vocabulary data, and earlier research showed learners were intrigued by AI-generated content based on their own learning history.

Why we moved on: While the concept tested well for reinforcement, it didn’t address the core problem — speaking confidence. Learners consumed content passively.

Concept B

Open-Ended AI Chatbot

Why we explored it: Maximum flexibility. Learners could practice whatever mattered to them in the moment. Technically simpler to build.

Why we moved on: Learners felt lost without structure — the open-endedness actually increased anxiety. Without scaffolding, AI responses felt robotic. It failed our core principle: reduce fear, build confidence.

Concept C

Scenario-Based Conversation Coach

Why we explored it: Guided AI conversations anchored in real-world scenarios with a structured flow — warm-up → practice → feedback — and a coaching-style summary.

Why we chose it: It directly addressed every research insight: real-world relevance reduced anxiety, scaffolded structure lowered cognitive load, and a coaching-oriented feedback model built confidence.

First prototype flow: Scenario Selection, First Screen, User Speaks, Feedback

This is how I approach product design: diverge intentionally, then converge with evidence.

Design Principles

Four principles that guided every decision.

01

Tone-First AI

Speak like a coach, not a critic.

We prioritized warmth and encouragement in every AI response. The hypothesis: if we build trust first, learners will lower their emotional guard. The goal was never correction — it was confidence.

02

Scaffolded Interaction

Guide with structure.

Instead of open-ended chats, I designed a clear progression: warm-up → practice → feedback. This reduced cognitive load, helped learners ease in, and made the experience repeatable and low-pressure.

03

Configurable Content

Meet learners where they are.

We offered preset scenarios (e.g., “job interview,” “ordering at a restaurant”) alongside a custom conversation builder — balancing structure with freedom.

04

Post-Convo Reflection

Reinforce, don’t penalize.

Feedback was reframed from critique to reflection. We highlighted strengths, suggested vocabulary alternatives, and avoided punitive language. The intent was momentum — not judgment.

Design principles mapped to the learner emotional journey

Prototyping & Testing

Each prototype tested specific hypotheses.

With principles defined, I moved into rapid prototyping — building mid- and high-fidelity flows in Figma and Protopie to test different interaction models and feedback patterns.

Key Insight #1

Inline feedback increased anxiety.

In early prototypes, we tested showing corrections during the conversation. The result? Learners became more self-conscious and stopped speaking naturally. We shifted to a post-conversation summary model — designed to feel like coaching, not grading.

5 out of 6 test participants preferred the summary model. It felt like encouragement, not evaluation.

Before

Before: inline feedback during conversation caused anxiety and broke flow

After

After: post-conversation summary screen with encouraging feedback

Key Insight #2

Mic interaction wasn't intuitive.

Many users hesitated at the initial “tap-to-start” mic input. They weren't sure when to speak, or what would happen. I redesigned the interaction to mimic a familiar mental model — WhatsApp's tap-and-hold voice input. We added micro-animation on first use and a contextual tooltip.
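Under the hood, tap-and-hold reduces the interaction to a small, unambiguous state machine: recording starts on touch and the turn ends on release. A minimal sketch of that model (state names and class are illustrative, not the shipped Android implementation):

```python
from enum import Enum, auto


class MicState(Enum):
    IDLE = auto()       # waiting; tooltip and micro-animation shown on first use
    RECORDING = auto()  # finger held down; waveform animates in real time
    SENDING = auto()    # finger lifted; utterance is transcribed and sent


class MicButton:
    """Tap-and-hold voice input, borrowing the WhatsApp mental model."""

    def __init__(self):
        self.state = MicState.IDLE

    def press(self):
        # Recording starts the moment the learner touches the mic,
        # so there is no ambiguity about when to speak.
        if self.state is MicState.IDLE:
            self.state = MicState.RECORDING

    def release(self):
        # Lifting the finger ends the turn; no separate "stop" tap needed.
        if self.state is MicState.RECORDING:
            self.state = MicState.SENDING

    def sent(self):
        # Once the utterance is delivered, the button resets.
        if self.state is MicState.SENDING:
            self.state = MicState.IDLE
```

Because every transition is driven by a single continuous gesture, the learner never has to decide when to tap "stop", which was the main source of hesitation in the bubble-based version.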

Before

Early mic interaction: bubble-based UI that confused users

After

Refined mic interaction: tap-and-hold with real-time audio waveform
01

Feedback works best when it’s reflective, not corrective.

02

Small interaction tweaks (like mic behavior) have outsized usability impact.

03

Structure and tone matter more than raw accuracy in building confidence.

System Design

Designing the feedback loop.

A beautiful interface means nothing if the AI behind it sounds robotic. To make ConvoPro sustainable and scalable, we needed a system — not just a feature. I co-designed a collaborative QA workflow that aligned learning science, content design, and engineering around a shared definition of quality.

Centralized prompt management

In Contentful with preview capabilities — enabling non-engineers to iterate on AI behavior directly.

Weekly transcript reviews

To catch issues like robotic tone, hallucinations, or awkward phrasing before they reached users.

A scoring rubric

That evaluated grammar, tone, and perceived helpfulness from the learner’s perspective — giving the team a shared language for quality.
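To make the rubric concrete: it can be thought of as a weighted score over each AI turn, with low scorers queued for the weekly transcript review. Everything below (field names, weights, the 3.5 cutoff) is a hypothetical sketch, not the production rubric:

```python
from dataclasses import dataclass

# Hypothetical weights: tone mattered most for learner confidence.
WEIGHTS = {"grammar": 0.3, "tone": 0.4, "helpfulness": 0.3}
REVIEW_THRESHOLD = 3.5  # assumed cutoff for flagging a turn for review


@dataclass
class TurnScore:
    """Reviewer ratings for one AI turn, each on a 1-5 scale."""
    grammar: float
    tone: float
    helpfulness: float

    def overall(self) -> float:
        # Weighted average across the three rubric dimensions.
        return (
            WEIGHTS["grammar"] * self.grammar
            + WEIGHTS["tone"] * self.tone
            + WEIGHTS["helpfulness"] * self.helpfulness
        )

    def needs_review(self) -> bool:
        # Low-scoring turns go into the weekly transcript review.
        return self.overall() < REVIEW_THRESHOLD


score = TurnScore(grammar=4.5, tone=2.0, helpfulness=3.0)
print(round(score.overall(), 2), score.needs_review())  # → 3.05 True
```

The point of a shared rubric like this is less the exact numbers than the shared vocabulary: a turn that is grammatically perfect but cold still fails, which encodes the "coach, not critic" principle into the QA process.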

Early iteration

Early feedback: score-based approach that felt evaluative

Final iteration

Final feedback: celebratory screen with warm tone and encouragement

Prompt Iteration in Action

Version | Prompt | Unnatural phrasing | Learner sentiment
V1 | "What are your hobbies?" | 54% | 2.6/5
V2 | "Tell me about your hobbies." | 38% | 3.4/5
V3 | "What do you usually like to do in your free time?" | 22% | 4.3/5
32%

reduction in unnatural AI phrasing through systematic prompt iteration.

+1.5

point increase in learner sentiment (avg) as prompts became more natural.
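In practice, the loop amounts to logging each prompt version with its review metrics and tracking the delta between first and latest. A sketch using the table's numbers (the version log and helper are assumptions, not our actual tooling):

```python
# Each tuple: (version, prompt text, % of turns flagged "unnatural", avg sentiment 1-5)
versions = [
    ("V1", "What are your hobbies?", 54, 2.6),
    ("V2", "Tell me about your hobbies.", 38, 3.4),
    ("V3", "What do you usually like to do in your free time?", 22, 4.3),
]


def phrasing_reduction(log):
    """Percentage-point drop in unnatural phrasing, first version to latest."""
    return log[0][2] - log[-1][2]


print(phrasing_reduction(versions))  # → 32
```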

This feedback loop didn't just improve the product. It established a new way of working — treating prompt tone, pacing, and phrasing as UX artifacts with the same rigor as interface design.

The Shipped Experience

A confidence-building system.

The final product was more than a chat feature. It was designed to help learners practice speaking in a way that felt safe, structured, and motivating.

Scenario Selector

Learners choose from relevant, real-world contexts — job interviews, travel situations, language exams. Every session feels purposeful, never random.

Scenario selector: guided conversations library organized by topic

Voice-Powered Chat Interface

Tap-and-hold mic input inspired by WhatsApp, with real-time visual cues. Speaking feels familiar and low-friction.

AI speaking state: message appears with audio waveform
User speaking state: tap-and-hold with real-time audio waveform

Guided Feedback Screen

Encouraging post-conversation summaries that highlight strengths, reinforce new vocabulary, and offer gentle suggestions — without overwhelming corrections.

Feedback screen showing speaking confidence level, tasks completed, and vocabulary analysis

Results

+7%

Learner activation in the first 3 weeks.

80%+

Conversation completion rate.

−25%

Unnatural AI phrasing.

+1.5 pts

Learner sentiment post-iteration.

Qualitative

“Motivating,” “realistic,” “helpful without pressure.”

Beyond the numbers, ConvoPro shifted how Babbel thought about AI product quality — establishing prompt behavior as a design discipline, not just an engineering concern.

What Came Next

Strategic opportunities I mapped out.

Adaptive difficulty & feedback personalization

Adjusting tone, vocabulary level, and challenge in real-time based on learner performance.

Confidence scoring model

Exploring ways to surface encouragement patterns based on speaking speed, hesitation, and comprehension — not to rank users, but to support them.

Multi-turn scenario progression

Longer conversation flows simulating real dialogues (interviews, meetings) while maintaining tone consistency.

Feedback customization

Letting users choose their preferred coaching style (grammar-focused, vocabulary growth, soft coaching) to increase repeat usage.

Exploration of confidence grading systems with fluency, grammar, and vocab breakdowns

Reflection

Confidence builds when users feel in control, not corrected. That simple truth reshaped how I think about learning tools, feedback design, and what it really means to design for growth.

01

Designing for emotional context is a strategic advantage.

Speaking a new language is deeply emotional. I learned to treat psychological safety as a design surface — shaping tone, timing, and feedback mechanics to reduce fear and increase confidence.

02

AI requires a different kind of UX thinking.

Unlike static content, AI behavior is dynamic and emergent. I prototyped AI tone and phrasing, created feedback QA systems, and iterated through transcript-level testing.

03

Systems thinking drives scale.

This wasn’t about one feature — it was about enabling ongoing improvement. The prompt QA and iteration system I co-designed aligned engineering, content, and learning science around a repeatable workflow.

04

Influence without authority is essential.

This project required alignment across disciplines — product, didactics, data, engineering. I learned to facilitate conversations, shape strategy collaboratively, and move the team forward with clarity, not control.

TL;DR

Goal: Build an AI conversation coach that increases learner activation. Result: +7% activation, 80%+ completion rate.

How: Designed for psychological safety, not just functionality. Scenario-based structure, coaching-tone feedback, and a prompt QA system that treated AI behavior as a design artifact.

Lead role: Design vision, interaction model, feedback system, AI QA process, cross-functional alignment.