Frequently Asked Questions
Everything you wanted to know
Honest answers to the questions we get asked most. If yours is not here, reach out.
About Spelly
Spelly is a pronunciation assessment tool built by two researchers at the Zurich University of Applied Sciences (ZHAW). We came to Switzerland to study engineering and found ourselves stuck on something embarrassingly simple: we could not pronounce "Grüezi" well enough for our neighbour to understand. A basic greeting. That moment made it clear how overlooked pronunciation is in the tools people use to learn languages.
We could not afford language schools at CHF 50–100 per hour, and every app we tried focused on grammar and vocabulary. None of them told us which sounds we were actually getting wrong. So we built Spelly: starting as a collection of Python scripts, launching as a web app in October 2025, and steadily growing into a platform for anyone who wants honest pronunciation feedback on how they speak.
Spelly is Swiss-built, ZHAW-backed, and narrowly focused: it tells you exactly which sounds you are producing, and how to correct them.
ELSA and Duolingo are both excellent tools at what they do.
ELSA gives you a score on their sentences. Duolingo rewards you for finishing their lessons. Both decide what you practice and when.
Spelly does something different. It works on whatever you actually need to say: your presentation, your job interview answer, the phrase you heard in a meeting and could not quite get right. It breaks your speech down to individual sounds, tracks which ones are consistently off, and shows you exactly what to fix.
Think of it as a practice layer on top of whatever you are already doing. If you already know some of the language and want to sharpen how you sound, that is Spelly.
Honestly? It works best for people who already have some exposure to their target language and want to sharpen specific aspects of their speech.
Professionals preparing for a presentation or interview. Migrants integrating into a new country. Language learners who have finished a course and still wince when they hear themselves. People who understand everything but feel unconfident the moment they open their mouth.
That said, there is no hard ceiling on who can use it. Whether you are working through your first plateau or polishing a near-native accent, the feedback adapts to what you produce. Guided material (structured paths, targeted exercises, curated content) is in active development for learners who want it. The model intelligence is already there. The practice architecture can be built on top of it.
Pedagogy & Teaching
Hard no. Spelly does not replace a teacher, and it is not trying to.
A teacher designs your learning path, keeps you accountable, and catches things no algorithm notices. That is irreplaceable. What a teacher cannot do is give you detailed feedback on every single recording, instantly, at 11pm before your presentation.
Most learners get a few minutes of pronunciation feedback per week. With Spelly, every attempt gets analysed. The teacher decides where you are going. Spelly tells you exactly how you sound on the way there.
Yes, and this is one of the things that sets Spelly apart from fixed-curriculum tools. You bring the material: a sentence from a meeting, a phrase from a podcast, a word you keep stumbling on. Spelly responds to all of it. There is no prescribed content you have to work through.
That said, we are actively designing structured practice routines on top of this: minimal pair exercises, where you compare "Haus" and "aus" side by side to isolate the /h/ contrast, and sound-specific drills that pick the right words for whatever sound you are working on. The sound tracking is already there. Turning it into guided practice flows is the next step. Educators who reviewed Spelly specifically pointed to this as the most natural extension, and we agree.
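To make the minimal pair idea concrete, here is a small Python sketch. It is an illustration only, not Spelly's implementation: the lexicon, the simplified transcriptions, and the function names are all assumptions made for this example. It treats two words as a minimal pair when their phone sequences differ by exactly one substitution, insertion, or deletion, which covers the "Haus" / "aus" case.

```python
from itertools import combinations

# Hypothetical mini-lexicon with simplified, illustrative transcriptions.
LEXICON = {
    "Haus": ("h", "aʊ", "s"),
    "aus": ("aʊ", "s"),
    "Maus": ("m", "aʊ", "s"),
    "Hand": ("h", "a", "n", "t"),
}

def one_edit_apart(a, b):
    """True if the sequences differ by exactly one substitution,
    insertion, or deletion of a phone."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make `a` the shorter (or equal-length) sequence
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1  # skip the shared prefix
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substitution at position i
    return a[i:] == b[i + 1:]  # one extra phone in `b` at position i

def minimal_pairs(lexicon):
    """Return every pair of words whose transcriptions are one edit apart."""
    return sorted(
        (w1, w2)
        for (w1, p1), (w2, p2) in combinations(lexicon.items(), 2)
        if one_edit_apart(p1, p2)
    )

pairs = minimal_pairs(LEXICON)
for first, second in pairs:
    print(first, "/", second)
```

A sound-specific drill could then filter these pairs down to the ones that involve the sound a learner is working on.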
For adults and older students, yes, straightforwardly. The phone breakdown, sound symbols, and articulation guides are designed for learners who can read and interpret detailed feedback.
For primary school children, the interface may be too dense without a teacher facilitating the session. That does not rule out adapting it: a simplified view for younger students is something we can design.
The Spelly B2B platform is built for institutional use — language schools, universities, integration programmes, enterprises — with any kind of language learner in mind.
Technology & Accuracy
A useful framing first: Spelly is not a generative model like ChatGPT, and it is not an evaluative model that scores you on a rubric. It is a descriptive model whose job is to detect which sound is most plausibly what you produced in a given frame of audio, and to tell you the confidence behind that detection. That is a more honest and more useful thing than a score.
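As a rough illustration of what "descriptive" means here, the Python sketch below picks the most plausible sound for a frame and reports the confidence behind it. Everything in it is an assumption for the example: the four-phone inventory, the raw scores, and the use of a softmax to turn scores into probabilities. It is not a description of Spelly's actual model.

```python
import math

# Hypothetical phone inventory; the real inventory is not described here.
PHONES = ["h", "a", "ʊ", "s"]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def describe_frame(scores):
    """Return the most plausible phone for one frame and its confidence."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return PHONES[best], probs[best]

# Made-up scores for two frames of audio, one value per phone in PHONES.
frames = [
    [4.0, 0.5, 0.2, 0.1],  # one phone clearly dominates
    [0.9, 1.0, 0.8, 0.7],  # ambiguous frame, so confidence is low
]

results = [describe_frame(frame) for frame in frames]
for phone, confidence in results:
    print(f"{phone}: confidence {confidence:.2f}")
```

The second frame still gets a label, but the low confidence tells you not to read much into it. That is the difference between describing what was produced and grading it.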
At that task it is excellent, under standard recording conditions (clean recording, no background noise, one speaker). The model operates at the level of individual sounds using the International Phonetic Alphabet (IPA) as a language-agnostic representation. Most tools sidestep this problem entirely. We do not.
Where it degrades: at the extremes of the confidence range, both very high (near-native speech that is still subtly off) and very low (a heavy accent or disfluency can confuse the alignment). This is true of every model: any system claiming it can detect speech correctly under any conditions is making a claim that is impossible to back up. Models are simplifications of reality that work well within certain conditions. Ours works well for language learning practice, which is exactly the use case it is designed for.
The value is not in a single recording's score. It is in the trend: which sounds are consistently off, and whether they are improving over time. That is what the sound profile is for. Use it as a training signal, not a verdict.
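The idea of reading the trend rather than a single recording can be sketched in a few lines of Python. This is a hypothetical example, not the actual sound profile: the per-sound values between 0 and 1, the 0.7 threshold, and the "first half versus second half" comparison are all assumptions chosen to keep the sketch short.

```python
from collections import defaultdict
from statistics import mean

def sound_profile(recordings, threshold=0.7):
    """Summarise per-sound history across recordings (oldest first).

    `threshold` is an arbitrary cut-off chosen for this example.
    """
    history = defaultdict(list)
    for recording in recordings:
        for phone, value in recording.items():
            history[phone].append(value)

    profile = {}
    for phone, values in history.items():
        half = len(values) // 2
        early = mean(values[:half]) if half else values[0]
        recent = mean(values[half:])
        profile[phone] = {
            # Off in every recording, not just in one bad take.
            "consistently_off": max(values) < threshold,
            # Later attempts are better on average than earlier ones.
            "improving": recent > early,
        }
    return profile

# Made-up closeness values for two sounds over four recordings.
recordings = [
    {"h": 0.40, "s": 0.90},
    {"h": 0.50, "s": 0.95},
    {"h": 0.60, "s": 0.90},
    {"h": 0.65, "s": 0.92},
]

profile = sound_profile(recordings)
for phone, summary in profile.items():
    print(phone, summary)
```

In this made-up data, /h/ is off in every recording yet clearly improving, which is a more useful signal than any one of the four values on its own.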
Spelly is built on a multilingual speech model that was trained to detect individual sounds — not what word was said, but which sounds were actually produced and how much they deviate from a native reference. Standard speech recognition asks "what did they say?" Our model asks "which sounds did they actually produce?" That is an entirely different problem.
When you submit a recording, your audio goes through acoustic preprocessing, sound extraction, alignment against a synthetic native baseline, and deviation scoring. The result is a breakdown of each word, each sound, and how closely your production matches the reference. All in near real-time, thanks to its lightweight architecture.
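The alignment and deviation steps can be illustrated with a deliberately simplified Python sketch. It swaps in a symbolic alignment of two phone sequences using the standard library's difflib, which is an assumption made for this example. The pipeline described above works on audio, and the transcriptions, labels, and ratio below are illustrative only.

```python
from difflib import SequenceMatcher

def align_phones(reference, produced):
    """Align two phone sequences and label each difference."""
    matcher = SequenceMatcher(None, reference, produced, autojunk=False)
    report = []
    for tag, r0, r1, p0, p1 in matcher.get_opcodes():
        ref_part = reference[r0:r1]
        prod_part = produced[p0:p1]
        if tag == "equal":
            report.extend(("match", ph, ph) for ph in ref_part)
        elif tag == "replace":
            report.append(("substitution", "".join(ref_part), "".join(prod_part)))
        elif tag == "delete":
            report.append(("missing", "".join(ref_part), ""))
        else:  # tag == "insert"
            report.append(("extra", "", "".join(prod_part)))
    return report

def match_ratio(report, reference):
    """Fraction of reference phones that were matched exactly."""
    matched = sum(1 for kind, _, _ in report if kind == "match")
    return matched / len(reference)

# Simplified transcription of "Haus", and a production that drops the /h/.
reference = ["h", "aʊ", "s"]
produced = ["aʊ", "s"]

report = align_phones(reference, produced)
score = match_ratio(report, reference)
for kind, expected, heard in report:
    print(f"{kind}: expected {expected!r}, heard {heard!r}")
print(f"matched {score:.0%} of the reference phones")
```

A real system would compare acoustic evidence rather than symbols, so a sound could be partly right. The sketch only shows the shape of the output: one entry per sound, each compared against a reference.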
More technical detail is available in our API and integration docs.
Availability & Access
Currently supported languages: English (US and GB variants), Spanish, French, Italian, German, and Catalan.
Our plan is to keep expanding as we grow. The model is architecturally multilingual since it uses the International Phonetic Alphabet (IPA) as a language-agnostic representation, so adding a new language is primarily a data challenge rather than an architectural one. We will add languages as we acquire sufficient balanced training data to maintain quality across the board.
There are definite plans for an app, and we are transparent about the effort it involves. Right now, we are focused on perfecting the web product through real user feedback before expanding the surface area.
In the meantime, the Spelly Chrome extension gives you something very close to what we envision for a mobile app: a lightweight layer you can use anywhere, on any text, without leaving what you are doing. If you have not tried it, that is the closest thing to a mobile-like experience we have today. We will spread the word as soon as a proper mobile app is available.
Absolutely! Language learning institutions and businesses are, in fact, the natural home for what Spelly does. Pronunciation feedback does not scale through human tutors, and most existing tools do not go deep enough on phonetics. Spelly fills that gap.
We offer three integration paths:
- A white-label platform for language schools and institutions with full classroom management,
- An API for developers who want to embed phone-level feedback into their own products,
- An embeddable widget for teams who want to add pronunciation assessment to an existing platform without a full integration.
We are happy to scope a short meeting to find the right fit. Reach out here and mention your use case.
Still have questions?
We read everything. Whether it is a question about a specific language, a feature request, or feedback on something that is not working — we want to hear it.
Share your feedback