I've produced two full courses with HeyGen. Not demos. Not test clips. Full production - an AI Leadership course for school leaders (8 modules, 39 segments, interactive branching, quizzes) and a professional accounting course conversion for a client in Southeast Asia, turning hundreds of hours of recorded lectures into avatar-delivered courseware.
That means I've hit every wall, worked around every limitation, and have strong opinions about what actually matters when you're building real content with this platform.
Here's what I wish someone had told me before I started.
Why This Matters Beyond "Cool Tech"
AI avatar video isn't a novelty anymore. It's a production shift that changes the economics of video content for anyone who needs to communicate at scale.
For individual creators and consultants: You can produce 5 polished videos in an afternoon without a studio, lighting rig, or film crew. Your content backlog disappears. Consistency becomes possible because the barrier to recording drops to zero.
For small businesses: Product explainers, onboarding videos, customer updates - all the video content you know you should be creating but never do because filming is too expensive or too time-consuming. An AI avatar means one person can run the entire video operation.
For educators and training teams: This is where the real disruption sits. Personalized course content at scale. Multilingual delivery without hiring translators. Scenario-based training with branching video. SCORM-compliant modules your LMS can track. I've built this for clients - converting existing lecture content into interactive avatar-delivered courseware that learners actually engage with.
For education marketing: Schools and universities spend tens of thousands on recruitment videos, open day content, and parent communications. An AI avatar can produce localized versions in multiple languages from a single script, update content when programs change, and deliver consistent messaging across every touchpoint - at a fraction of the cost and turnaround time.
The point isn't that AI replaces human connection. It's that most organizations have a massive gap between the video content they need and the video content they can afford to produce. AI avatars close that gap.
Which Plan You Actually Need
HeyGen has four tiers. The pricing page makes them look straightforward. They're not.
| Plan | Price | Video Length | Resolution | Custom Avatars | Key Features |
|---|---|---|---|---|---|
| Free | $0 | 1-3 min | 720p | 1 video avatar | 3 videos/month, watermark |
| Creator | $29/mo | 30 min | 1080p | 1 avatar + voice | 200 credits/month |
| Business | $149/mo + $20/seat | 60 min | 4K | 5 avatars | SCORM, branching, quizzes |
| Enterprise | Custom | Custom | Custom | Custom | Everything + SSO, API |
Annual billing drops Creator to $24/month and Business to $126/month. But here's the part that matters more than any of that.
The credit trap. According to the HeyGen pricing page, Avatar IV - the good one, the one that looks realistic - costs 20 credits per minute of video. Creator gives you 200 credits per month. That's 10 minutes of Avatar IV video per month. Not 30 minutes. Ten.
Credits don't roll over. If you don't use them by month-end, they're gone. You can buy 300 additional credits for $15/month, but that still only gets you 25 minutes total.
If you're producing a single short video for social media, Creator works. If you're building a course or a training library, you'll burn through credits in two days and spend the rest of the month waiting.
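To make the credit arithmetic concrete, here is a small sketch that uses only the figures quoted above (20 credits per Avatar IV minute, 200 Creator credits, a 300-credit add-on). The function names are my own, and it assumes credits are spent at a flat per-minute rate with no rollover.

```python
import math

CREDITS_PER_MINUTE = 20        # Avatar IV rate quoted above
CREATOR_MONTHLY_CREDITS = 200  # Creator plan allowance quoted above
ADDON_CREDITS = 300            # optional add-on pack quoted above


def minutes_available(credits: int, rate: int = CREDITS_PER_MINUTE) -> float:
    """Minutes of avatar video a credit balance can pay for."""
    return credits / rate


def credits_needed(video_minutes: float, rate: int = CREDITS_PER_MINUTE) -> int:
    """Credits required for a video of the given length, rounded up."""
    return math.ceil(video_minutes * rate)


def months_to_produce(total_minutes: float, monthly_credits: int,
                      rate: int = CREDITS_PER_MINUTE) -> int:
    """Billing months needed when unused credits do not roll over."""
    return math.ceil(credits_needed(total_minutes, rate) / monthly_credits)


if __name__ == "__main__":
    print(minutes_available(CREATOR_MONTHLY_CREDITS))
    print(minutes_available(CREATOR_MONTHLY_CREDITS + ADDON_CREDITS))
    print(months_to_produce(60, CREATOR_MONTHLY_CREDITS))
```

Run the numbers for your own course length before you pick a plan. An hour of finished video on Creator alone is a six-month project.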
I'm on the Business plan because I produce SCORM content for clients - the interactive branching and quiz features are what make HeyGen a production tool rather than a toy.
Why HeyGen Over the Alternatives
I evaluated three platforms before committing: HeyGen, Synthesia, and Colossyan. All three produce AI avatar videos. Here's why I chose HeyGen.
Synthesia is the most established player. Their avatars are good and they have a strong enterprise presence. But their SCORM support and interactive features were more limited when I evaluated them, and the custom avatar process was slower. If your primary use case is short internal comms videos, Synthesia is solid. For full course production with branching and quizzes, HeyGen had the edge.
Colossyan positions itself specifically for learning and development. Their AI script assistant and built-in translation are useful. But when I tested it, the avatar quality didn't match HeyGen's Avatar IV, and the rendering times were longer. They've likely improved since - worth checking if you're evaluating options today.
HeyGen won for me because of three things: Avatar IV quality (the most realistic I tested), SCORM export with branching and in-video quizzes on the Business plan, and the ability to pair it with my ElevenLabs voice clone for a consistent personal brand across all content.
New: HeyGen x Gamma integration. HeyGen recently announced an integration with Gamma, the AI presentation tool. This means you can potentially go from notes to presentation to avatar-narrated video in one pipeline. I haven't tested this yet, but if you're already using Gamma for presentations, this could be a powerful workflow. I'll write a full breakdown once I've put it through its paces.
My recommendation: Start on Creator to test your avatar and workflow. Move to Business the moment you need SCORM export, interactive branching, or more than 10 minutes of production per month. The jump from $29 to $149 is steep. But if you need the features, there's no middle ground.
Walkthrough 1: Creating Your Avatar
This is the part most guides gloss over. Here's every click, following the HeyGen Avatar IV guide.
Go to heygen.com and sign in. If you don't have an account, the free tier lets you test the platform before committing.
In the left sidebar, click Avatars. You'll see a library of stock avatars at the top and your custom avatars below (empty if you're new).
Click Create New Avatar in the upper right. You'll get several options. Select Start with Video to create a Digital Twin - this is the only option that produces a realistic version of your actual face. Photo avatars and AI-generated avatars exist, but for anything client-facing or course-grade, Digital Twin is the only one worth using.
Recording Your Source Video
HeyGen needs 2-5 minutes of video footage of you speaking naturally. This is the raw material the AI uses to learn your face, expressions, and mouth movements. Getting this right matters more than anything else in the process.
Camera: Your phone is fine. Seriously. A modern iPhone or Android phone shooting 1080p at 30fps produces better footage than most webcams. Prop it up at eye level - not on your desk angled up at your chin. If you have a tripod or a phone mount, use it. Stability matters.
Framing: Medium shot, chest up. Your head should fill roughly the top third of the frame. Too close and the AI amplifies every twitch. Too far and it loses detail on your mouth and eyes.
Eye line: Look directly at the camera lens. Not the screen. Not slightly above it. The lens. This is the single biggest factor in whether your avatar looks natural or slightly off. If you're using a laptop, stick a small dot next to the camera to remind yourself where to look.
Lighting: Face a window if you can. Natural light from the front is better than any ring light from the wrong angle. Avoid overhead fluorescent lighting - it creates shadows under your eyes that the AI reproduces faithfully. You want even, soft light on your face.
Clothing: Solid colors. No stripes, no busy patterns, no large logos. A plain shirt or blouse in a medium tone works best. White can blow out under bright lights. Black can merge with dark backgrounds.
Performance: Start with 2-3 seconds of silence while looking at the camera. Speak at a natural pace - don't rush, don't perform. Keep your lips fully closed during pauses between sentences. The AI uses those moments to learn your neutral resting expression. Keep hand movements below chest level and subtle. Dramatic gestures don't translate well.
The first 15 seconds matter most. Minimize blinking and stay as still as you can while the system calibrates your face. After that, relax into a natural delivery.
Uploading and Consent
Back in HeyGen, you'll see a screen that says Upload your video. Drag and drop your recorded file or click to browse. HeyGen accepts MP4, MOV, and WebM. Keep it under 500MB.
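If you want to catch a bad file before you sit through an upload, a quick pre-flight check looks something like this. It is only a sketch of the limits stated above: the function name is hypothetical, and it assumes "500MB" means decimal megabytes, which is the stricter reading.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp4", ".mov", ".webm"}  # formats listed above
MAX_BYTES = 500 * 1000 * 1000  # assumption: 500MB read as decimal megabytes


def upload_problems(filename: str, size_bytes: int) -> list[str]:
    """Return a list of reasons the file may be rejected (empty if none)."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'no extension'}")
    if size_bytes >= MAX_BYTES:
        problems.append(f"file too large: {size_bytes / 1_000_000:.0f} MB")
    return problems


if __name__ == "__main__":
    print(upload_problems("source_take1.MOV", 120_000_000))
    print(upload_problems("source_take1.avi", 650_000_000))
```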
After uploading, HeyGen asks you to record a consent video. This is a legal requirement - they won't process your avatar without it. The screen will show a short paragraph for you to read aloud and a randomly generated 4-letter code. Record yourself reading the paragraph and saying the code clearly, looking at the camera. This proves you're a real person consenting to have your likeness used. It takes about 30 seconds.
Submit both videos. Processing takes anywhere from a few hours to a full day. HeyGen will email you when your avatar is ready.
When it's done, you'll find your new Digital Twin under Avatars in the left sidebar, in the "My Avatars" section. Click on it to preview how it looks with different scripts.
One warning: you can't reshoot individual scenes later without re-recording the entire avatar source. Get the source video right the first time.
Walkthrough 2: Creating Your First Video
Your avatar is ready. Here's how to turn it into an actual video, based on the HeyGen Studio walkthrough.
Click Create in the top navigation bar. Select AI Studio. This opens the video editor.
You'll see three main areas. The script panel is on the left side - this is where you type what the avatar says. The canvas is the large area in the center - this is your visual preview showing the avatar, background, and any overlays. The timeline runs along the bottom - it shows your scenes in sequence, like slides in a presentation.
Setting Up Your Avatar
On the right side of the canvas, you'll see an avatar icon (it looks like a person). Click it to open the avatar selector. Find your Digital Twin in the list and click to place it on the canvas. You can drag it to reposition and resize it by pulling the corner handles.
Writing Your Script
Click in the script panel on the left. Type or paste your narration. There's a 2,000 character limit per scene - roughly 300-350 words. If your script is longer, you'll split it across multiple scenes.
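For long scripts, splitting by hand gets tedious. Here is a minimal sketch of one way to do it outside HeyGen: break the script at sentence boundaries and pack sentences into scenes that stay under the 2,000 character limit. The function name and the simple sentence-splitting rule are my own assumptions, not anything HeyGen provides.

```python
import re

SCENE_CHAR_LIMIT = 2000  # per-scene limit mentioned above


def split_script(script: str, limit: int = SCENE_CHAR_LIMIT) -> list[str]:
    """Pack whole sentences into scene-sized chunks of at most `limit` chars."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    scenes = []
    current = ""
    for sentence in sentences:
        if len(sentence) > limit:
            raise ValueError("a single sentence exceeds the scene limit")
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # Current scene is full: close it and start a new one.
            scenes.append(current)
            current = sentence
    if current:
        scenes.append(current)
    return scenes


if __name__ == "__main__":
    demo = "Short sentences work best. " * 200
    for number, scene in enumerate(split_script(demo), start=1):
        print(number, len(scene))
```

Splitting on sentence boundaries matters because a scene break mid-sentence produces an audible seam in the narration.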
Write the way you'd speak. Short sentences. Contractions always ("don't" not "do not", "we're" not "we are"). One idea per sentence. The avatar can't adjust emphasis or recover from awkward phrasing the way a human speaker can. If a sentence feels clunky when you read it aloud, it'll feel worse coming from an avatar.
Selecting Your Voice
Below the script panel, you'll see a voice dropdown. Click it to choose your voice. If you've created a voice clone through ElevenLabs or HeyGen's built-in voice cloning, it'll appear in your voice library. Otherwise, pick from the stock voices.
Adding Pauses
This is important and easy to miss. In the script panel, you can click the "+" icon next to the text to insert a pause. Pauses come in 0.5-second increments. I use a 1-second pause after every key point - it gives the viewer time to absorb what was just said. Without pauses, avatar narration feels like a machine reading a wall of text. With them, it feels like a person thinking.
Setting Your Background
Click the background icon on the right panel (it looks like a landscape image). You have three options: choose from HeyGen's stock backgrounds, pick a solid color, or upload your own image. For course production, you'll almost always upload your own - a branded slide, a photograph, or a plain backdrop.
Click Upload, select your image file (PNG or JPG, 1920x1080 recommended), and it becomes the scene background. Your avatar appears on top of it.
Adding More Scenes
At the bottom of the timeline, click "+ Scene" to add a new scene. Each scene is one "shot" - one background, one avatar position, one script block. A 5-minute video typically has 12-20 scenes.
Each scene gets its own script, background, and avatar positioning. You can copy a scene's settings to the next one if you want consistency, or change everything between scenes for visual variety.
Adding Overlays
Click the text icon or image icon in the toolbar above the canvas to add overlays - titles, bullet points, logos, shapes. These layer on top of your background and avatar. Useful for showing key terms or data while the avatar speaks.
Preview and Render
Click Preview at the top to watch individual scenes. Check the timing, make sure the script sounds natural, and verify that your overlays don't collide with the avatar.
When everything looks right, click Submit in the upper right to send the video for rendering. This is not instant. Rendering takes 5-15 minutes depending on video length. HeyGen will notify you when it's done.
Important: You cannot edit a video after submitting it for rendering. If something is wrong, you render the whole thing again. Preview carefully.
Importing Slides (The Course Production Workflow)
If you're building training content, you probably have slides already. This workflow is how I produced full courses - dozens of slides becoming dozens of scenes with avatar narration.
Click Create in the top navigation, then AI Studio to open a new project.
Before adding an avatar, look for the Import or Upload button near the top of the editor. Click it and select your PDF or PPTX file. HeyGen will process the file and convert each slide into a separate scene automatically. You'll see them appear in the timeline at the bottom.
Important: When the import dialog asks how to handle your file, choose "Image Background" - not "Video Background." This preserves your slide design exactly as you made it, with all your fonts, colors, and layouts intact.
Now click into each scene and add your avatar (right panel, avatar icon) and your script (left panel). Position the avatar where it won't cover critical content on the slide. I put mine in the lower-right corner for most slides.
This is how I produced a client's accounting course - 44 slides became 44 scenes, with the avatar delivering narration over each one. Design your slides with the avatar position in mind. Leave the lower-right quadrant clear of important content. I learned this after my avatar's head covered a key formula on three slides and I had to re-render the entire batch.
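You can catch this kind of collision before rendering if you know roughly where things sit on each slide. The sketch below assumes a 1920x1080 slide, treats the lower-right quadrant as the avatar zone, and describes each slide element as a pixel box measured from the top-left corner. All names and coordinates are illustrative, not taken from HeyGen.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x: int       # left edge in pixels
    y: int       # top edge in pixels
    width: int
    height: int


SLIDE_W, SLIDE_H = 1920, 1080
# Assumption: the avatar occupies the lower-right quadrant of the slide.
AVATAR_ZONE = Box(SLIDE_W // 2, SLIDE_H // 2, SLIDE_W // 2, SLIDE_H // 2)


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share any area."""
    return (a.x < b.x + b.width and b.x < a.x + a.width
            and a.y < b.y + b.height and b.y < a.y + a.height)


def blocked_items(items: dict[str, Box], zone: Box = AVATAR_ZONE) -> list[str]:
    """Names of slide elements the avatar would cover."""
    return [name for name, box in items.items() if overlaps(box, zone)]


if __name__ == "__main__":
    slide = {
        "title": Box(100, 60, 1200, 120),
        "formula": Box(1100, 700, 600, 200),
    }
    print(blocked_items(slide))
```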
Interactive Features and SCORM Export
This is Business plan territory. If you're publishing to an LMS - LearnWorlds, Moodle, Canvas, anything SCORM-compatible - this is where the real value is.
What Business unlocks:
- Branching: Viewer clicks a button, video jumps to a different scene. I used this for scenario-based decision training - "A parent complains about AI in the classroom. What do you do?" Each choice leads to a different avatar response explaining the consequences.
- In-video quizzes: Multiple choice questions that pause the video. Customizable feedback per answer.
- Action buttons: Link to external URLs or jump to specific scenes.
- SCORM export: Packages your video as a SCORM 1.2 or 2004 file. Set a completion threshold (e.g., "80% watched = complete"). Upload to any LMS.
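The completion threshold is simple in principle. Here is a rough sketch of the rule as I understand it, not HeyGen's actual implementation: compare the fraction watched against the threshold and report a status. The function name is hypothetical, and the two status strings are just labels for illustration.

```python
def completion_status(watched_seconds: float, total_seconds: float,
                      threshold: float = 0.8) -> str:
    """Return 'completed' once the watched fraction reaches the threshold."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    fraction = min(watched_seconds / total_seconds, 1.0)
    return "completed" if fraction >= threshold else "incomplete"


if __name__ == "__main__":
    # A 10-minute module with an 80% threshold.
    print(completion_status(480, 600))
    print(completion_status(300, 600))
```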
I tested SCORM export with SCORM Cloud and deployed to my own custom LMS built for a client - where I could track completion, quiz scores, and learner progression across modules. Completion tracking works. Quiz scores pass through. It's not flawless - you should always test in your target LMS before publishing to students - but it works. This is also a service I offer: building custom LMS platforms with SCORM integration for organizations that need full control over their training delivery.
The branching design constraint: Plan your branching on paper first. Each branch is a separate scene in the timeline. With 3 choices at 5 decision points, that's 15 response scenes before you count the decision scenes themselves, and more if the paths never rejoin. Without a flowchart, you'll lose track fast. I use a simple spreadsheet: scene number, script, branches-to.
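A branching plan like that spreadsheet is easy to sanity-check in code. The sketch below assumes you've reduced it to a mapping of scene number to the scene numbers it branches to. It flags branches that point at scenes that don't exist and scenes no viewer can ever reach. The structure and names are my own, not a HeyGen export format.

```python
from collections import deque


def check_branches(scenes: dict[int, list[int]], start: int) -> dict[str, list[int]]:
    """Find branch targets that don't exist and scenes unreachable from start."""
    missing = sorted({target
                      for targets in scenes.values()
                      for target in targets
                      if target not in scenes})

    # Breadth-first walk from the opening scene.
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for target in scenes.get(current, []):
            if target in scenes and target not in seen:
                seen.add(target)
                queue.append(target)

    unreachable = sorted(set(scenes) - seen)
    return {"missing_targets": missing, "unreachable_scenes": unreachable}


if __name__ == "__main__":
    # Scene 1 offers three choices; each response rejoins at scene 5.
    plan = {1: [2, 3, 4], 2: [5], 3: [5], 4: [5], 5: [], 6: [5]}
    print(check_branches(plan, start=1))
```

An orphaned scene still gets rendered, so finding it before you submit saves both render time and credits.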
What I Wish I Knew Before Starting
After producing two full courses, these are the things that would have saved me the most time.
Previews lie. The in-editor preview doesn't always match the rendered output. Lip sync, timing, and overlay positioning can shift slightly. Always render a test scene before committing to a full video.
Lip sync quality varies. With stock HeyGen voices, lip sync is good. With ElevenLabs voice clones, it's inconsistent. Some words track perfectly. Others drift. You won't know until you render. Budget time for re-renders.
Always export at 1080p. I've had rendering bugs at lower resolutions. 1080p has been reliable.
HeyGen changes things. In my experience, plan inclusions have shifted without much warning - translation minutes got reduced mid-cycle on one of my billing periods. Check what your plan includes before you start a production run, not after.
Custom motion prompts are limited. You can prompt the avatar to gesture or move, but it only works well for clips under 10 seconds. For longer scenes, stick with the default natural movement.
The credit math one more time. If you're producing a 5-minute video with Avatar IV, that's 100 credits. Creator's 200 credits give you two of those per month. Plan accordingly or plan to buy extra credits.
Get Started
HeyGen is the most capable AI avatar platform I've used for course production. It's not perfect - the credit system is aggressive, the Business plan price jump is real, and you'll spend more time on re-renders than you'd like. But the output quality is genuinely good enough for professional deployment.
If you want to try it: start here. The free tier gives you three videos to test whether the quality meets your standards before you spend anything.
If you want to see what a full HeyGen-produced course looks like - with interactive branching, SCORM export, and real instructional design behind it - I'm publishing the complete AI Leadership course at kaiak.io/work-with-me. Free access to the first module.
Build something real with it. That's where the learning happens.
A note on HeyGen V: HeyGen has recently launched a major update (HeyGen V) with significant improvements to avatar quality, voice synthesis, and production capabilities. Everything in this post covers the original HeyGen platform, which is still fully functional. I'll be publishing a complete breakdown of HeyGen V - what's new, what's better, and whether it's worth upgrading your workflow - in an upcoming post. Follow along on the blog or connect with me if you want a heads-up when that drops.
