Guide to Prompting ACE-Step for AI Music Generation
Last Updated
Feb 13, 2026
Models Tested
ACE-Step
Introduction to ACE-Step Music Prompting
ACE-Step is an open-source AI music generation model that creates full songs from two simple inputs: tags (describing genre, mood, and instruments) and lyrics (the words to be sung or a marker for instrumental sections). It powers Ambience AI's audio generation pipeline, enabling creators to produce songs with vocals, instrumentals, and multi-genre compositions.
Built on a diffusion transformer architecture with a music autoencoder, ACE-Step can generate approximately 47 seconds of music in about 10 seconds on a single GPU. It supports over 100 genres, aligns vocals closely with lyrics, and produces audio at 48kHz stereo quality.
This guide covers everything you need to know about prompting ACE-Step effectively, from crafting tag prompts and writing singable lyrics to generating instrumentals, using editing features, and troubleshooting common issues.
Understanding ACE-Step: How AI Music Generation Works
ACE-Step is an open-source foundation model for music generation developed by ACE Studio and StepFun. It uses a diffusion transformer to generate music from a text description of the style and optional lyrics.
How ACE-Step Works
Tags (Style Prompt)
Tags describe the musical style you want: genre, mood, instruments, tempo, and era. They function like a creative brief for the model, telling it what kind of track to produce. Example: "cinematic orchestral, epic, slow tempo, strings, brass, emotional"
Lyrics (Content Input)
Lyrics provide the vocal content the model should sing. You can include structure tags like [verse], [chorus], and [bridge] to organize sections. For instrumental tracks, use [instrumental] or [inst].
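The two inputs can be thought of as a pair of plain strings. A minimal sketch in Python, assuming a hypothetical `AceStepPrompt` container (it is not part of ACE-Step's API and only illustrates how tags and lyrics fit together):

```python
from dataclasses import dataclass


# Hypothetical container for illustration; not an ACE-Step API.
@dataclass
class AceStepPrompt:
    tags: str    # comma-separated style description
    lyrics: str  # words to sing, or "[instrumental]" for no vocals

    def is_instrumental(self) -> bool:
        # The guide treats [instrumental] and [inst] as equivalent markers.
        return self.lyrics.strip().lower() in {"[instrumental]", "[inst]"}


prompt = AceStepPrompt(
    tags="cinematic orchestral, epic, slow tempo, strings, brass, emotional",
    lyrics="[instrumental]",
)
print(prompt.is_instrumental())  # True
```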
Key Capabilities
- Generates ~47 seconds of music in ~10 seconds on a single GPU
- Supports 100+ genres including pop, rock, jazz, classical, electronic, hip hop, and more
- Strong lyrics-to-vocal alignment with natural-sounding singing
- 48kHz stereo output for high-quality audio
- Supports both vocal songs and pure instrumentals
Generating Instrumental Music
To generate music without vocals, use the [instrumental] or [inst] token in the lyrics field. This tells ACE-Step to produce a purely instrumental track.
Instrumental Setup
Tags: "cinematic orchestral, epic, slow tempo, strings, brass, emotional"
Lyrics: [instrumental]
Genre Fusion Tips
ACE-Step handles genre blending well when you guide it clearly. Structure your tags with a primary genre and a secondary influence to get coherent results.
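As a rough sketch of that structure, the helper below puts the primary genre first, the secondary influence second, and any remaining descriptors after them. The function name `fusion_tags` and the ordering convention are assumptions for illustration, not something ACE-Step requires:

```python
def fusion_tags(primary: str, influence: str, descriptors: list[str]) -> str:
    """Join a primary genre, one secondary influence, and extra descriptors."""
    parts = [primary, influence, *descriptors]
    # Drop empty entries and stray whitespace so the tag string stays clean.
    return ", ".join(p.strip() for p in parts if p.strip())


tags = fusion_tags(
    "jazz",
    "electronic elements",
    ["smooth", "late night", "saxophone", "synth pads"],
)
print(tags)
# jazz, electronic elements, smooth, late night, saxophone, synth pads
```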
Primary + Influence
"jazz, electronic elements, smooth, late night, saxophone, synth pads"
Era + Modern Twist
"70s funk, modern production, groovy, bass guitar, clavinet, punchy drums"
Writing Lyrics for Songs
When generating songs with vocals, the lyrics field is where you write the words the model will sing. ACE-Step supports structure tags to organize your song into sections.
Structure Tags
[verse]: Song verses
[chorus]: Repeated hook
[bridge]: Contrasting section
[instrumental]: No vocals (solo)
Example: Structured Lyrics
[verse]
Walking through the morning light
Every shadow fades from sight
The world is waking up with me
[chorus]
We are the dreamers of the day
Nothing can stand in our way
Together we will find a way
[bridge]
When the night comes calling
We won't stop from falling
Into something beautiful
[chorus]
We are the dreamers of the day
Nothing can stand in our way
Together we will find a way
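If you assemble lyrics programmatically, one possible approach is to build them from (tag, lines) pairs. This is a sketch only: `build_lyrics` is a hypothetical helper, and `ALLOWED_TAGS` is limited to the tags named in this guide, which may not be the full set the model accepts.

```python
# Structure tags mentioned in this guide; the model may accept others.
ALLOWED_TAGS = {"verse", "chorus", "bridge", "instrumental", "inst"}


def build_lyrics(sections: list[tuple[str, list[str]]]) -> str:
    """Render (tag, lines) pairs as a lyrics string, one line per row."""
    rows = []
    for tag, lines in sections:
        if tag not in ALLOWED_TAGS:
            raise ValueError(f"unknown structure tag: {tag}")
        rows.append(f"[{tag}]")  # the structure tag sits on its own line
        rows.extend(lines)       # followed by that section's lyric lines
    return "\n".join(rows)


lyrics = build_lyrics([
    ("verse", ["Walking through the morning light", "Every shadow fades from sight"]),
    ("chorus", ["We are the dreamers of the day"]),
])
print(lyrics)
```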
Lyrics Best Practices
Match Length to Duration
ACE-Step sings roughly 2 to 3 words per second. For a 47-second track, aim for around 90 to 140 words total. Too many lyrics will sound rushed; too few will leave long pauses.
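The arithmetic behind that range is just duration multiplied by singing rate. A small sketch, using the 2 to 3 words per second figure from this guide as defaults (the function name is illustrative):

```python
def lyric_word_budget(
    duration_s: float, min_wps: float = 2.0, max_wps: float = 3.0
) -> tuple[int, int]:
    """Return the (low, high) word count that fits a track of duration_s seconds."""
    return round(duration_s * min_wps), round(duration_s * max_wps)


low, high = lyric_word_budget(47)
print(low, high)  # 94 141
```

For a 47-second track this gives 94 to 141 words, which matches the rounded guidance of roughly 90 to 140.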
Write for Singability
Use simple, natural phrasing. Short lines of 4 to 8 words work best. Avoid tongue twisters, complex vocabulary, or very long sentences that are difficult to sing.
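A quick way to spot lines that fall outside the 4 to 8 word range is to count words per line and skip the structure tags. The helper below is a sketch under that assumption; it checks length only and says nothing about how singable the words themselves are.

```python
def lines_outside_range(
    lyrics: str, min_words: int = 4, max_words: int = 8
) -> list[str]:
    """Return lyric lines whose word count is outside [min_words, max_words]."""
    flagged = []
    for line in lyrics.splitlines():
        text = line.strip()
        # Skip blank lines and structure tags such as [verse] or [chorus].
        if not text or (text.startswith("[") and text.endswith("]")):
            continue
        if not min_words <= len(text.split()) <= max_words:
            flagged.append(text)
    return flagged


sample = "[verse]\nWalking through the morning light\nGo\n"
print(lines_outside_range(sample))  # ['Go']
```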
Align Tags with Lyrics
Make sure your tags match the mood of your lyrics. Sad lyrics paired with "upbeat, party" tags will produce confusing results. Keep the emotional tone consistent.
Mix Vocals and Instrumentals
Use [instrumental] between lyric sections for solos or breaks. This creates dynamic variety and gives the track breathing room.
ACE-Step Prompt Examples for Songs and Instrumentals
Here are complete prompt templates for common music generation use cases. Each includes both tags and lyrics you can adapt for your projects.
Pop Song with Vocals
Tags:
"pop, female vocal, catchy, upbeat, synth, 120 BPM"
Lyrics:
[verse]
Lights are flashing all around
Feel the rhythm, feel the sound
Tonight we're never coming down
[chorus]
Dance with me under the neon glow
Let the music take control
This is everything we know
Cinematic Instrumental
Tags:
"cinematic orchestral, epic, slow tempo, strings, brass, emotional, film score"
Lyrics:
[instrumental]
Lo-Fi Chill Beat
Tags:
"lo-fi hip hop, chill, mellow, vinyl crackle, piano, jazzy chords, 85 BPM"
Lyrics:
[instrumental]
Rock Track with Lyrics
Tags:
"rock, electric guitar, powerful drums, male vocal, energetic, 130 BPM"
Lyrics:
[verse]
Standing on the edge of the unknown
Fire in my veins, I'm not alone
Every road I take leads me back home
[chorus]
We rise, we fall, we carry on
Through the storm we're standing strong
[instrumental]
Try these prompts in our AI audio generator to hear the results.
Editing and Refining ACE-Step Music
ACE-Step offers several editing modes that let you iterate on generated music without starting from scratch. These tools are essential for refining your tracks.
Retake (Variation)
Generate a new variation using the same tags and lyrics. Each retake produces a different arrangement while keeping your original creative direction intact.
Repaint (Selective Regeneration)
Regenerate a specific time range within a track. If the chorus sounds great but the verse needs work, repaint just the verse portion while keeping everything else.
Extend (Continue Track)
Add more time to your track by extending it beyond the original duration. The extension continues in the same style and key as the existing audio.
Edit / Remix
Change the style of an existing generation by modifying the tags. Keep the same structure and lyrics but shift the genre, mood, or instrumentation.
Refinement Workflow
Start by generating an initial track with your tags and lyrics. Listen through and identify what works and what doesn't. Use retake if you want a completely different arrangement, repaint to fix specific sections, extend to add length, or edit to shift the style. Iteration is key to getting the best results from ACE-Step.
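The decision in this workflow can be written down as a simple lookup from the problem you hear to the editing mode to try. The `Issue` names below are made up for illustration, and the mode strings are labels from this guide rather than API parameters:

```python
from enum import Enum


class Issue(Enum):
    WHOLE_ARRANGEMENT = "want a completely different arrangement"
    ONE_SECTION = "one section needs work"
    TOO_SHORT = "track is too short"
    WRONG_STYLE = "style should change"


# Mapping taken from the refinement workflow described in this guide.
MODE_FOR_ISSUE = {
    Issue.WHOLE_ARRANGEMENT: "retake",
    Issue.ONE_SECTION: "repaint",
    Issue.TOO_SHORT: "extend",
    Issue.WRONG_STYLE: "edit",
}


def pick_mode(issue: Issue) -> str:
    """Return the editing mode the guide suggests for a given issue."""
    return MODE_FOR_ISSUE[issue]


print(pick_mode(Issue.ONE_SECTION))  # repaint
```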
Troubleshooting Common Issues
Here are solutions to the most common issues creators encounter when generating music with ACE-Step.
Vocals Sound Too Loud or Overpowering
Problem: The vocal track drowns out the instrumentals
Solution: Add more instrument-related tags to emphasize the backing track. Use tags like "rich instrumentation" or list specific instruments to give them more presence.
Lyrics Sound Rushed or Crammed
Problem: The model is trying to fit too many words into the track
Solution: Reduce the number of lyrics. Aim for 2 to 3 words per second. For a 47-second track, keep lyrics under 140 words. Add [instrumental] breaks between sections.
Muddled or Incoherent Style
Problem: The track sounds like a confused mix of genres
Solution: Remove contradictory tags. Stick to one primary genre and use 3 to 7 focused, complementary tags. Avoid mixing opposing moods or unrelated genres.
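A tag string can be checked for these two problems before generating. The sketch below assumes a small, hand-written list of conflicting pairs; this guide does not define such a list, so `CONFLICTING` is purely illustrative and should be adapted to your own tags.

```python
# Illustrative pairs only; the guide does not define a list of conflicts.
CONFLICTING = [
    ("upbeat", "melancholic"),
    ("slow tempo", "fast tempo"),
    ("chill", "aggressive"),
]


def lint_tags(tags: str, min_tags: int = 3, max_tags: int = 7) -> list[str]:
    """Return warnings about tag count and known conflicting pairs."""
    items = [t.strip().lower() for t in tags.split(",") if t.strip()]
    warnings = []
    if not min_tags <= len(items) <= max_tags:
        warnings.append(f"expected {min_tags} to {max_tags} tags, got {len(items)}")
    for a, b in CONFLICTING:
        if a in items and b in items:
            warnings.append(f"conflicting tags: {a} / {b}")
    return warnings


print(lint_tags("lo-fi hip hop, chill, aggressive"))
# ['conflicting tags: chill / aggressive']
```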
Unwanted Vocals on Instrumental Track
Problem: The model adds vocals despite using the instrumental token
Solution: Make sure you're using [instrumental] or [inst] as the only content in the lyrics field. Remove any other text. Try adding "instrumental" to your tags as well.
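Both parts of that fix can be applied in one step: set the lyrics field to the bare token and make sure "instrumental" appears in the tags. `make_instrumental` is a hypothetical helper that sketches this, not an ACE-Step function:

```python
def make_instrumental(tags: str) -> tuple[str, str]:
    """Return (tags, lyrics) adjusted for a vocal-free track."""
    items = [t.strip() for t in tags.split(",") if t.strip()]
    # Add the "instrumental" tag only if it is not already present.
    if "instrumental" not in (t.lower() for t in items):
        items.append("instrumental")
    # The lyrics field holds the token and nothing else.
    return ", ".join(items), "[instrumental]"


tags, lyrics = make_instrumental("lo-fi hip hop, chill, piano")
print(tags)    # lo-fi hip hop, chill, piano, instrumental
print(lyrics)  # [instrumental]
```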
Start Creating Music with ACE-Step
ACE-Step makes AI music generation accessible to everyone. With the right combination of tags and lyrics, you can create everything from cinematic scores to pop songs with vocals in seconds. The key is to start simple and iterate.
Begin with the templates in this guide, experiment with different genre and mood combinations, and use the editing tools to refine your tracks. The more you create, the better your intuition for prompting will become. Try it now with our AI audio generator.