Guide to Prompting ACE-Step for AI Music Generation

Master AI music creation with tags, lyrics, and prompting techniques for ACE-Step

Last Updated

Feb 13, 2026

Models Tested

ACE-Step

Introduction to ACE-Step Music Prompting

ACE-Step is an open-source AI music generation model that creates full songs from two simple inputs: tags (describing genre, mood, and instruments) and lyrics (the words to be sung or a marker for instrumental sections). It powers Ambience AI's audio generation pipeline, enabling creators to produce songs with vocals, instrumentals, and multi-genre compositions.

Built on a diffusion transformer architecture with a music autoencoder, ACE-Step can generate approximately 47 seconds of music in about 10 seconds on a single GPU. It supports over 100 genres, aligns vocals closely with lyrics, and produces audio at 48kHz stereo quality.

This guide covers how to prompt ACE-Step effectively: crafting tag prompts, writing singable lyrics, generating instrumentals, using the editing features, and troubleshooting common issues.

Understanding ACE-Step: How AI Music Generation Works

ACE-Step is an open-source foundation model for music generation released by ACE Studio and StepFun. It uses a diffusion transformer to generate music from text descriptions and optional lyrics.

How ACE-Step Works

Tags (Style Prompt)

Tags describe the musical style you want: genre, mood, instruments, tempo, and era. They function like a creative brief for the model, telling it what kind of track to produce. Example: "cinematic orchestral, epic, slow tempo, strings, brass, emotional"

Lyrics (Content Input)

Lyrics provide the vocal content the model should sing. You can include structure tags like [verse], [chorus], and [bridge] to organize sections. For instrumental tracks, use [instrumental] or [inst].
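
As a minimal sketch, the two inputs can be modeled as a pair of strings. The MusicPrompt class and its is_instrumental method are hypothetical names used for illustration and are not part of ACE-Step; only the token spellings come from this guide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MusicPrompt:
    """Hypothetical container for the two text inputs (not an ACE-Step class)."""

    tags: str    # comma-separated style descriptors
    lyrics: str  # words to sing, optionally with [verse]/[chorus]/[bridge] markers

    def is_instrumental(self) -> bool:
        # Treat a lyrics field holding only the token as an instrumental request.
        return self.lyrics.strip().lower() in {"[instrumental]", "[inst]"}


prompt = MusicPrompt(
    tags="cinematic orchestral, epic, slow tempo, strings, brass, emotional",
    lyrics="[instrumental]",
)
print(prompt.is_instrumental())
```

Keeping the two fields separate mirrors how the guide treats them: tags set the style, lyrics set the content.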

Key Capabilities

  • Generates ~47 seconds of music in ~10 seconds on a single GPU
  • Supports 100+ genres including pop, rock, jazz, classical, electronic, hip hop, and more
  • Strong lyrics-to-vocal alignment with natural-sounding singing
  • 48kHz stereo output for high-quality audio
  • Supports both vocal songs and pure instrumentals

Crafting Effective Tag Prompts

Tags are the primary way to control the musical style of your generation. Think of them as comma-separated descriptors that guide the model. The sweet spot is 3 to 7 tags.

Tag Anatomy

1. Genre and Era First

Lead with the genre and optional era to set the foundation. Examples: "cinematic orchestral", "lo-fi hip hop", "80s synth pop", "jazz ballad"

2. Key Instruments

List the instruments you want featured. Examples: "acoustic guitar, piano, soft strings", "synth bass, drum machine"

3. Mood and Adjectives

Add emotional descriptors. Examples: "uplifting, hopeful", "dark, brooding", "dreamy, ethereal", "energetic, aggressive"

4. Tempo and BPM

Specify speed directly. Examples: "105 BPM", "fast tempo", "slow ballad", "moderate groove"

Example: Well-Structured Tags

"indie folk, acoustic guitar, piano, soft strings, warm, nostalgic, 95 BPM"

Genre: indie folk

Instruments: acoustic guitar, piano, soft strings

Mood: warm, nostalgic

Tempo: 95 BPM
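
The anatomy above can be sketched as a small helper that joins the four parts in the recommended order. build_tags and in_sweet_spot are illustrative names, not ACE-Step functions, and the 3-to-7 range is this guide's rule of thumb rather than a limit the model is known to enforce.

```python
def build_tags(genre, instruments=(), moods=(), tempo=None):
    """Join tag parts in the order genre, instruments, mood, tempo (hypothetical helper)."""
    parts = [genre, *instruments, *moods]
    if tempo:
        parts.append(tempo)
    # Drop empty entries and stray whitespace before joining.
    return ", ".join(p.strip() for p in parts if p and p.strip())


def in_sweet_spot(tags, low=3, high=7):
    """Return True when the tag count falls in the guide's suggested range."""
    count = len([t for t in tags.split(",") if t.strip()])
    return low <= count <= high


tags = build_tags(
    "indie folk",
    instruments=["acoustic guitar", "piano", "soft strings"],
    moods=["warm", "nostalgic"],
    tempo="95 BPM",
)
print(tags)
print(in_sweet_spot(tags))
```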

Avoid Contradictory Tags

Conflicting descriptors confuse the model and produce inconsistent results. Avoid combinations like "ambient, metal" or "upbeat, melancholic" in the same prompt. If you want genre fusion, be deliberate. Use a primary genre with a secondary influence (e.g., "electronic with jazz influences") rather than listing contradictory styles.
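
One way to catch such clashes before generating is a simple lookup of known conflicting pairs. The sketch below is an assumption-laden illustration: the CONFLICTS list contains only the two example pairs named above, find_conflicts is a hypothetical helper, and matching is by exact tag text, so a phrase like "electronic with jazz influences" is never flagged.

```python
# Illustrative pairs taken from the examples in this guide; extend as needed.
CONFLICTS = [
    frozenset({"ambient", "metal"}),
    frozenset({"upbeat", "melancholic"}),
]


def find_conflicts(tags):
    """Return the conflicting pairs present in a comma-separated tag string."""
    present = {t.strip().lower() for t in tags.split(",") if t.strip()}
    # A pair conflicts only when both of its tags appear in the prompt.
    return [tuple(sorted(pair)) for pair in CONFLICTS if pair <= present]


print(find_conflicts("ambient, metal, slow tempo"))
print(find_conflicts("electronic with jazz influences, smooth"))
```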

Generating Instrumental Music

To generate music without vocals, use the [instrumental] or [inst] token in the lyrics field. This tells ACE-Step to produce a purely instrumental track.

Instrumental Setup

Tags: "cinematic orchestral, epic, slow tempo, strings, brass, emotional"

Lyrics: [instrumental]

Genre Fusion Tips

ACE-Step handles genre blending well when you guide it clearly. Structure your tags with a primary genre and a secondary influence to get coherent results.

Primary + Influence

"jazz, electronic elements, smooth, late night, saxophone, synth pads"

Era + Modern Twist

"70s funk, modern production, groovy, bass guitar, clavinet, punchy drums"

Writing Lyrics for Songs

When generating songs with vocals, the lyrics field is where you write the words the model will sing. ACE-Step supports structure tags to organize your song into sections.

Structure Tags

  • [verse]: song verses
  • [chorus]: repeated hook
  • [bridge]: contrasting section
  • [instrumental]: no vocals (solo)

Example: Structured Lyrics

[verse]
Walking through the morning light
Every shadow fades from sight
The world is waking up with me

[chorus]
We are the dreamers of the day
Nothing can stand in our way
Together we will find a way

[bridge]
When the night comes calling
We won't stop from falling
Into something beautiful

[chorus]
We are the dreamers of the day
Nothing can stand in our way
Together we will find a way
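
To check that a lyrics draft is organized the way you intend, you can split it on its structure tags. This is a rough sketch that assumes each tag sits alone on its own line, as in the example above; split_sections is a hypothetical helper and does not reflect how ACE-Step itself parses lyrics.

```python
import re

# A structure tag is a single bracketed word on its own line, e.g. [verse].
SECTION_RE = re.compile(r"^\[(\w+)\]\s*$")


def split_sections(lyrics):
    """Return a list of (section_name, lines) tuples in order of appearance."""
    sections = []
    for raw in lyrics.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SECTION_RE.match(line)
        if match:
            sections.append((match.group(1).lower(), []))
        elif sections:
            sections[-1][1].append(line)
        else:
            # Lines before the first tag are kept under a placeholder name.
            sections.append(("untagged", [line]))
    return sections


example = """[verse]
Walking through the morning light
Every shadow fades from sight

[chorus]
We are the dreamers of the day
"""
for name, lines in split_sections(example):
    print(name, len(lines))
```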

Lyrics Best Practices

Match Length to Duration

ACE-Step sings roughly 2 to 3 words per second. For a 47-second track, aim for around 90 to 140 words total. Too many lyrics will sound rushed; too few will leave long pauses.
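
The arithmetic behind that range is simply duration multiplied by the singing rate. The sketch below assumes the 2-to-3-words-per-second figure from this guide; word_budget and count_sung_words are hypothetical helpers, and bracketed structure tags are excluded from the count because they are not sung.

```python
import re


def word_budget(duration_s, low_wps=2.0, high_wps=3.0):
    """Return the (minimum, maximum) word count for a track of the given length."""
    return round(duration_s * low_wps), round(duration_s * high_wps)


def count_sung_words(lyrics):
    """Count words in the lyrics, ignoring bracketed markers such as [verse]."""
    text = re.sub(r"\[[^\]]*\]", " ", lyrics)
    return len(text.split())


low, high = word_budget(47)
print(low, high)

draft = "[verse]\nWalking through the morning light\nEvery shadow fades from sight"
print(count_sung_words(draft))
```

For a 47-second track this gives 94 to 141 words, which matches the rounded 90-to-140 guidance above.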

Write for Singability

Use simple, natural phrasing. Short lines of 4 to 8 words work best. Avoid tongue twisters, complex vocabulary, or very long sentences that are difficult to sing.
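
A quick way to spot lines outside the 4-to-8-word range is to count words per line. This is a sketch under the assumption that structure tags start with "[" and sit on their own lines; flag_line_lengths is a hypothetical helper, and the thresholds are this guide's suggestion, not a model constraint.

```python
def flag_line_lengths(lyrics, min_words=4, max_words=8):
    """Return (line_number, word_count, text) for lines outside the suggested range."""
    flagged = []
    for number, raw in enumerate(lyrics.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and structure tags such as [verse].
        if not line or line.startswith("["):
            continue
        words = len(line.split())
        if not min_words <= words <= max_words:
            flagged.append((number, words, line))
    return flagged


draft = "[verse]\nWalking through the morning light\nGo\n"
print(flag_line_lengths(draft))
```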

Align Tags with Lyrics

Make sure your tags match the mood of your lyrics. Sad lyrics paired with "upbeat, party" tags will produce confusing results. Keep the emotional tone consistent.

Mix Vocals and Instrumentals

Use [instrumental] between lyric sections for solos or breaks. This creates dynamic variety and gives the track breathing room.

ACE-Step Prompt Examples for Songs and Instrumentals

Here are complete prompt templates for common music generation use cases. Each includes both tags and lyrics you can adapt for your projects.

Pop Song with Vocals

Tags:

"pop, female vocal, catchy, upbeat, synth, 120 BPM"

Lyrics:

[verse]
Lights are flashing all around
Feel the rhythm, feel the sound
Tonight we're never coming down

[chorus]
Dance with me under the neon glow
Let the music take control
This is everything we know

Cinematic Instrumental

Tags:

"cinematic orchestral, epic, slow tempo, strings, brass, emotional, film score"

Lyrics:

[instrumental]

Lo-Fi Chill Beat

Tags:

"lo-fi hip hop, chill, mellow, vinyl crackle, piano, jazzy chords, 85 BPM"

Lyrics:

[instrumental]

Rock Track with Lyrics

Tags:

"rock, electric guitar, powerful drums, male vocal, energetic, 130 BPM"

Lyrics:

[verse]
Standing on the edge of the unknown
Fire in my veins, I'm not alone
Every road I take leads me back home

[chorus]
We rise, we fall, we carry on
Through the storm we're standing strong

[instrumental]

Try these prompts in our AI audio generator to hear the results.

Editing and Refining ACE-Step Music

ACE-Step offers several editing modes that let you iterate on generated music without starting from scratch. These tools are essential for refining your tracks.

Retake (Variation)

Generate a new variation using the same tags and lyrics. Each retake produces a different arrangement while keeping your original creative direction intact.

Repaint (Selective Regeneration)

Regenerate a specific time range within a track. If the chorus sounds great but the verse needs work, repaint just the verse portion while keeping everything else.

Extend (Continue Track)

Add more time to your track by extending it beyond the original duration. The extension continues in the same style and key as the existing audio.

Edit / Remix

Change the style of an existing generation by modifying the tags. Keep the same structure and lyrics but shift the genre, mood, or instrumentation.

Refinement Workflow

Start by generating an initial track with your tags and lyrics. Listen through and identify what works and what doesn't. Use retake if you want a completely different arrangement, repaint to fix specific sections, extend to add length, or edit to shift the style. Iteration is key to getting the best results from ACE-Step.
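
The decision in this workflow can be summarized as a lookup from what you want to change to the editing mode that addresses it. The goal labels and the pick_mode helper below are hypothetical and only restate the guidance above; they are not ACE-Step parameters or API calls.

```python
# Hypothetical goal labels mapped to the editing modes described in this guide.
GOAL_TO_MODE = {
    "different arrangement": "retake",
    "fix a section": "repaint",
    "add length": "extend",
    "shift the style": "edit",
}


def pick_mode(goal):
    """Return the editing mode suggested for a refinement goal."""
    try:
        return GOAL_TO_MODE[goal]
    except KeyError:
        known = ", ".join(sorted(GOAL_TO_MODE))
        raise ValueError(f"Unknown goal {goal!r}; expected one of: {known}")


print(pick_mode("fix a section"))
```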

Troubleshooting Common Issues

Here are solutions to the most common issues creators encounter when generating music with ACE-Step.

Vocals Sound Too Loud or Overpowering

Problem: The vocal track drowns out the instrumentals

Solution: Add more instrument-related tags to emphasize the backing track. Use tags like "rich instrumentation" or list specific instruments to give them more presence.

Lyrics Sound Rushed or Crammed

Problem: The model is trying to fit too many words into the track.

Solution: Cut the word count. Aim for 2 to 3 words per second; for a 47-second track, that means keeping lyrics under about 140 words. Add [instrumental] breaks between sections.

Muddled or Incoherent Style

Problem: The track sounds like a confused mix of genres

Solution: Remove contradictory tags. Stick to one primary genre and use 3 to 7 focused, complementary tags. Avoid mixing opposing moods or unrelated genres.

Unwanted Vocals on Instrumental Track

Problem: The model adds vocals despite using the instrumental token

Solution: Make sure you're using [instrumental] or [inst] as the only content in the lyrics field. Remove any other text. Try adding "instrumental" to your tags as well.
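
That fix can be sketched as a helper that forces the lyrics field to the bare token and appends "instrumental" to the tags when it is missing. make_instrumental is a hypothetical name for illustration; whether the extra tag helps in practice is this guide's suggestion, not a guaranteed behavior.

```python
def make_instrumental(tags):
    """Return (tags, lyrics) set up for a vocal-free track."""
    parts = [t.strip() for t in tags.split(",") if t.strip()]
    # Add the "instrumental" tag only if it is not already present.
    if "instrumental" not in (p.lower() for p in parts):
        parts.append("instrumental")
    # The lyrics field holds the token and nothing else.
    return ", ".join(parts), "[instrumental]"


tags, lyrics = make_instrumental("lo-fi hip hop, chill, piano")
print(tags)
print(lyrics)
```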

Start Creating Music with ACE-Step

ACE-Step makes AI music generation accessible to everyone. With the right combination of tags and lyrics, you can create everything from cinematic scores to pop songs with vocals in seconds. The key is to start simple and iterate.

Begin with the templates in this guide, experiment with different genre and mood combinations, and use the editing tools to refine your tracks. The more you create, the better your intuition for prompting will become. Try it now with our AI audio generator.

