Guide to Prompting Minimax Music for AI Music Generation
Last Updated
Apr 20, 2026
Models Tested
Minimax Music v2
Introduction to Minimax Music Prompting
Minimax Music v2 is a large-scale AI music generation model built on a Mixture-of-Experts (MoE) architecture with 230 billion parameters. It produces fully mixed and mastered songs from two simple inputs: a style prompt describing the musical direction and lyrics containing the words and song structure. It powers Ambience AI's audio generation pipeline, enabling creators to produce professional-quality tracks with realistic vocals, rich instrumentals, and polished production.
The model generates 44.1kHz stereo audio for up to five minutes per generation. It supports over 100 genres, delivers synchronized vocals that match your lyrics precisely, and outputs audio that sounds ready for release. Minimax Music v2 also supports voice cloning and instrumental reference audio for even more control over the final result.
This guide covers everything you need to know about prompting Minimax Music v2 effectively, from writing style prompts and formatting lyrics with structural tags to generating instrumentals, using reference audio, and troubleshooting common issues.
Understanding Minimax Music
Minimax Music v2 is a commercial-grade music generation model developed by MiniMax. Its defining design choice is a two-input system that separates musical direction from lyrical content, which makes it well suited to lyric-driven composition.
The Two-Input System
Style Prompt (10 to 300 characters)
The style prompt defines the musical direction: genre, mood, instruments, tempo, vocal style, and production qualities. It functions as a creative brief that shapes the overall sound. Example: "Indie folk, melancholic, introspective, longing, acoustic guitar, soft vocals"
Lyrics (10 to 3,000 characters)
The lyrics field provides the vocal content along with structural tags that organize the song into sections. Minimax Music v2 supports 14 structural tags, including [Verse], [Chorus], and [Bridge]. For instrumental tracks, enable the is_instrumental flag if your interface exposes one; otherwise, use [Instrumental] as the lyrics, as described in Generating Instrumental Music.
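If you call the model programmatically, it can help to check both inputs against these ranges before sending a request. The sketch below is a minimal, hypothetical example: the `MusicRequest` class and its `style_prompt` and `lyrics` field names are illustrative assumptions, not the provider's actual API schema, and the limits are the ones stated in this guide.

```python
from dataclasses import dataclass

# Character limits as stated in this guide.
STYLE_PROMPT_RANGE = (10, 300)
LYRICS_RANGE = (10, 3000)


@dataclass
class MusicRequest:
    """Hypothetical container for the two inputs; field names are illustrative."""

    style_prompt: str
    lyrics: str

    def validate(self) -> list:
        """Return a list of problems; an empty list means both fields are in range."""
        problems = []
        checks = (
            ("style_prompt", self.style_prompt, STYLE_PROMPT_RANGE),
            ("lyrics", self.lyrics, LYRICS_RANGE),
        )
        for name, value, (low, high) in checks:
            if not low <= len(value) <= high:
                problems.append(
                    f"{name} has {len(value)} characters; expected {low} to {high}"
                )
        return problems


request = MusicRequest(
    style_prompt="Indie folk, melancholic, introspective, longing, acoustic guitar, soft vocals",
    lyrics="[Verse]\nMorning light through the window pane",
)
print(request.validate())  # []
```

Checking locally means you do not spend a generation on inputs that fall outside the documented limits; adjust the ranges if your provider documents different values.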
Key Capabilities
- 230 billion parameter MoE architecture for high-quality, fully mixed and mastered output
- 44.1kHz stereo audio with output up to 5 minutes per generation
- 100+ genres including pop, rock, jazz, classical, electronic, hip hop, R&B, and more
- Precise lyrics-to-vocal synchronization with realistic singing voices
- 14 structural tags for detailed song arrangement control
- Voice cloning and instrumental reference audio support
- Multiple output formats: MP3, WAV, and PCM
Crafting Style Prompts
The style prompt is your primary tool for shaping the sound of your generation. It must be between 10 and 300 characters. Think of it as a concise creative brief that tells the model what kind of track to produce.
Style Prompt Anatomy
1. Genre and Subgenre
Lead with the genre to establish the musical foundation. Be specific with subgenres when possible. Examples: "dream pop", "melodic techno", "neo-soul", "post-rock"
2. Mood and Atmosphere
Add emotional descriptors that guide the feel. Examples: "melancholic, introspective", "euphoric, anthemic", "dark, brooding", "warm, nostalgic"
3. Key Instruments
Specify instruments you want featured. Examples: "acoustic guitar, piano, soft strings", "808 bass, hi-hats, synth pads"
4. Vocal Direction and Tempo
Guide the vocal style and speed. Examples: "breathy female vocals, 90 BPM", "raspy male vocal, uptempo", "soulful harmonies, slow ballad"
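The four components can be assembled mechanically. This sketch joins them in the order listed above; `build_style_prompt` is a hypothetical helper, and the comma-separated layout mirrors the examples in this guide rather than a documented requirement of the model.

```python
def build_style_prompt(genre, moods, instruments, vocals=None, tempo_bpm=None,
                       min_chars=10, max_chars=300):
    """Join genre, moods, instruments, vocals, and tempo into one style prompt."""
    parts = [genre, *moods, *instruments]
    if vocals:
        parts.append(vocals)
    if tempo_bpm is not None:
        parts.append(f"{tempo_bpm} BPM")
    prompt = ", ".join(parts)
    # Enforce the 10 to 300 character range described in this guide.
    if not min_chars <= len(prompt) <= max_chars:
        raise ValueError(
            f"Style prompt is {len(prompt)} characters; expected {min_chars} to {max_chars}"
        )
    return prompt


print(build_style_prompt(
    genre="Indie folk",
    moods=["melancholic", "introspective"],
    instruments=["acoustic guitar", "gentle piano"],
    vocals="soft female vocals",
    tempo_bpm=95,
))
# Indie folk, melancholic, introspective, acoustic guitar, gentle piano, soft female vocals, 95 BPM
```

Raising an error on an out-of-range prompt forces you to trim descriptors deliberately instead of sending an over-long brief.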
Example: Well-Crafted Style Prompts
"Indie folk, melancholic, introspective, acoustic guitar, soft female vocals, gentle piano, 95 BPM"
Genre + mood + instruments + vocal direction + tempo (97 characters)
"Cinematic orchestral, epic, sweeping strings, brass fanfare, timpani, heroic, slow build"
Genre + mood + instruments + dynamic direction (88 characters)
"Lo-fi hip hop, chill, jazzy chords, vinyl crackle, mellow Rhodes piano, 80 BPM"
Genre + mood + instruments + tempo (78 characters)
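Character counts are easy to get wrong by hand. Measuring each prompt string (without the surrounding quotation marks) with Python's built-in `len()` gives an exact figure to compare against the 300-character limit; this assumes the service counts characters the way `len()` does, which for plain ASCII text like these examples means one per character.

```python
EXAMPLE_PROMPTS = [
    "Indie folk, melancholic, introspective, acoustic guitar, soft female vocals, gentle piano, 95 BPM",
    "Cinematic orchestral, epic, sweeping strings, brass fanfare, timpani, heroic, slow build",
    "Lo-fi hip hop, chill, jazzy chords, vinyl crackle, mellow Rhodes piano, 80 BPM",
]

for prompt in EXAMPLE_PROMPTS:
    # len() counts every character, including spaces and commas.
    print(f"{len(prompt):>3} chars, {300 - len(prompt)} to spare: {prompt}")
# The first column prints 97, 88, and 78.
```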
Tips for Better Style Prompts
- Be specific. "Dark melodic techno, pulsing bassline, atmospheric pads" works better than just "electronic music."
- Stay within the 300-character limit. Focus on the most important descriptors rather than listing everything.
- Avoid contradictory descriptors. Combining "aggressive" with "gentle" or "ambient" with "thrash metal" will produce inconsistent results.
- Include vocal direction when generating songs. Describing the vocal quality (breathy, powerful, raspy) helps the model match your vision.
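A small lint pass can catch contradictory pairings like the ones mentioned above before you generate. The `CONFLICTING_PAIRS` list is a hypothetical starting point built only from the two examples in these tips, and `find_conflicts` uses plain substring matching, so treat any match as something to review rather than a definite error.

```python
# Hypothetical starting list, built from the two examples in the tips above.
CONFLICTING_PAIRS = [
    ("aggressive", "gentle"),
    ("ambient", "thrash metal"),
]


def find_conflicts(style_prompt):
    """Return every descriptor pair that appears together in the prompt."""
    text = style_prompt.lower()
    # Plain substring matching: simple, but it can also match inside longer words.
    return [(a, b) for a, b in CONFLICTING_PAIRS if a in text and b in text]


print(find_conflicts("Ambient, thrash metal, heavy guitars"))
# [('ambient', 'thrash metal')]
```

Extend the list with pairs you find clash in your own generations.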
Generating Instrumental Music
To generate a purely instrumental track, use [Instrumental] as your lyrics. This tells the model to focus entirely on the musical arrangement without generating any vocals.
Instrumental Setup
Style Prompt: "Cinematic orchestral, epic, slow tempo, strings, brass, emotional, film score"
Lyrics: [Instrumental]
Instrumental Tags in Lyrics
Even when writing songs with vocals, you can include instrumental sections using structural tags. The [Inst] and [Solo] tags create vocal-free passages within your song. You can also add parenthetical instrument directions to guide what plays during these sections.
Example: Mixed Vocal and Instrumental
[Verse]
Walking through the city lights
Every corner tells a story tonight

[Inst]
(guitar solo, building intensity)

[Chorus]
We belong to the night
Under neon skies so bright
Genre Fusion Tips
Minimax Music v2 handles genre blending well when you guide it clearly in the style prompt. Structure your prompt with a primary genre and secondary influences to get coherent results.
Primary + Influence
"Jazz fusion, electronic elements, smooth, late night, saxophone, synth pads"
Era + Modern Production
"70s funk, modern production, groovy, bass guitar, clavinet, punchy drums"
Writing Lyrics with Structural Tags
The lyrics field accepts 10 to 3,000 characters and supports 14 structural tags that organize your song into distinct sections. These tags tell Minimax Music v2 how to arrange the track and where to place vocals, instrumental breaks, and transitions.
All 14 Structural Tags
- [Intro]: Opening section
- [Verse]: Story sections
- [Pre Chorus]: Builds to chorus
- [Chorus]: Main hook
- [Post Chorus]: After the hook
- [Bridge]: Contrasting part
- [Interlude]: Musical break
- [Transition]: Section connector
- [Build Up]: Rising tension
- [Break]: Sparse moment
- [Hook]: Catchy phrase
- [Inst]: Instrumental
- [Solo]: Instrument solo
- [Outro]: Closing section
Example: Structured Lyrics
[Intro]
(soft piano, ambient atmosphere)

[Verse]
Morning light through the window pane
Every whisper calls your name
I've been searching for a sign
Something real, something mine

[Pre Chorus]
Can you feel it in the air tonight

[Chorus]
We are infinite, we are the stars
Burning bright through all these scars
Nothing in this world can pull us apart

[Interlude]
(strings swell, gentle build)

[Bridge]
When the darkness tries to find us
We will be the light behind us

[Chorus]
We are infinite, we are the stars
Burning bright through all these scars
Nothing in this world can pull us apart

[Outro]
(fade out, piano and strings)
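When lyrics are assembled in code, splitting them on tags makes it easy to confirm that every section uses one of the 14 supported names. This is a minimal sketch using Python's `re` module; `split_sections` is a hypothetical helper, and it ignores parenthetical notes because those use round brackets.

```python
import re

SUPPORTED_TAGS = {
    "Intro", "Verse", "Pre Chorus", "Chorus", "Post Chorus", "Bridge", "Interlude",
    "Transition", "Build Up", "Break", "Hook", "Inst", "Solo", "Outro",
}
# Square-bracket tags only; (parenthetical notes) use round brackets and are not matched.
TAG_PATTERN = re.compile(r"\[([^\]]+)\]")


def split_sections(lyrics):
    """Split lyrics into (tag, body) pairs and list any unsupported tag names.

    Any text before the first tag is ignored.
    """
    matches = list(TAG_PATTERN.finditer(lyrics))
    sections, unknown = [], []
    for i, match in enumerate(matches):
        tag = match.group(1)
        # A section's body runs until the next tag, or to the end of the lyrics.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(lyrics)
        sections.append((tag, lyrics[match.end():end].strip()))
        if tag not in SUPPORTED_TAGS:
            unknown.append(tag)
    return sections, unknown


lyrics = "[Intro]\n(soft piano, ambient atmosphere)\n\n[Verse]\nMorning light through the window pane\n\n[Refrain]\nOh oh"
sections, unknown = split_sections(lyrics)
print([tag for tag, _ in sections])  # ['Intro', 'Verse', 'Refrain']
print(unknown)  # ['Refrain']
```

A lyrics field consisting only of [Instrumental] would be reported as unknown, since that marker is not one of the 14 section tags; handle it as a special case if you run this check on instrumental requests.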
Lyrics Best Practices
Use Parenthetical Notes
Add stage directions in parentheses to guide the arrangement. For example, (whispering), (guitar solo), or (building intensity) give the model additional context.
Write for Singability
Use simple, natural phrasing. Short lines of 4 to 8 words work best. Avoid tongue twisters, complex vocabulary, or very long sentences that are difficult to sing naturally.
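The 4-to-8-word guideline is straightforward to check line by line. This sketch assumes each lyric line, tag, and parenthetical note sits on its own line; `lines_outside_range` is a hypothetical helper, and the thresholds are this guide's rule of thumb rather than a limit enforced by the model.

```python
def lines_outside_range(lyrics, min_words=4, max_words=8):
    """Return (word_count, line) for sung lines outside the suggested range."""
    flagged = []
    for raw_line in lyrics.splitlines():
        line = raw_line.strip()
        # Skip blank lines, [structural tags], and (parenthetical notes).
        if not line or line.startswith(("[", "(")):
            continue
        count = len(line.split())
        if not min_words <= count <= max_words:
            flagged.append((count, line))
    return flagged


verse = """[Verse]
Morning light through the window pane
Hold on
(soft piano)
Burning bright through all these scars and the endless night"""
for count, line in lines_outside_range(verse):
    print(count, line)
# 2 Hold on
# 10 Burning bright through all these scars and the endless night
```

A flagged line is a candidate for splitting or tightening, not necessarily a mistake; short hooks can work well on purpose.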
Match Style and Lyrics
Keep the emotional tone consistent between your style prompt and lyrics. Sad lyrics paired with "upbeat, party" in the style prompt will produce confusing results.
Stay Within Character Limits
Lyrics must be between 10 and 3,000 characters. For songs up to 5 minutes, you have plenty of room. Use structural tags and parenthetical notes to fill out the arrangement without needing excessive lyrics.
Minimax Music Prompt Examples
Here are complete prompt templates for common music generation use cases. Each includes both a style prompt and lyrics you can adapt for your projects.
Pop Song with Vocals
Style Prompt:
"Pop, catchy, upbeat, female vocal, synth, bright production, 120 BPM"
Lyrics:
[Intro]
(synth arpeggios, building energy)

[Verse]
Lights are flashing all around
Feel the rhythm, feel the sound
Tonight we're never coming down
This city's ours to own

[Pre Chorus]
Can you feel it rising

[Chorus]
Dance with me under the neon glow
Let the music take control
We are everything we'll ever know
Let the night unfold

[Inst]
(synth breakdown, pulsing bass)

[Chorus]
Dance with me under the neon glow
Let the music take control

[Outro]
(fade out, echoing vocals)
Cinematic Instrumental
Style Prompt:
"Cinematic orchestral, epic, sweeping strings, brass fanfare, timpani, heroic, slow build"
Lyrics:
[Instrumental]
Lo-Fi Chill Beat
Style Prompt:
"Lo-fi hip hop, chill, mellow, vinyl crackle, jazzy piano, soft drums, warm, 80 BPM"
Lyrics:
[Instrumental]
Rock Track with Lyrics
Style Prompt:
"Rock, electric guitar, powerful drums, male vocal, energetic, raw, 130 BPM"
Lyrics:
[Intro]
(distorted guitar riff, crashing drums)

[Verse]
Standing on the edge of the unknown
Fire in my veins, I'm not alone
Every road I take leads me back home

[Chorus]
We rise, we fall, we carry on
Through the storm we're standing strong
This is where we all belong

[Solo]
(electric guitar solo, soaring)

[Bridge]
The ground may shake beneath our feet
But we will never taste defeat

[Chorus]
We rise, we fall, we carry on
Through the storm we're standing strong

[Outro]
(drums fade, final guitar chord rings out)
Try these prompts in our AI audio generator to hear the results.
Editing and Refining Your Music
Getting the perfect track often takes iteration. Minimax Music v2 offers several approaches for refining your generations and pushing results closer to your creative vision.
Generate Variations
Run the same style prompt and lyrics multiple times. Each generation produces a different arrangement, melody, and vocal interpretation. Generate three to five variations and pick the best one.
Iterate on Style Prompts
Tweak the style prompt between generations. If the track is too slow, add "uptempo" or increase the BPM. If the vocals are too prominent, emphasize instruments. Small changes in the style prompt can produce meaningfully different results.
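When the tempo is already written as a BPM value, the adjustment can be scripted so you change only one descriptor between generations. `speed_up` below is a hypothetical helper: it raises the first BPM value it finds, or appends "uptempo" when the prompt has no BPM, mirroring the two options described above. It does not re-check the 300-character limit.

```python
import re

# Matches values such as "80 BPM" or "95bpm".
BPM_PATTERN = re.compile(r"\b(\d+)\s*BPM\b", re.IGNORECASE)


def speed_up(style_prompt, bpm_increase=10):
    """Raise the first BPM value in the prompt, or append "uptempo" if none exists."""
    match = BPM_PATTERN.search(style_prompt)
    if match is None:
        return f"{style_prompt}, uptempo"
    new_bpm = int(match.group(1)) + bpm_increase
    return f"{style_prompt[:match.start()]}{new_bpm} BPM{style_prompt[match.end():]}"


print(speed_up("Lo-fi hip hop, chill, jazzy chords, 80 BPM"))
# Lo-fi hip hop, chill, jazzy chords, 90 BPM
print(speed_up("Indie folk, acoustic guitar"))
# Indie folk, acoustic guitar, uptempo
```

Changing one descriptor at a time makes it easier to tell which edit caused a difference between two generations.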
Voice Cloning Reference
Upload a reference audio clip to guide the vocal timbre and style. This lets you maintain a consistent vocal character across multiple generations. The model adapts the singing voice to match the reference while following your lyrics.
Instrumental Reference Audio
Provide a reference track to guide the instrumental arrangement, production style, and overall sonic texture. The model uses this as a template for the backing track while generating new music that follows your style prompt and lyrics.
Refinement Workflow
1. Generate an initial track with your style prompt and lyrics.
2. Listen through and identify what works and what needs adjustment.
3. Tweak the style prompt to shift the genre, mood, or instrumentation.
4. Revise your lyrics to improve flow or add structural variety.
5. Use reference audio if you want to match a specific vocal character or instrumental style.
Iteration is the key to getting professional results from Minimax Music v2.
Troubleshooting Common Issues
Here are solutions to the most common issues creators encounter when generating music with Minimax Music v2.
Model Singing Structural Tags
Problem: The model vocalizes the tag names (e.g., singing "verse" or "chorus" out loud)
Solution: Make sure tags use the exact supported format with proper capitalization: [Verse], [Chorus], etc. Avoid custom or unsupported tag names. Place each tag on its own line with a blank line before the lyrics.
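Tag formatting can also be normalized automatically. The sketch below maps common variants such as [verse], [pre-chorus], or [BUILD-UP] onto the exact spellings of the 14 supported tags, puts each tag on its own line, and separates sections with a blank line. `normalize_tags` is a hypothetical helper, and this layout is one reasonable reading of the advice above rather than a documented requirement; unknown tags are left untouched so you can spot them.

```python
import re

SUPPORTED_TAGS = [
    "Intro", "Verse", "Pre Chorus", "Chorus", "Post Chorus", "Bridge", "Interlude",
    "Transition", "Build Up", "Break", "Hook", "Inst", "Solo", "Outro",
]


def _key(name):
    """Lowercase and keep only letters, so "Pre-Chorus" and "Verse 2" become "prechorus", "verse"."""
    return re.sub(r"[^a-z]", "", name.lower())


CANONICAL = {_key(tag): tag for tag in SUPPORTED_TAGS}
# A tag plus any whitespace around it, so it can be moved onto its own line.
TAG_WITH_SPACE = re.compile(r"\s*\[([^\]]+)\]\s*")


def normalize_tags(lyrics):
    """Rewrite known tags in their exact supported spelling, one per line."""

    def fix(match):
        tag = CANONICAL.get(_key(match.group(1)))
        if tag is None:
            return match.group(0)  # Leave unknown tags as-is so they stay visible.
        return f"\n\n[{tag}]\n"

    text = TAG_WITH_SPACE.sub(fix, lyrics)
    cleaned = []
    for line in (part.strip() for part in text.splitlines()):
        # Keep a blank line only when it follows a non-blank one.
        if line or (cleaned and cleaned[-1]):
            cleaned.append(line)
    return "\n".join(cleaned).strip()


print(normalize_tags("[verse] Walking through the city lights [CHORUS] We belong to the night"))
# [Verse]
# Walking through the city lights
#
# [Chorus]
# We belong to the night
```

Because it cannot recover original line breaks inside a flattened section, the helper only moves tags; split long lyric runs into lines yourself.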
Style Prompt Not Being Followed
Problem: The generated track doesn't match the style you described
Solution: Be more specific in your style prompt. Use concrete genre names, specific instruments, and clear mood descriptors. Avoid vague terms like "good" or "nice." If you need a particular vocal style, describe it explicitly (e.g., "breathy female vocal" instead of just "female vocal").
Choppy or Unnatural Vocals
Problem: The vocals sound robotic, choppy, or poorly synchronized
Solution: Simplify your lyrics. Use shorter lines with natural phrasing (4 to 8 words per line). Avoid complex vocabulary, tongue twisters, or lines that are too long. Add [Interlude] or [Inst] breaks to give the vocals breathing room between sections.
Choosing the Right Output Format
Question: Which output format should you select?
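Answer: Use MP3 (up to 256kbps) for web sharing and general listening. Use WAV when you plan further editing or mixing; it stores lossless, uncompressed audio in a file that audio editors and DAWs open directly. Use PCM when a production pipeline expects raw sample data with no file header; because the file carries no metadata, you need to keep track of the sample rate, channel count, and bit depth yourself.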
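The size difference between formats follows from simple arithmetic. This sketch assumes 16-bit samples for the uncompressed case (the guide does not state a bit depth) and uses the 44.1kHz stereo output and five-minute maximum described earlier; a WAV file would be roughly the uncompressed figure plus a small header.

```python
SAMPLE_RATE_HZ = 44_100     # from this guide
CHANNELS = 2                # stereo
BYTES_PER_SAMPLE = 2        # assumption: 16-bit samples
DURATION_S = 5 * 60         # five-minute maximum per generation
MP3_BITRATE_BPS = 256_000   # highest MP3 bitrate mentioned above

pcm_bytes = SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE * DURATION_S
mp3_bytes = MP3_BITRATE_BPS * DURATION_S // 8  # bits to bytes

print(f"Uncompressed: {pcm_bytes / 1e6:.1f} MB")     # Uncompressed: 52.9 MB
print(f"MP3 at 256 kbps: {mp3_bytes / 1e6:.1f} MB")  # MP3 at 256 kbps: 9.6 MB
```

Under these assumptions a full-length uncompressed track is about five and a half times the size of a 256kbps MP3, which is why MP3 suits sharing and WAV or PCM suit editing.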
Start Creating Music with Minimax Music
Minimax Music v2 brings professional-quality AI music generation to every creator. With the right combination of style prompts, structural tags, and lyrics, you can create everything from cinematic scores to pop anthems with realistic vocals. The key is to start simple and iterate.
Begin with the templates in this guide, experiment with different genre and mood combinations, and use reference audio to dial in the exact sound you want. The more you create, the better your intuition for prompting will become. Try it now with our AI audio generator.
Looking for more music techniques? Check out our ACE-Step music prompting guide for an alternative approach to AI music generation. You can also explore our Flux image prompting guide, WAN video prompting guide, Kling video prompting guide, or browse our complete suite of creative tools.