AI Studio Stem Generation: Guidelines

Moises AI Studio

What it does

It generates new audio parts that lock to your music, complementing an original session, a demo, or a single stem.

Start here: Set up a strong foundation

  • Lead with harmony, not melody: You’ll get far better results if your context is a chordal bed (guitar, piano, keys, pads, or even a simple block-chord synth). A lone melody (vocal, brass, solo guitar) makes it harder for the model to infer the underlying harmony.
  • For remixes, a well-balanced final mix can serve as a good starting point (context). If the mix is too complex, consider isolating individual stems and muting some of the busier tracks.

Conditioning

Conditioning is the reference the model uses to guide a generation: it attempts to produce content similar to it. The conditioning can be a preset, an audio reference, or a text prompt.

Best control: Audio conditioning

To best control the model’s performance in terms of groove, play style, and timbre, provide a reference stem. Audio references are the strongest signal for influencing the model’s output. If you don't have the stem separated, use our stem separation module to isolate it.

More Flexibility: Text prompts

Text prompts offer greater creative flexibility than audio prompts but with less predictable results. While they won't give you the precise control that audio conditioning provides, they excel at steering the overall direction, letting you nudge genre, playing style, density, and sonic characteristics (e.g., "blues rock drums, punchy kick, crisp snare drum"). Use text prompts on their own for exploratory generation, or combine them with audio conditioning.

Genre control: Presets & AI Match

We've put together some conditionings based on common needs. Each preset uses an internal audio or text prompt for quick, predictable outcomes. The AI Match option creates stems based on the context you give and the model's understanding of the style/genre. This mode has very high variability, so generate at least three variations for better results.

Advanced Params

These parameters will directly influence your generation results.

Creative Control

Creative control gives you the ability to modify the model's behavior based on your audio context and conditioning. Adjusting parameters will drastically affect the results. The model considers three key factors for each generation:

  • Conditioning: A target sound via audio reference, text prompt, or preset that steers timbre, playing technique, micro-patterns, and genre feel.
  • Audio context: Your stems (e.g., guitar/piano) used to infer tempo, phrasing, structure, and key, so new parts lock to timing and form.
  • Harmony adherence: Controls how strictly the generated stem follows the chords, key, and scale detected from the audio context. This control is not available for drum generation.

You have the flexibility to assign distinct weight values to Context, Conditioning, and Harmony. If you prefer to maintain consistent weighting across these elements, set them all to the same value.

  • Higher values = tighter adherence and more predictable results (less creative).
  • Lower values = more adventurous, exploratory results (more creative).
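
As a rough illustration of how the three weights relate, the sketch below models them as a plain Python data structure. The class name, the field names, and the 0.0 to 1.0 scale are assumptions made for this example only; they are not AI Studio's actual parameter names or value range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreativeControl:
    """Hypothetical container for the three creative-control weights.

    The 0.0-1.0 scale is an assumption for illustration, not the real UI range.
    """
    context: float = 0.5
    conditioning: float = 0.5
    harmony: float = 0.5

    def __post_init__(self):
        # Reject values outside the assumed range.
        for name in ("context", "conditioning", "harmony"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")

    @classmethod
    def uniform(cls, value: float) -> "CreativeControl":
        """Use the same weight for all three factors (consistent weighting)."""
        return cls(context=value, conditioning=value, harmony=value)


# Higher values: tighter adherence, more predictable results.
strict = CreativeControl.uniform(0.9)
# Lower values: more adventurous, exploratory results.
loose = CreativeControl.uniform(0.2)
# Distinct weights: follow the chords closely but leave the style looser.
mixed = CreativeControl(context=0.6, conditioning=0.3, harmony=0.9)
```

Keeping the three weights separate makes it easy to raise one (for example, harmony) relative to the others rather than moving everything at once.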

Optimize Conditioning

This parameter is available in Custom generation and is on by default. Rather than using your audio directly, it leverages similar embeddings from the model's training data. This produces output that resembles your audio without being too similar. You can turn this option off for closer fidelity to your reference, though this might result in less stable or lower quality output depending on how familiar our model is with the specific recording characteristics. We recommend experimenting with turning it off, especially if your reference stem is high quality and belongs to a well-known musical genre.

Generation workflow (order that tends to work well)

  1. Start with a strong chordal foundation using a guitar or piano stem, or a mix that clearly establishes the harmony.
  2. Next, generate the bass line to anchor the harmonic structure.
  3. Generate drums that complement both the chordal bed and bass (this creates a more cohesive groove).
  4. Finally, add supporting harmonic elements (pads, keys) followed by rhythmic and percussive accents.
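
The ordering above can be sketched as a small planning helper. This is only an illustration: the stem names, the `plan_generation` function, and the idea that each generated stem is added to the context for the next step are assumptions for the example, not a description of an AI Studio API.

```python
# Suggested generation order from the workflow above (names are illustrative).
GENERATION_ORDER = ["bass", "drums", "pads", "keys", "percussion"]


def plan_generation(foundation, targets):
    """Return (stem_to_generate, context_stems) pairs in the suggested order.

    `foundation` is the chordal bed you start from (e.g. guitar or piano).
    Each generated stem is assumed to join the context for later steps.
    Raises ValueError if a target is not listed in GENERATION_ORDER.
    """
    ordered = sorted(targets, key=GENERATION_ORDER.index)
    context = list(foundation)
    plan = []
    for stem in ordered:
        plan.append((stem, tuple(context)))
        context.append(stem)
    return plan


plan = plan_generation(["guitar"], ["drums", "pads", "bass"])
for stem, context in plan:
    print(f"generate {stem} using context: {', '.join(context)}")
```

In this sketch the drums step sees both the guitar and the bass, which mirrors the advice to generate drums that complement the chordal bed and the bass together.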

Variability is high: Exploit it!

Minor adjustments can lead to significantly different results. Generate several options using different conditioning and/or parameters, then evaluate them as you would different takes in a traditional workflow. Because the model's output also varies from one run to the next, generate multiple results even when using identical conditioning.

When the model "messes up," regenerate smartly

To regenerate only a specific bar range, select the desired region and press "Regenerate." For best results, use the custom regeneration mode, which lets you adjust parameters and conditioning. When regenerating, the timbre and instrument characteristics of the original generation are retained, even if you change the conditioning significantly. The playing style and creative elements, however, will vary.

Fine-tune Settings Based on Your Specific Needs

  • If timing feels off: Increase the audio context strictness value.
  • If you hear wrong-key moments: Increase the harmony strictness value compared to the other creative controls.
  • If results are too safe or boring: Lower all creative control values.
  • If you need a drum fill: Condition the model with a brief audio clip of a drum fill.
  • If you want more variability: Try adding or muting stems in your context to influence the regeneration process.
  • If results are overplaying/busy: Reduce the conditioning strictness value compared to the other creative controls.
  • If output is too generic: Provide a better audio style reference and/or lower the overall creative control values for more distinctive results.
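
The weight-related tips above can be summarized as a lookup from symptom to adjustment. The sketch below is hypothetical: the symptom keys, the weight names, the 0.0 to 1.0 scale, and the 0.2 step size are all assumptions chosen for illustration, not AI Studio settings.

```python
STEP = 0.2  # hypothetical adjustment size on an assumed 0.0-1.0 scale

# Symptom -> change per weight. Tips that do not involve weights
# (drum fills, adding or muting context stems) are not modeled here.
ADJUSTMENTS = {
    "timing_off": {"context": +STEP},
    "wrong_key": {"harmony": +STEP},
    "too_safe": {"context": -STEP, "conditioning": -STEP, "harmony": -STEP},
    "too_busy": {"conditioning": -STEP},
    "too_generic": {"context": -STEP, "conditioning": -STEP, "harmony": -STEP},
}


def clamp(value):
    """Keep a weight inside the assumed 0.0-1.0 range."""
    return max(0.0, min(1.0, value))


def adjust(weights, symptom):
    """Return a new weights dict with the suggested change applied."""
    deltas = ADJUSTMENTS[symptom]
    return {
        name: round(clamp(value + deltas.get(name, 0.0)), 2)
        for name, value in weights.items()
    }


current = {"context": 0.5, "conditioning": 0.5, "harmony": 0.5}
print(adjust(current, "wrong_key"))
print(adjust(current, "too_busy"))
```

Each adjustment returns a new dictionary rather than changing the current one, so you can compare several candidate settings side by side before regenerating.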

Things to Remember

  • For audio conditioning, use original stems when you can, or isolate them with source separation.
  • Use text prompts to help steer style and density, but if results aren't satisfactory, try audio conditioning or a preset instead.
  • Generate broadly, then refine locally: variation first, regeneration for fixes, and surgical edits to finish.

Current Limitations

  • Our current models excel at generating harmonic accompaniments (like chords and rhythm sections) but are less effective at creating melodic lead lines or solos.
  • Our current training datasets lack sufficient data from less globally popular genres, such as Brazilian music (forró, pagode, sertanejo), Indian classical music, and Middle Eastern music. Attempting to generate stems for these underrepresented genres will produce poor results. We plan to improve genre diversity in future model versions as we license and incorporate more data.