AI Portrait Tips From Someone Who Failed First

AI Prompt Asset
Extreme close-up portrait of a whimsical giraffe, head tilted 15 degrees, direct eye contact with viewer. Wearing oversized retro cat-eye sunglasses: teal acetate upper frames, butter-yellow lower rims, hot pink mirrored lenses with soft studio light reflections. Perfect spherical bubblegum bubble, candy-pink with subsurface translucency and specular highlights, stretching from lips. Dense Hawaiian-style floral lei around neck—coral peonies, lavender daisies, turquoise carnations, butter-yellow marigolds with morning dew droplets. Razor-sharp fur texture: individual caramel-brown patches with feathered edges against cream base, wispy mane hairs catching rim light. Lighting: single massive softbox 45 degrees above as key, white bounce card below for fill ratio 8:1, creating micro-contrast in whiskers and fur. Background: pure saturated sunflower yellow seamless paper with subtle gradient falloff. Shot on Hasselblad X2D, 120mm macro lens, f/5.6, editorial beauty photography, hypermaximalist detail, controlled chromatic aberration on sunglass edges, --ar 9:16 --style raw --s 750


The Anatomy of a Renderable Portrait

The difference between a prompt that produces generic animal portraits and one that generates a specific, memorable image lies in how you construct physical reality. Most AI portrait failures stem from describing what you want to see rather than how light would physically interact with those materials. This distinction separates aesthetic wishes from technical instructions the model can execute.

Consider the sunglasses in this image. The prompt specifies "oversized retro cat-eye sunglasses: teal acetate upper frames, butter-yellow lower rims, hot pink mirrored lenses with soft studio light reflections." This works because it breaks a complex object into manufacturable components. The AI's training data plausibly includes many product images with labeled parts: frames, rims, lenses, temples. When you speak in the language of construction, you activate that knowledge.

The critical failure mode here is vagueness. "Colorful sunglasses" or "stylish glasses" leaves the model to interpolate from its average of all eyewear, which can produce brown-tinted safety glasses or wireframes with color bleeding unpredictably. The mechanism, loosely: a diffusion model starts from noise and denoises it step by step, steered by the text prompt. Specific component descriptions narrow the range of images consistent with the prompt; vague descriptions leave that range wide, and the output drifts toward training-set averages.
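One way to keep yourself honest about component-level description is to build the clause from parts rather than typing it freehand. The sketch below is only an illustration: the `Part` class and `describe_object` helper are hypothetical names invented here, not part of any prompting tool.

```python
from dataclasses import dataclass


@dataclass
class Part:
    """One physical component of an object (hypothetical helper type)."""
    name: str            # e.g. "upper frames"
    color: str           # e.g. "teal"
    material: str = ""   # e.g. "acetate"; optional

    def describe(self) -> str:
        # Join only the non-empty fields, in color, material, name order.
        return " ".join(word for word in (self.color, self.material, self.name) if word)


def describe_object(label: str, parts: list[Part]) -> str:
    """Render 'label: part, part, part' as a single prompt clause."""
    return f"{label}: " + ", ".join(part.describe() for part in parts)


sunglasses = describe_object(
    "oversized retro cat-eye sunglasses",
    [
        Part("upper frames", "teal", "acetate"),
        Part("lower rims", "butter-yellow"),
        Part("lenses", "hot pink", "mirrored"),
    ],
)
print(sunglasses)
```

If a part has no color or material you can name, that gap is usually where the model will improvise.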

Material Physics as Rendering Instruction

The bubblegum bubble demonstrates why material description matters more than subject description. "Blowing a bubble" describes an action. "Perfect spherical bubblegum bubble, candy-pink with subsurface translucency and specular highlights" describes how light behaves.

"Subsurface translucency" is the key phrase here. It points at subsurface scattering: light penetrating a translucent surface, bouncing internally, and exiting at a different point, which creates the characteristic glow in wax, skin, and yes, bubblegum. Without it, bubbles tend to render as opaque plastic spheres or disappear entirely, since nothing tells the model that the "bubble" is a solid object with translucent properties. The specular highlights provide the surface reflection that anchors the sphere in three-dimensional space.

The same principle applies to the fur texture. "Razor-sharp fur texture: individual caramel-brown patches with feathered edges against cream base" works because it describes both resolution (razor-sharp, individual) and color mapping (patches against base). The alternative, "realistic giraffe fur," activates the model's concept of "realistic" as a quality judgment, not a physical specification. This tends to produce smooth, airbrushed texture, plausibly because "realistic" in training data often means "professionally retouched."

Lighting as Spatial Construction

Portrait lighting in AI requires the same specificity as studio photography. "Single massive softbox 45 degrees above as key, white bounce card below for fill ratio 8:1" succeeds because it specifies source size, direction, and ratio.

The massive softbox creates a large, diffused source that wraps around facial contours, which is essential for the giraffe's elongated muzzle; harsh lighting would flatten it into shadow. The 45-degree angle, stated explicitly in the prompt, creates dimensional modeling without the horror-movie effect of direct overhead light. The fill specification guards against the common failure mode of a bare "studio lighting" request: flat, shadowless rendering that eliminates depth cues.
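To put the prompt's 8:1 fill ratio into photographic terms, a ratio can be converted to stops of exposure difference. This is a minimal sketch that assumes the simple key-to-fill intensity convention; some photographers define lighting ratios differently, and `ratio_to_stops` is a name made up for this example.

```python
import math


def ratio_to_stops(key: float, fill: float = 1.0) -> float:
    """Exposure difference in stops for a key:fill intensity ratio."""
    if key <= 0 or fill <= 0:
        raise ValueError("ratio terms must be positive")
    # Each stop is a doubling of light, so the difference is a base-2 log.
    return math.log2(key / fill)


for key in (2, 4, 8):
    print(f"{key}:1 is {ratio_to_stops(key):.0f} stop(s) between key and fill")
```

Under that convention, 8:1 works out to three stops, which is a contrasty ratio with clearly visible shadow modeling rather than a flat wash of light.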

Color temperature deserves particular attention in fashion portraits. This prompt omits explicit Kelvin values because the saturated background and accessory colors provide sufficient chromatic anchoring. However, in neutral-toned portraits, specifying "5600K key with 3200K fill" creates the warm/cool contrast that distinguishes professional photography from AI defaults. Without temperature differential, models tend toward neutral gray-white lighting that feels clinical rather than intentional.
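Kelvin values are not evenly spaced perceptually, so photographers often compare light sources in mireds (one million divided by the Kelvin value). The sketch below applies that standard conversion to the 5600K/3200K pairing mentioned above; the function name is illustrative.

```python
def mired(kelvin: float) -> float:
    """Convert a color temperature in Kelvin to mireds (micro reciprocal degrees)."""
    if kelvin <= 0:
        raise ValueError("color temperature must be positive")
    return 1_000_000 / kelvin


key_kelvin, fill_kelvin = 5600, 3200
shift = mired(fill_kelvin) - mired(key_kelvin)
print(f"key {mired(key_kelvin):.0f} mired, fill {mired(fill_kelvin):.0f} mired, shift {shift:.0f} mired")
```

The shift comes out at roughly 134 mireds, a large and clearly visible warm/cool split. Whether an image model responds to Kelvin numbers this precisely is a separate question; treat the values as a steer, not a guarantee.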

The background choice, "pure saturated sunflower yellow seamless paper with subtle gradient falloff," demonstrates environmental control. Naming seamless paper and its gradient falloff avoids the cutout effect of flat color backgrounds. The color name matters too: "sunflower yellow" anchors the specific warmth, preventing the mustard or safety-orange drift that "yellow background" can produce.

Camera Specifications as Style Anchors

The Hasselblad X2D, 120mm macro, f/5.6 specification does more than add technical flavor. It constrains the rendering to a specific optical signature.

Medium format sensors (the X2D is a 100MP medium format body) produce a distinctive depth-of-field character: shallower than full-frame at the same framing and f-number, because the larger sensor needs a longer focal length for the same field of view. The 120mm macro at portrait distance gives mild telephoto compression that flatters the face, avoiding both the exaggerated perspective of wide lenses used up close and the flattened, detached look of very long ones. f/5.6 on medium format provides enough depth for sharp eyes and bubble surface while allowing background separation.
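To compare these settings with more familiar full-frame numbers, the usual crop-factor arithmetic can be sketched as follows. The sensor dimensions are an assumption (approximately 43.8 x 32.9 mm for the X2D; check the manufacturer's spec sheet), and the equivalence is the standard rule of thumb, not an exact optical model.

```python
import math

FULL_FRAME_MM = (36.0, 24.0)
# Assumed X2D sensor size in millimetres; verify against the official spec sheet.
X2D_SENSOR_MM = (43.8, 32.9)


def diagonal(size_mm: tuple[float, float]) -> float:
    """Length of the sensor diagonal in millimetres."""
    return math.hypot(*size_mm)


def crop_factor(sensor_mm: tuple[float, float]) -> float:
    """Ratio of the full-frame diagonal to this sensor's diagonal."""
    return diagonal(FULL_FRAME_MM) / diagonal(sensor_mm)


factor = crop_factor(X2D_SENSOR_MM)
equiv_focal = 120 * factor       # full-frame focal length with similar framing
equiv_aperture = 5.6 * factor    # full-frame f-number with similar depth of field

print(f"crop factor {factor:.2f}")
print(f"120mm frames like about {equiv_focal:.0f}mm on full frame")
print(f"f/5.6 gives depth of field like about f/{equiv_aperture:.1f} on full frame")
```

With those assumed dimensions the result is close to a classic short-telephoto portrait setup on full frame, which fits the flattering compression described above.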

The "editorial beauty photography" style tag activates a specific genre in the model's training: high-production cosmetic and fashion imagery with controlled skin/fur texture, precise color grading, and commercial lighting. This prevents the drift toward photojournalistic or amateur snapshot aesthetics.

Chromatic Strategy for Complex Subjects

Multi-colored scenes with several patterned elements (giraffe + lei + sunglasses + background) risk chromatic chaos. The prompt manages this through hierarchical color assignment.

The giraffe's natural pattern provides neutral anchor points: caramel and cream. The sunflower yellow background echoes the butter-yellow sunglass rims. Teal and hot pink provide strong contrast points that the AI renders more reliably because they're specified as distinct material properties (frame color vs. lens color) rather than distributed attributes.

The lei flowers follow the same principle: coral, lavender, turquoise, butter-yellow. Four distinct hues, each assigned to specific flower types. This prevents the color bleeding that occurs when "colorful flowers" distributes saturation randomly across the garland. The dewdrops add specular anchor points—small, bright highlights that unify the floral mass into a dimensional object.

For related approaches to controlled color in AI portraiture, see our guide to dramatic feathered portraits and the technical breakdown of porcelain material rendering.

When Parameters Matter

The --s 750 stylization value sits toward the upper end of Midjourney's 0 to 1000 range. Low values, including the default (--s 100), keep the output closer to the literal prompt with less added artistry. High values give the model more aesthetic latitude at the cost of prompt fidelity, so at --s 1000 it may smooth the fur texture or simplify the sunglass reflections in favor of its own look. 750 leans on the model's compositional polish to resolve spatial relationships, while the prompt's dense material specifications keep the details anchored.

The --style raw parameter reduces Midjourney's default aesthetic treatment, helping preserve the micro-contrast and texture detail that "editorial beauty photography" already asks for. Using both the style tag and --style raw creates intentional redundancy: the tag guides the aesthetic direction, the parameter limits automatic smoothing.
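Before submitting a long prompt, it can help to sanity-check the trailing parameters. The sketch below is a small, unofficial parser written for this article; it is not a Midjourney API, and the 0 to 1000 range for --s is an assumption to confirm against the current documentation.

```python
import re


def parse_params(prompt: str) -> dict[str, str]:
    """Collect '--name value' pairs from a Midjourney-style prompt string."""
    return dict(re.findall(r"--(\w+)\s+(\S+)", prompt))


def describe(params: dict[str, str]) -> list[str]:
    """Summarize the parameters and reject an out-of-range stylize value."""
    notes = []
    if "ar" in params:
        width, height = (int(n) for n in params["ar"].split(":"))
        shape = "portrait" if height > width else "landscape" if width > height else "square"
        notes.append(f"aspect ratio {width}:{height} ({shape})")
    if "s" in params:
        value = int(params["s"])
        # Assumed valid range for --s; confirm against current documentation.
        if not 0 <= value <= 1000:
            raise ValueError(f"--s {value} is outside the assumed 0-1000 range")
        notes.append(f"stylize {value}")
    if "style" in params:
        notes.append(f"style {params['style']}")
    return notes


tail = "editorial beauty photography, hypermaximalist detail, --ar 9:16 --style raw --s 750"
for note in describe(parse_params(tail)):
    print(note)
```

A check like this catches typos such as a stray --s 7500 before they cost a generation.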

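For details on how each parameter behaves, see Midjourney's documentation. The flags themselves (--ar, --style, --s) are Midjourney syntax; other systems such as DALL-E or Stable Diffusion expose comparable controls through their own settings rather than through prompt suffixes, so strip the flags before pasting the prompt elsewhere.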

The portrait succeeds because every element resolves to physical properties: materials that interact with light, colors that occupy specific objects, lighting that creates dimensional form. This is the difference between describing a vision and engineering an image. The failure that precedes success is almost always a prompt that asks for beauty without specifying how beauty manifests in physics.

Label: Fashion

Key Principle: Break every wearable object into its physical components and assign distinct materials; "sunglasses" alone leaves the result to chance, while "teal acetate upper frames, butter-yellow lower rims, hot pink mirrored lenses" gives the model specific parts to render.