Mandi Lu
Author's Statement
This paper was written for 76-101: Interpretation and Argument, AI and Art, taught by Professor Chad Szalkowski-Ference in the Writing and Communication program, Department of English. This semester-long course explored the intersections of artificial intelligence and artistic practices. For this assignment specifically, we were asked to explore an underexamined area of that intersection.
In this piece, I focus on the developing role of AI in cinematic architecture and set design. This area is often overlooked in the world of AI-generated art, as it doesn’t typically carry the uncanniness of commonly-used AI art platforms nor the immediate autogenerated disclaimer that AI-made art on social media comes with. Yet, it’s regularly consumed by audiences, just the same. While much of the conversation surrounding AI and art understandably centers on the artist, I was interested in shifting the perspective to the viewer.
This work reflects my deep interest in architecture, storytelling, and emerging technologies. Ultimately, I hope this piece encourages both viewers and creators to think more critically about how the usage of AI in shaping cinematic environments is shaping us back.
I would like to sincerely thank Professor Chad Szalkowski-Ference and Dr. Alan Thomas Kohler for their invaluable guidance and support throughout the development of this work.
- Mandi Lu
Algorithms, Architecture, aaand Action: The Evolving Role of AI in Cinematic Architecture
Abstract
Cinematic architecture sits at the intersection of two innately human desires: to tell stories and to create things. Yet in recent years, this foundational intersection has been increasingly shaped by the non-human. This study investigates the impact of artificial intelligence (AI) on authorship, creativity, and audience perception in cinematic architecture. Using a perceptual survey, participants evaluated paired human- and AI-generated set design concept art pieces on narrative coherence, expressive intent, creativity, and emotional resonance, while also attempting to identify AI-generated work. Overall, participants correctly identified AI-generated images just over half the time, with experience in architecture and/or design significantly improving accuracy, particularly among younger respondents. Across all audiences, the concept art believed to be human-made consistently received higher ratings for creativity and emotional impact, regardless of actual authorship. These findings highlight how assumptions about authorship shape the interpretation of visual environments and suggest that perceptions of AI and human-made art impact how viewers consume media.
1. Introduction
The first films, created exploratorily in Europe during the late 19th century, were colorless, plotless, silent, and shorter than the average TikTok video (Whittle 2025). By today’s standards, they would likely appear incredibly simplistic and perhaps even a little boring. As new technical elements were introduced and integrated into cinema, filmmaking evolved from inventive experimentation into a rigorous industry, and films transformed from short clips into gripping narratives. While characters, score, plot, and special effects often take center stage (or screen) in shaping the impression of a film, one factor has always played a critical role in filmmaking: the universe in which the film exists. It is the responsibility of set designers to create film architecture that realizes this universe.
Film architecture is the design of spaces, whether practically or digitally, for cinematic storytelling. This job is done by film architects, also called set designers. The field of film architecture is instrumental in establishing the emotion, narrative, and believability of a film. Through meticulous research and design, film architecture is responsible for the ever-evolving location and worldbuilding elements that undergird both the literal and metaphorical background of filmmaking.
Recent developments in artificial intelligence (AI) have allowed AI-powered tools to pervade the film architectural world. With AI’s ability to efficiently produce artistic renderings, 3D models, and concepts, an increasing number of bodies within the film industry have incorporated AI tools like Midjourney, NVIDIA Omniverse, and numerous others into their set-design processes. Proponents of AI-assisted film architecture believe it has the potential to minimize costs, accelerate progress, and complement the rest of the filmmaking process.
However, there are significantly fewer concrete predictions on how this may impact the way films are consumed by audiences. Therefore, it remains unclear how audiences will perceive authorship, creativity, and narrative intentionality in AI-generated cinematic architectural spaces when compared to perceptions of human-designed concept art. This study investigates how AI tools reshape the meaning of creative agency within film production design and, in doing so, seeks to help architects and creatives navigate challenges in the filmmaking process and enhance experiences for movie-makers and audiences alike. As emerging technologies continue to transform the tools of creation, they are quietly redefining how the stories we experience and connect through are visualized and told, a reminder that innovation in design is also innovation in storytelling.
2. Literature Review
2.1 Challenges Faced in Designing Film Architecture
To understand how AI is currently used in film, it’s important to understand the contemporary context of designers in the film industry. First and foremost, set designers are limited by time and budget constraints imposed by studio authorities. Set designers operate within the art department, led by the art director, who ensures that set designers’ projects can be achieved within these constraints. Deadlines and resource availability must be molded around the schedules, abilities, and policies of various other departments, all of which are ever-changing, a reality reflected in dramatic swings in shoot days and regional production capacity. For example, in the third quarter of 2025, the Greater Los Angeles area recorded 4,380 total shoot days across all categories, a 13.2% decrease from the previous year, and a number that is projected to continue to decline (FilmLA 2025).
Beyond these constraints, set designers are also tasked with maintaining believability while balancing storytelling and film aesthetics. Set designers need to maintain consistent visuals that enhance the desired mood, while also ensuring their sets are believable, even when fantastical. Film studies expert Liza Bode explains that the fear of incredulous viewers is reemerging in the film industry due to increased media exposure, online forums, and various photo manipulation tactics (2018).
Along with believability, set designs must also be visually compelling, despite visuals often clashing with storytelling. In the original Harry Potter movies, author J.K. Rowling intended for the student characters not to wear any school uniforms, in line with the original books from which the movies were derived. However, film designers immediately realized that this would produce a visually disharmonious range of colors and textures across the costumes (Fraser-Crook 2010). This exemplifies a tension between visual cohesion and narrative fidelity, a tension common among set designers who ensure that the visual environment supports rather than distracts from the story. Without this cohesion, aesthetic inconsistencies could disrupt audience immersion and weaken the film's impact.
Set designers are also confronted with ethical concerns, particularly surrounding the growing homogenization of visual style in contemporary film. Hollywood continues to face backlash for a perceived lack of originality as large studios increasingly prioritize brand consistency over a distinct visual identity. Netflix, for instance, has been criticized in recent years for what is colloquially called the “Netflix look.” This look is characterized by even lighting, muted colors, and a lack of contrast, creating visuals that are flat and desaturated (Miller 2022). While this aesthetic optimizes clarity for viewers across multiple viewing conditions, it can result in projects that feel indistinct and dull. This trend raises ethical concerns for designers, as it risks limiting opportunities for creative expression by forcing a decision between creating unique environments that enhance narrative and conforming to platform-driven aesthetics that prioritize mass appeal and production efficiency.
2.2 Current Applications of AI Tools in Film Architecture
In the early stages of production, designers may use generative AI tools to produce visual concepts. Generative adversarial networks (GANs) and diffusion models (DMs) are most commonly used in this stage. Architecture firms, both within and outside the film industry, have used GANs to create concept images derived from text prompts as well as sketches. GANs have also been used to colorize human-made concept sketches (Li et al. 2024).
Midjourney is particularly effective at deriving concept design from prompts. Midjourney is a diffusion model that can generate both images and videos from text prompts, images, or both. It visually translates prompts, fed by a human, significantly cutting down the time required by hand-sketching. Unlike some of its DM counterparts, like Stable Diffusion and Stable Houdini, Midjourney allows users without significant artistic experience to effectively use it (Gao 2025). However, in a film crew of meticulously chosen industry professionals, user experience is less likely to be a concern. While Midjourney and AI tools as a whole can accelerate the brainstorming process, set designers and their administrative leaders (art directors and/or production designers) are still the ones guiding it. Their creative intuition is necessary for prompting the AI tools and curating the results.
Following idea generation, early models of any scene are then constructed. This is a process that’s typically shared across several departments. The production designer in charge collaborates with architects, lighting engineers, the director of photography, and the visual effects department. They must also work with the director, cinematographer, and producer, among others, to ensure their designs are viable and adhere to the plot, characters, and the film’s intended style (DeGuzman 2023).
AI 3D modeling tools can streamline disconnects between these parties. Specifically, NVIDIA Omniverse can digitally render scenes in 3D while offering real-time synchronization between applications like Maya, Blender, and Unreal Engine (NVIDIA 2021). This enables simultaneous coordination between artists across physical barriers, allowing factors such as textures and geometries to be instantly updated. Additionally, complex lighting can be simulated virtually rather than by constructing physical sets to test it (Lee 2024). Tools like NVIDIA Omniverse offer more flexibility to artists while also reducing the required time and costs within the set-design process.
These virtual models are either digitally incorporated into the film or used to physically fabricate set pieces. In the case of The Mandalorian (2019), AI tools allowed ILM StageCraft to blend both methods. ILM StageCraft used real-time rendering powered by Epic Games' Unreal Engine and NVIDIA GPUs to display virtual landscapes on massive LED screens. Using this unique set, The Mandalorian’s set design team could “build” anything virtually on their stage. They could also switch between “filming locations” across the globe in under an hour (Obropta 2020).
Typically, digital and practical elements are more separated than they were on The Mandalorian. While spaces more closely tied to the narrative or requiring greater interaction are typically constructed physically, studios often digitally create backgrounds and larger landscapes in post-production (Crawford 2024). This process, especially when used to create urban environments, can become tedious and time-consuming because of its repetitive nature. However, AI-driven algorithms are increasingly used in constructing large cityscapes and landscapes. By training on large datasets of architectural forms, these GANs and DMs can autonomously generate structures that mimic real-world design logic while maintaining originality (Poolkrajang et al. 2024).
One such example of AI tools being used to create large backgrounds is seen in Spider-Man: No Way Home (2021). Digital Domain, the project’s primary VFX Studio, used the machine-learning software "Charlatan" to create a full 3D city and generate lifelike digital doubles more quickly. Across the full film, Digital Domain contributed over 520 shots and 600 unique assets, including vehicles and props (Milligan 2022). In other words, AI tools can significantly increase the scale and efficiency of set design, allowing entire environments to be rendered with unprecedented speed.
2.3 The Gap
Current research overwhelmingly frames AI in film design as a technical aid. Across each of these domains (visualization, collaboration, simulation, and generation), AI serves as an assistant rather than a complete replacement. Human knowledge of aesthetics, world-building logic, and spatial design is still necessary to achieve resonant worlds. In this sense, the “architect” of a film is shifting from a solitary creator to more of a conductor of artificially intelligent tools. As AI tools become increasingly advanced and common, the role of the architect-as-conductor grows more apparent. While AI tools aid in efficiency, what remains less examined is how these technologies alter the creative dynamics of the design process itself: how does AI impact creativity, authorship, and artistic agency as perceived by the industry’s audience? By focusing on creativity and authorship, this study contributes a perspective that is largely absent from existing literature. As these systems continue to evolve, they not only change how film architecture is built but also redefine the visual language of cinema in a new technological era.
2.4 Theoretical Framework: Defining Creativity and Authorship
This theoretical framework draws on two foundational texts to evaluate whether AI systems disrupt or redistribute creative agency within cinematic design workflows.
- “The Death of the Author” (1977) by Roland Barthes
Barthes argues that the meaning of a work does not reside solely in the artist’s intent, but in the interpretive space constructed by the viewer. In the context of AI tools, this raises questions about whether authorship shifts from designer to dataset, algorithm, or audience interpretation. The study uses Barthes to examine whether Midjourney-generated images obscure the role of the human concept designer.
- “The Work of Art in the Age of Mechanical Reproduction” (1969) by Walter Benjamin
Benjamin’s concept of the “aura,” or the authenticity tied to an artwork’s unique presence, offers a critical lens for analyzing AI outputs. Since AI-generated images lack a singular origin or traceable craftsmanship, this framework helps examine whether participants perceive them as possessing diminished aura, or whether AI tools can artificially reconstruct it.
By integrating empirical perception data with authorship theory, this methodology assesses what participants’ judgments reveal about shifting cultural definitions of creativity in architecture and cinematic design.
3. Methodology
This study combines perceptual evaluation, authorship theory, and qualitative analysis to investigate how emerging AI-powered image-generation tools reshape creative decision-making in cinematic set design. The methodology has two major components: (1) a perceptual survey measuring participants’ ability to interpret and distinguish human-generated set designs from AI-generated ones, and (2) a theoretical framework for analyzing how creativity and authorship are redistributed when computational systems enter the design workflow.
3.1 Stimuli Selection and Image Generation for Survey
To compare human and AI-generated designs under controlled conditions, this study curated six pairs of images. Each pair contained one image by humans and one by AI. Figures 1-6 show each pairing. The full survey can be found under Appendix A.

Figure 1: Human-generated set design on the left, AI-generated set design on the right.

Figure 2: Human-generated set design on the left, AI-generated set design on the right.

Figure 3: Human-generated set design on the left, AI-generated set design on the right.

Figure 4: Human-generated set design on the left, AI-generated set design on the right.

Figure 5: Human-generated set design on the left, AI-generated set design on the right.

Figure 6: Human-generated set design on the left, AI-generated set design on the right.
All human-generated set designs were sourced from publicly available concept art posted on ArtStation. Images were selected from professional production designers whose descriptions included narrative, material, spatial, or atmospheric intent. These images serve as the baseline representation of contemporary digital concept art created through established workflows in the film and game industries.
AI-generated images were created using Midjourney v6. For each selected human-made concept piece, the original artist’s written description on ArtStation was copied verbatim and appended to a standardized prompt: “Set designs based on this description:” followed by the original text. No stylistic modifiers or subjective adjectives were added. For consistency, only the first output generated by Midjourney was chosen. This procedure ensured that images created by AI reflected the artist’s conceptual language while remaining free from user-curated cherry-picking.
This pairing strategy allowed each AI-generated image to be structurally linked to a specific human work, enabling direct comparison along narrative, spatial, atmospheric, and material axes without confounding variables such as differing subject matter. After the image pairs were generated, participants were asked to identify which was made by humans and which was made by AI. They were then asked to rate certain qualities of each image.
3.2 Survey Design
A perceptual identification study was conducted using Google Forms. The survey consisted of a sequence of image pairs, with each pair containing one human-generated set design and one AI-generated Midjourney output based on the same description. Participants were all shown the same sequence of images and were not told which was which.
Participants viewed each pair and were asked to indicate which image they believed was AI-generated (Image A / Image B / Unsure).
They were then asked to rate their agreement with a series of statements on a 7-point Likert scale (1 = Strongly Disagree, 7 = Strongly Agree). These statements measured four qualities that draw on Barthes’s and Benjamin’s theoretical frameworks:
| Quality | Corresponding statement(s) on form |
| --- | --- |
| Narrative Coherence | "This image clearly supports a coherent story or cinematic scenario." |
| Expressive Intent | "This image appears intentionally expressive rather than random." "Materials and lighting behave in a motivated, purposeful way." |
| Creative Originality | "This image feels creative or original." |
| Emotional Resonance | "This image evokes a clear emotional response." |
The survey also contained:
- an attention-check question to ensure valid responses
- demographic items, including experience with digital art, architecture, AI tools, or film design, and
- a debrief informing participants of the study's purpose.
3.3 Participants
Thirty-four participants, with ages ranging from below 18 to over 55, were recruited from college students, design peers, and general audiences familiar with visual media. No specialized expertise in architecture or set design was required. All participants consented to anonymous data collection through the Google Forms interface. Responses failing the attention check were excluded before analysis.
3.4 Analytical Approach
The analysis examined both perceptual accuracy (participants’ ability to identify AI authorship) and evaluative judgments (participants’ ratings of narrative, expressive, creative, and emotional qualities).
For each image pair, participants selected “Image A”, “Image B”, or “Unsure” to indicate which image they believed to be generated by AI. Responses were coded as:
- Correct (participant accurately identified the AI image)
- Incorrect (participant misidentified the human image as AI)
- Unsure (coded separately, not treated as incorrect)
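The coding scheme above can be sketched in a few lines of Python. This is a minimal illustration and not the study's actual analysis script: the function names and the five sample responses are hypothetical, and the sketch assumes that “Unsure” responses are left out of the accuracy denominator rather than counted as wrong.

```python
from collections import Counter


def code_response(choice, ai_image):
    """Code one identification response as 'correct', 'incorrect', or 'unsure'."""
    if choice == "Unsure":
        return "unsure"
    return "correct" if choice == ai_image else "incorrect"


def accuracy(coded):
    """Share of correct answers among decided (non-'unsure') responses."""
    counts = Counter(coded)
    decided = counts["correct"] + counts["incorrect"]
    if decided == 0:
        raise ValueError("no decided responses to score")
    return counts["correct"] / decided


# Hypothetical responses for one pair in which Image B is the AI image.
responses = ["Image B", "Image A", "Unsure", "Image B", "Image B"]
coded = [code_response(r, ai_image="Image B") for r in responses]

pair_accuracy = accuracy(coded)                    # correct / (correct + incorrect)
unsure_share = coded.count("unsure") / len(coded)  # reported separately
```

Keeping “Unsure” as its own category means a hesitant participant lowers neither the accuracy figure nor the sample of decided judgments, which matches the decision not to treat uncertainty as error.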
These results serve as a measure of perceived authorship transparency, or how visually legible AI authorship is in cinematic architectural imagery. For each image (not each pair), participants rated the five statements listed in Section 3.2, which cover the four perceptual dimensions, on a 1–7 Likert scale. Ratings were averaged for AI-generated images and for human-made images. Independent-samples tests (or nonparametric equivalents, depending on distribution) were used to compare whether participants consistently rated one category higher across these creative dimensions. These measures operationalize concepts drawn from authorship theory (intentionality, originality, and affective presence) in quantifiable form.
Participants’ reported experience with architecture, digital art, concept design, film production, and AI image generation was used to explore whether domain familiarity correlates with accuracy or evaluative judgments. Average accuracy was calculated separately for participants who selected “Yes” to having prior experience in any of these areas and for those who selected “No,” and the two figures were then compared. This comparison tests whether participants with domain experience perceive creative authorship, and its visual markers, differently from general audiences.
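As an illustration of the rating comparison, the sketch below averages Likert scores for two groups of images and runs an exact permutation test on the difference in means. The permutation test is only one possible nonparametric option and is an assumption here, since the specific test is not named above; the function name and the rating values are hypothetical and are not the study's data.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    extreme = 0
    n_splits = 0
    # Enumerate every way of relabelling n_a of the pooled ratings as group A.
    for idx in combinations(range(len(pooled)), n_a):
        a_sum = sum(pooled[i] for i in idx)
        diff = abs(a_sum / n_a - (total - a_sum) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
        n_splits += 1
    return extreme / n_splits


# Hypothetical 1-7 ratings on a single dimension (not the study's data).
human_ratings = [6, 5, 7, 6, 5, 6]
ai_ratings = [4, 5, 4, 3, 5, 4]

human_mean = mean(human_ratings)
ai_mean = mean(ai_ratings)
p_value = exact_permutation_p(human_ratings, ai_ratings)
```

Exhaustive enumeration is practical only for small groups like these; with larger samples, a random subset of relabellings or a rank-based test would be a more realistic choice.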
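The experience-based comparison can be sketched as follows. The record format, function name, and sample values are hypothetical, and the sketch again assumes that “unsure” outcomes are excluded from each group's accuracy.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: accuracy}, ignoring 'unsure' outcomes."""
    tallies = defaultdict(lambda: {"correct": 0, "incorrect": 0})
    for group, outcome in records:
        if outcome in ("correct", "incorrect"):
            tallies[group][outcome] += 1
    return {
        group: t["correct"] / (t["correct"] + t["incorrect"])
        for group, t in tallies.items()
    }


# Hypothetical (experience answer, coded outcome) records, one per judgment.
records = [
    ("Yes", "correct"), ("Yes", "correct"), ("Yes", "incorrect"), ("Yes", "unsure"),
    ("No", "correct"), ("No", "incorrect"), ("No", "unsure"), ("No", "incorrect"),
]

group_accuracy = accuracy_by_group(records)
accuracy_gap = group_accuracy["Yes"] - group_accuracy["No"]
```

A gap near zero would suggest that domain experience adds little to identification accuracy, while a large positive gap would point toward experience making AI authorship more legible.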
4. Results and Discussions
4.1 Authorship and Identification Accuracy
Across all image pairs, participants demonstrated limited ability to correctly identify which images were generated by AI versus humans. Overall accuracy across the 12 stimuli was moderate, with participants correctly identifying the AI-generated image approximately 60.25% of the time (SD = 3.9%). The majority of respondents were correct for 5 of the 6 pairings. “Unsure” responses on average accounted for 21% of judgments. Those responses were neither considered correct nor incorrect and were disregarded when calculating accuracy. Figure 7 displays the accuracy of author identification for each pairing. The full survey is in Appendix A.

Figure 7: Data visualized using ChartMaker (2026).
Human-made images were frequently mistaken for AI, while several AI-generated images were judged as human-made, suggesting that participants often relied on surface-level visual cues or preconceived associations about “AI-ness” rather than consistently identifiable stylistic features. These findings indicate that visual authorship is not inherently transparent in cinematic architectural imagery.
4.1.1 Accuracy by Architecture/Design Experience
The study also examined whether participants’ prior experience in architecture, film, or visual arts influenced their ability to accurately identify AI-generated versus human-made set designs. Participants who indicated that they had studied or worked in film, architecture, visual art, or production design (“Yes” group) demonstrated substantially higher accuracy than those without such experience (“No” group). Specifically, experienced participants correctly identified AI-generated images 76.83% of the time, compared to 49.19% for participants without relevant experience.
Interestingly, within the experienced group, younger participants were the most accurate, suggesting that both familiarity with design principles and recent exposure to digital media or AI tools may enhance perceptual sensitivity to subtle cues in cinematic architecture. These results indicate that domain knowledge significantly shapes the ability to discern AI authorship, reinforcing the idea that surface-level visual cues alone are often insufficient for accurate judgment. Overall, these findings highlight the role of expertise in navigating AI-mediated creative outputs: professional or educational experience in design disciplines improves the legibility of AI authorship, whereas lay audiences are more prone to misattribution or reliance on cognitive biases.
4.2 Comparative Perceptual Ratings: Humans vs. AI and "Believed Human" vs. "Believed AI"
To better understand how audiences evaluate cinematic environments, participants’ ratings across four perceptual dimensions—Narrative Coherence, Expressive Intent, Creativity/Originality, and Emotional Resonance—were compared both by actual authorship (human vs. AI) and by perceived authorship (what participants believed was human- or AI-made). This dual comparison provides insight not only into the aesthetic qualities of AI-generated sets but also into the cognitive expectations that shape audience interpretation.
4.2.1 Human-Made vs. AI-Made Images
Trends emerged when comparing human-made and AI-made images. This comparison focuses on actual authorship rather than participant perception, allowing differences in evaluation to be attributed to the production method itself. These trends are broken down by the four dimensions listed in Section 3.2 (narrative coherence, expressive intent, creativity/originality, and emotional resonance) to determine whether consistent distinctions emerge between human-made and AI-generated set designs. The images made by AI were B, C, F, H, I, and L. The images made by human artists were A, D, E, G, J, and K. These groupings were then analyzed across each category to identify patterns in how participants responded to the two modes of creation.
4.2.1.1 Narrative Coherence
Across all pairs, human-made images tended to score slightly higher in narrative coherence, but the margins varied. Images K, G, and D, all human-made, received some of the strongest coherence ratings (6.56, 5.88, and 5.23, respectively). However, several AI-generated images also scored competitively. For example, Image F (AI) received a high coherence rating of 5.71, nearly matching its human counterpart in Pair 3. This suggests that contemporary generative models are capable of producing visually coherent cinematic spaces that viewers interpret as narratively functional, even without an underlying story architecture.
4.2.1.2 Expressive Intent
Human-made images generally scored higher on intentionality—particularly Image K (6.29) and Image G (5.38). AI images were more variable, with some (like C and F) performing strongly, while others (like B and H) were rated less purposeful. This wider spread suggests that AI systems do not convey a clear sense of design intention across examples as consistently as human-made images. Overall, participants tended to attribute more purposeful lighting, material logic, and intentional expression to human works. This likely reflects viewers’ assumptions about the designer’s presence. Human authorship implies conscious planning and narrative motivation behind choices. Contrastingly, while AI outputs might appear stylistically coherent, they are also less explicitly motivated, leading to uncertainty about whether compositional elements were placed intentionally or generated through pattern-based synthesis. As a result, AI scenes may be perceived as less intentional, not due to a lack of complexity, but due to ambiguity surrounding decision-making.
4.2.1.3 Creativity/Originality
Creativity showed the most divergence. Human-made images—especially J (6.09) and K (6.06)—received the strongest originality scores, suggesting that participants perceive more distinct ideas in human designers. AI images tended to cluster around the mid-range (4.4–5.1). This pattern indicates a persistent belief that creative originality is tied to human authorship, even when AI images were visually compelling. This tighter grouping also suggests that AI-generated images may be more conceptually familiar, as they draw from recognizable themes. Participants appeared more cautious about labeling AI work as “creative,” despite comparable visual complexity. These hesitations may imply that we can sense when art stems from existing data rather than intentional concept development.
4.2.1.4 Emotional Resonance
Emotional response also favored human images overall. Human-made images G (5.65), K (5.74), and J (4.88) scored noticeably higher than their AI counterparts, indicating that participants experienced clearer emotional tones in these pieces. Several AI images, especially B, F, H, and I, clustered around the low to mid 4s. This suggests that viewers perceive human-made environments as more emotionally intentional and affectively legible, possibly due to subtle cues in composition, atmosphere, and worldbuilding logic. Human designers may embed emotional signals in deliberate lighting or leading lines, which guide the audience’s interpretation. AI tools, in comparison, may lack the same deliberateness because they synthesize many design decisions instead of intentionally pursuing a few. Therefore, participants may have interpreted human-made images as more emotionally resonant. Collectively, these actual-author comparisons indicate that while AI-generated set designs often achieve technical coherence, they still fall short, on average, in intentionality, emotionality, and perceived originality.
4.2.2 "Believed Human"-made vs. "Believed AI"-made Images
A second pattern emerged when comparing images that participants believed were human-made versus AI-made, regardless of the truth. The images most commonly believed to be AI-generated were B, C, E, H, I, and L.
4.2.2.1 Narrative Coherence
Images believed to be AI-made tended to receive slightly lower coherence scores, even when they were human-made (e.g., E scored 4.68 and was widely misidentified as AI-generated). Conversely, images that a higher proportion (although not the majority) of participants believed to be human typically scored higher, even when they were actually AI-generated (such as C and I). This reveals a top-down cognitive bias: coherence is partially inferred from presumed authorship.
4.2.2.2 Expressive Intent
A similar pattern emerged with expressive intent. Images misidentified as AI-generated (e.g., E and H) received lower scores on purposeful lighting and composition. By contrast, images believed to be human-made (such as C, J, and K) scored higher in perceived intentionality regardless of their actual origin. This suggests a persistent mental model in which AI is associated with randomness or lack of design agency, shaping the way viewers interpret lighting, form, and spatial logic.
4.2.2.3 Creativity/Originality
Perceived authorship had the strongest effect here. Images believed to be human were systematically rated as more creative or original. This includes cases where participants were wrong: for instance, Image I (AI, but widely mistaken for human) received an originality score of 5.00, outperforming several human images. This suggests that judgments of creativity were influenced not only by the visual content but also by assumptions about the creator. In other words, when viewers believed a human was responsible, they appeared more willing to interpret unusual or complex elements as intentional innovation rather than stylistic variation. The data demonstrates a powerful attribution effect: People project creativity onto images they believe had a human author.
This aligns with existing scholarship on human–AI co-authorship and “algorithmic skepticism.” Participants may associate creativity with conscious ideation, personal experience, or deliberate experimentation—qualities typically linked to human designers. As a result, identical visual features may be interpreted differently depending on perceived authorship. This reinforces the idea that perceived creativity is not solely a property of the image itself, but is shaped by expectations about authorship and agency.
4.2.2.4 Emotional Resonance
Emotional response followed the same trend: images believed to be human (C, G, J, K) produced stronger emotional resonance, even when similar affective cues existed in AI-generated images. This suggests that participants were more likely to interpret factors like atmosphere, lighting, and spatial storytelling as emotionally significant when they assumed a human designer was responsible for them. In these cases, viewers may have read subtle compositional elements such as contrast, scale, or implied narrative as deliberate attempts to evoke a feeling. By contrast, when images were believed to be AI-generated, comparable visual cues may have been interpreted as stylistic rather than expressive, resulting in weaker reported emotional impact.
The key implication is that emotional response is mediated not only by visual content but also by the viewer’s assumptions about intention, authorship, and agency. When viewers believe an image has a human creator, they may subconsciously search for emotional meaning, amplifying perceived resonance. Conversely, viewers may approach images they believe to be AI-generated more analytically and be less inclined to attribute emotional intent to them.
4.2.3 Summary of Perceptual Trends
Across both actual and perceived authorship comparisons, four major trends emerge:
- AI images are becoming narratively coherent, but they still trail human-made images in perceived intention and emotion.
- Misidentified human images (e.g., E) consistently received lower scores, confirming that expectations shape interpretation.
- Creativity shows the strongest authorship bias—participants hesitate to assign creativity to AI outputs, even when visually strong.
- Participants consistently rated images they believed were human higher across all perceptual dimensions.
Table 2 displays average ratings across all four perceptual dimensions for AI-made and human-made concept art. Table 3 displays average ratings across the same four dimensions for the concept art that the majority of respondents believed to be AI-made and the concept art that the majority believed to be human-made.
| Dimension | Human-made | AI-made |
| --- | --- | --- |
| Narrative Coherence | 5.48 | 5.53 |
| Expressive Intent | 5.30 | 5.15 |
| Creativity/Originality | 5.23 | 4.73 |
| Emotional Resonance | 4.90 | 4.68 |
Table 2: Human-made images scored higher on Expressive Intent, Emotional Resonance, and especially Creativity/Originality, while AI-generated images scored slightly higher than human-made images on Narrative Coherence.
| Dimension | Perceived Human-made | Perceived AI-made |
| --- | --- | --- |
| Narrative Coherence | 5.56 | 5.35 |
| Expressive Intent | 5.44 | 4.10 |
| Creativity/Originality | 5.25 | 4.71 |
| Emotional Resonance | 4.97 | 4.60 |
Table 3: Images believed to be human-made outperformed those believed to be AI-generated across all categories, with performance gaps much larger than those observed between images that were actually human-made versus actually AI-generated.
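The comparison between the two tables can be checked with a few lines of arithmetic. The following is a minimal sketch in Python that takes the mean ratings exactly as printed in Tables 2 and 3 and computes the human-minus-AI gap for each dimension. The names `ACTUAL`, `PERCEIVED`, `rating_gaps`, and `mean_gap` are illustrative and are not part of the original analysis.

```python
# Mean ratings copied from Table 2 (actual authorship) and Table 3
# (perceived authorship). Each tuple is (human-made, AI-made).
ACTUAL = {
    "Narrative Coherence": (5.48, 5.53),
    "Expressive Intent": (5.30, 5.15),
    "Creativity/Originality": (5.23, 4.73),
    "Emotional Resonance": (4.90, 4.68),
}
PERCEIVED = {
    "Narrative Coherence": (5.56, 5.35),
    "Expressive Intent": (5.44, 4.10),
    "Creativity/Originality": (5.25, 4.71),
    "Emotional Resonance": (4.97, 4.60),
}


def rating_gaps(table):
    """Return the human-minus-AI gap per dimension, rounded to two decimals."""
    return {dim: round(human - ai, 2) for dim, (human, ai) in table.items()}


def mean_gap(gaps):
    """Average the per-dimension gaps, rounded to three decimals."""
    return round(sum(gaps.values()) / len(gaps), 3)


actual_gaps = rating_gaps(ACTUAL)
perceived_gaps = rating_gaps(PERCEIVED)

# Print the two gaps side by side for each dimension.
for dim in ACTUAL:
    print(f"{dim}: actual {actual_gaps[dim]:+.2f}, perceived {perceived_gaps[dim]:+.2f}")
print(f"Mean gap: actual {mean_gap(actual_gaps):+.3f}, perceived {mean_gap(perceived_gaps):+.3f}")
```

Read this way, the gap is wider under perceived authorship on every dimension. The widening is most visible for Expressive Intent (0.15 versus 1.34) and smallest for Creativity/Originality (0.50 versus 0.54). Because the inputs are rounded table means rather than raw responses, the sketch describes the direction and size of the differences but says nothing about their statistical significance.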
Together, these findings highlight a crucial tension in cinematic design: AI-generated environments may be visually sophisticated enough to pass as human-made, but viewers still fundamentally associate creativity, emotional resonance, and intentionality with human authorship.
4.3 Preference for Human-Made Images in Final Scene Selection
Across all six image pairs, participants were asked which image they would choose for the final scene of a hypothetical film sequence. A clear and consistent pattern emerged: respondents overwhelmingly preferred the images they believed were human-made, regardless of the images’ actual authorship.
Notably, this preference held even when participants misidentified the images. For example:
- Image C (AI-made) was frequently chosen as the preferred final-scene environment—but only because respondents incorrectly believed it to be human-made.
- Conversely, Image E (human-made but widely misidentified as AI-generated) was rarely chosen, despite being a professionally designed set with strong compositional logic.
This demonstrates that participants’ choices were shaped far more by their beliefs about who created the work than by the visual qualities of the work itself.
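One way to quantify this pattern for a single image pair is to cross-tabulate each respondent's preferred image against the image they believed was AI-made, using the two per-pair questions in Appendix A. The sketch below is in Python and is only an illustration of that tally. The function names are hypothetical, and the `responses` list is invented sample input, not data from this survey.

```python
from collections import Counter


def preference_by_belief(responses):
    """Tally whether respondents preferred the image they believed was human-made.

    Each response is a (preferred, believed_ai) pair for one image pair:
    preferred is "A" or "B"; believed_ai is "A", "B", or "Unsure".
    The image a respondent did not flag as AI is treated as believed human.
    """
    tally = Counter()
    for preferred, believed_ai in responses:
        if believed_ai == "Unsure":
            tally["unsure"] += 1
        elif preferred != believed_ai:
            tally["preferred_believed_human"] += 1
        else:
            tally["preferred_believed_ai"] += 1
    return tally


def share_preferring_believed_human(tally):
    """Share of decided respondents who chose the image they believed was human."""
    decided = tally["preferred_believed_human"] + tally["preferred_believed_ai"]
    return tally["preferred_believed_human"] / decided if decided else None


# Hypothetical responses for one pair (illustration only, NOT survey data).
responses = [("A", "B"), ("A", "B"), ("B", "A"), ("B", "B"), ("A", "Unsure")]
tally = preference_by_belief(responses)
share = share_preferring_believed_human(tally)
print(tally, share)
```

Respondents who answered "Unsure" are counted separately and excluded from the share, since they expressed no belief about authorship. Whether to exclude them or treat them as a third group is a design choice that would need to match the original analysis.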
5. Conclusion
This study investigated how audiences perceive AI-generated versus human-made cinematic architecture, examining both perceptual accuracy and evaluative judgments across narrative coherence, expressive intent, creativity, and emotional resonance. The findings demonstrate that while AI-generated set designs can achieve impressive visual coherence, participants consistently attribute higher creativity, intentionality, and emotional resonance to images they believe are human-made. Expertise in architecture, film, or visual art significantly improves the ability to identify AI authorship, suggesting that domain knowledge enhances sensitivity to subtle cues, whereas lay audiences are more susceptible to misattribution and cognitive biases. Furthermore, participants’ preference for human-made images in hypothetical final-scene selections highlights the central role of perceived authorship in shaping aesthetic and narrative evaluations.
These results reveal a tension in cinematic design: AI can produce technically competent environments, yet audiences continue to associate meaningful creativity and emotional impact with human authorship. Understanding this dynamic is critical for filmmakers, designers, and AI developers, as it underscores both the potential and limitations of AI usage in creative workflows. It also suggests that the success of AI-assisted design may not only depend on improving visual output, but on how authorship is communicated and integrated within collaborative creative processes.
For filmmakers and production designers, the results suggest that the perceived authorship of a set can shape audience engagement and emotional response, regardless of the environment’s objective quality or who actually created it. This means that even as AI tools become increasingly capable of producing visually coherent, narratively supportive designs, human involvement remains central to the audience’s sense of creativity, intention, and emotional depth when responding to a film. These data indicate that viewers respond differently to content depending on both its actual and its perceived authorship.
For set designers, the study underscores the importance of integrating AI as a tool rather than a replacement. AI tools aid efficiency, while human-guided design choices convey a stronger sense of intentionality, emotional resonance, and especially originality. This suggests that, thus far, AI tools may be most effective in early-stage ideation, while human designers refine these concepts for narrative logic, atmosphere, and conceptual direction. By maintaining creative authorship, designers can leverage AI’s speed without sacrificing the perceived intentionality that audiences associate with human-made environments. Such hybrid workflows may therefore offer greater technical productivity as well as stronger audience engagement.
For audiences, the research highlights biases in aesthetic judgment: perceived human authorship strongly influences evaluations. As AI-generated imagery becomes more prevalent in film, viewers may therefore over- or underestimate the creative contributions of AI relative to those of humans, which has the potential to reshape expectations for cinematic storytelling.
More broadly, this tension between AI tools’ technical competence and AI-generated images’ perceived meaning calls for critical reflection on how AI is both credited and contextualized in film production, emphasizing that AI’s technological sophistication alone is insufficient to fully replace the human touch in shaping compelling cinematic experiences.
Looking forward, future research should explore how different AI tools, levels of human curation, or collaborative human–AI workflows influence audience perception of creativity and intentionality. Longitudinal studies could also examine how exposure to AI-generated media affects assumptions about authorship and artistic value over time, as well as how these perceptions vary across cultural contexts or age groups. By continuing to investigate these intersections, scholars and practitioners can better navigate the evolving landscape of AI-mediated cinematic design, ensuring that technological innovation complements rather than diminishes creative agency. Perceptions of cinematic design are shaped by assumptions of authorship as much as by visual content.
Works Cited
- Whittle, Charlie. (2025, February 10). The Bristorian. The Bristorian.
- FilmLA. (2025, October 14). L.A. Area Film Shoot Days Decline in Third Quarter, as New Incentive-Backed Projects Offer Positive Early Signs for Greater L.A. Film Ecosystem. FilmLA.
- Bode, L. (2018). “It’s a Fake!”: Early and Late Incredulous Viewers, Trick Effects, and CGI. Film History,
- Fraser-Crook, J. (Director). (2010). Creating the world of Harry Potter [TV series]. Warner Bros.
- Miller, A. (2022, September 22). Why Do All Netflix Shows Look the Same? No Film School.
- Li, C., Zhang, T., Du, X., Zhang, Y., & Xie, H. (2024). Generative AI models for different steps in architectural design: A literature review. Frontiers of Architectural Research, 14(3).
- Gao, Y. (2025). Embrace AI to Complete Film Odyssey: A Comparative Analysis of Conventional and AI-generated Production Design-Crisis and Prospect. Lecture Notes in Education Psychology and Public Media. Ewadirect.com.
- DeGuzman, K. (2023, October 15). Set Design in Film — Process and Purpose Explained. StudioBinder.
- NVIDIA Launches Omniverse Design Collaboration and Simulation Platform for Enterprises. (2021). NVIDIA Newsroom.
- Lee, A. (2022, June 22). Meet the Omnivore: Director of Photography Revs Up NVIDIA Omniverse to Create Sleek Car Demo. NVIDIA Blog.
- Crawford, M. (2024, February 22). What Is Digital Backlot in Film? Creating Worlds Without Leaving the Studio. Filmmaking Lifestyle.
- Poolkrajang, S., & Bhojan, A. (2024). Towards Generating 3D City Models with GAN and Computer Vision Methods. GRAPP 2024 - 19th International Conference on Computer Graphics Theory and Applications.
- Obropta, C. T. (2020, September 3). How THE MANDALORIAN And ILM Created A Visual Effects Breakthrough. Film Inquiry.
- Millian, M. (2022). How Digital Domain Built a Digital, Destructible Slice of NYC for ‘Spider-Man: No Way Home’. Animation Magazine.
- Barthes, R. (1977). The Death of the Author (S. Heath, Trans.). In Image, Music, Text (pp. 142-148). Fontana Press.
- Benjamin, W. (1969). The work of art in the age of mechanical reproduction. In H. Arendt (Ed.), Illuminations (H. Zohn, Trans.) (pp. 219-253). Harcourt, Brace & World.
Appendix A: Student Questionnaire
Pair #1
Image A:

- This image clearly supports a coherent story or cinematic scenario. (1-Strongly Disagree, 7-Strongly Agree)
- This image appears intentionally expressive rather than random. (1-Strongly Disagree, 7-Strongly Agree)
- Materials and lighting behave in a motivated, purposeful way. (1-Strongly Disagree, 7-Strongly Agree)
- This image feels creative or original. (1-Strongly Disagree, 7-Strongly Agree)
- This image evokes a clear emotional response. (1-Strongly Disagree, 7-Strongly Agree)
Image B:

- This image clearly supports a coherent story or cinematic scenario. (1-Strongly Disagree, 7-Strongly Agree)
- This image appears intentionally expressive rather than random. (1-Strongly Disagree, 7-Strongly Agree)
- Materials and lighting behave in a motivated, purposeful way. (1-Strongly Disagree, 7-Strongly Agree)
- This image feels creative or original. (1-Strongly Disagree, 7-Strongly Agree)
- This image evokes a clear emotional response. (1-Strongly Disagree, 7-Strongly Agree)
- Which image would you prefer to use as concept art for this film scene? (A/B)
- Which image was made with AI? (A/B/Unsure)
Pair #2
Image C:

- This image clearly supports a coherent story or cinematic scenario. (1-Strongly Disagree, 7-Strongly Agree)
- This image appears intentionally expressive rather than random. (1-Strongly Disagree, 7-Strongly Agree)
- Materials and lighting behave in a motivated, purposeful way. (1-Strongly Disagree, 7-Strongly Agree)
- This image feels creative or original. (1-Strongly Disagree, 7-Strongly Agree)
- This image evokes a clear emotional response. (1-Strongly Disagree, 7-Strongly Agree)
Image D:

- This image clearly supports a coherent story or cinematic scenario. (1-Strongly Disagree, 7-Strongly Agree)
- This image appears intentionally expressive rather than random. (1-Strongly Disagree, 7-Strongly Agree)
- Materials and lighting behave in a motivated, purposeful way. (1-Strongly Disagree, 7-Strongly Agree)
- This image feels creative or original. (1-Strongly Disagree, 7-Strongly Agree)
- This image evokes a clear emotional response. (1-Strongly Disagree, 7-Strongly Agree)
- Which image would you prefer to use as concept art for this film scene? (A/B)
- Which image was made with AI? (A/B/Unsure)
Pair #3
Image E:

- This image clearly supports a coherent story or cinematic scenario. (1-Strongly Disagree, 7-Strongly Agree)
- This image appears intentionally expressive rather than random. (1-Strongly Disagree, 7-Strongly Agree)
- Materials and lighting behave in a motivated, purposeful way. (1-Strongly Disagree, 7-Strongly Agree)
- This image feels creative or original. (1-Strongly Disagree, 7-Strongly Agree)
- This image evokes a clear emotional response. (1-Strongly Disagree, 7-Strongly Agree)
Image F:

- This image clearly supports a coherent story or cinematic scenario. (1-Strongly Disagree, 7-Strongly Agree)
- This image appears intentionally expressive rather than random. (1-Strongly Disagree, 7-Strongly Agree)
- Materials and lighting behave in a motivated, purposeful way. (1-Strongly Disagree, 7-Strongly Agree)
- This image feels creative or original. (1-Strongly Disagree, 7-Strongly Agree)
- This image evokes a clear emotional response. (1-Strongly Disagree, 7-Strongly Agree)
- Which image would you prefer to use as concept art for this film scene? (A/B)
- Which image was made with AI? (A/B/Unsure)
Pair #4
Image G:

- This image clearly supports a coherent story or cinematic scenario. (1-Strongly Disagree, 7-Strongly Agree)
- This image appears intentionally expressive rather than random. (1-Strongly Disagree, 7-Strongly Agree)
- Materials and lighting behave in a motivated, purposeful way. (1-Strongly Disagree, 7-Strongly Agree)
- This image feels creative or original. (1-Strongly Disagree, 7-Strongly Agree)
- This image evokes a clear emotional response. (1-Strongly Disagree, 7-Strongly Agree)
Image H:

- This image clearly supports a coherent story or cinematic scenario. (1-Strongly Disagree, 7-Strongly Agree)
- This image appears intentionally expressive rather than random. (1-Strongly Disagree, 7-Strongly Agree)
- Materials and lighting behave in a motivated, purposeful way. (1-Strongly Disagree, 7-Strongly Agree)
- This image feels creative or original. (1-Strongly Disagree, 7-Strongly Agree)
- This image evokes a clear emotional response. (1-Strongly Disagree, 7-Strongly Agree)
- Which image would you prefer to use as concept art for this film scene? (A/B)
- Which image was made with AI? (A/B/Unsure)
Pair #5
Image I:

- This image clearly supports a coherent story or cinematic scenario. (1-Strongly Disagree, 7-Strongly Agree)
- This image appears intentionally expressive rather than random. (1-Strongly Disagree, 7-Strongly Agree)
- Materials and lighting behave in a motivated, purposeful way. (1-Strongly Disagree, 7-Strongly Agree)
- This image feels creative or original. (1-Strongly Disagree, 7-Strongly Agree)
- This image evokes a clear emotional response. (1-Strongly Disagree, 7-Strongly Agree)
Image J:

- This image clearly supports a coherent story or cinematic scenario. (1-Strongly Disagree, 7-Strongly Agree)
- This image appears intentionally expressive rather than random. (1-Strongly Disagree, 7-Strongly Agree)
- Materials and lighting behave in a motivated, purposeful way. (1-Strongly Disagree, 7-Strongly Agree)
- This image feels creative or original. (1-Strongly Disagree, 7-Strongly Agree)
- This image evokes a clear emotional response. (1-Strongly Disagree, 7-Strongly Agree)
- Which image would you prefer to use as concept art for this film scene? (A/B)
- Which image was made with AI? (A/B/Unsure)
Pair #6
Image K:

- This image clearly supports a coherent story or cinematic scenario. (1-Strongly Disagree, 7-Strongly Agree)
- This image appears intentionally expressive rather than random. (1-Strongly Disagree, 7-Strongly Agree)
- Materials and lighting behave in a motivated, purposeful way. (1-Strongly Disagree, 7-Strongly Agree)
- This image feels creative or original. (1-Strongly Disagree, 7-Strongly Agree)
- This image evokes a clear emotional response. (1-Strongly Disagree, 7-Strongly Agree)
Image L:

- This image clearly supports a coherent story or cinematic scenario. (1-Strongly Disagree, 7-Strongly Agree)
- This image appears intentionally expressive rather than random. (1-Strongly Disagree, 7-Strongly Agree)
- Materials and lighting behave in a motivated, purposeful way. (1-Strongly Disagree, 7-Strongly Agree)
- This image feels creative or original. (1-Strongly Disagree, 7-Strongly Agree)
- This image evokes a clear emotional response. (1-Strongly Disagree, 7-Strongly Agree)
- Which image would you prefer to use as concept art for this film scene? (A/B)
- Which image was made with AI? (A/B/Unsure)
Background and Experience
- Have you studied or worked in film, architecture, visual art, or production design? (Yes/No/Prefer not to say)
- How familiar are you with AI image-generation tools (Midjourney, DALL·E, Stable Diffusion, etc.)? (Never used them/Tried once or twice/Use occasionally/Use frequently)
- Age Range (Under 18/18–24/25–34/35–44/45–54/55+/Prefer not to say)
- Education Level (High school or below/Some college/Bachelor’s degree/Graduate degree/Prefer not to say)