Speakers

Wharton Human-AI Research presents the

3rd Annual Business & Generative AI Conference

Keynote Speakers


Lan Guan

Chief AI Officer, Accenture

Jon Jones

Vice President of Amazon Web Services (AWS) Startups and Venture Capital


Jon Jones is the Vice President and Global Head of Startups and Venture Capital at AWS. Previously, Jon led Global Go-To-Market for AWS products and services, launching businesses including AWS custom silicon chips, Autonomous Vehicles, Quantum Computing, and Generative AI.

Prior to Amazon, Jon built Google’s first U.S.-based startup accelerator as part of the Google Developer Launchpad program, and he is a two-time startup founder. Before joining the startup ranks, Jon led Sales, Marketing, Customer Success, and Technical teams for global infrastructure companies for a decade in Silicon Valley.

Jon is passionate about leveraging technology for good and has addressed this topic globally, including at the United Nations General Assembly in New York, where he focused on democratizing access to technology for entrepreneurs and organizations who can help us solve the world’s biggest challenges. Jon has an MBA from Wharton and is based in Seattle, Washington.

Presenters

Wajeeha Ahmad

Graduate Researcher, Stanford University

"What Content Grows User Contributions on Platforms? Evidence from a Field Experiment"

Digital media platforms use artificial intelligence (AI) algorithms to organize content, drive network effects, and spur growth. Given the attention-based business model of mainstream digital media platforms, platforms typically use algorithms to expose users to content that maximizes their engagement (Moehring, 2024). However, it is unclear whether optimizing for user engagement is a viable strategy for growing user contributions in the long run. In order to facilitate valuable exchanges between users, platforms need to initiate and sustain network effects among participants by maintaining a sufficient flow of content contributions by users (Lerner and Tirole, 2002; Lee et al., 2015; McIntyre and Srinivasan, 2017). In this paper, I examine how exposure to different types of content affects users’ contributions on the platform in order to inform the design of AI systems used by platforms to meet their strategic goals.

Examining how the type of content people are exposed to on a platform affects users’ behavior is important since it can ultimately affect the platforms’ revenue as well as societal outcomes. The use of AI algorithms by platforms has been widely conjectured to facilitate the propagation of content that stokes provocative discourse at the expense of users’ preferences (Brady et al., 2017; Rathje et al., 2021; Levy, 2021; Van Bavel et al., 2021). While existing algorithms may be optimized to expose users to content that keeps them engaged in the short-term, little is known about whether this approach can help platforms meet their longer-run goal of sustainably growing more contributions from users. Given that platforms hosting user-generated content commonly suffer from a decline in user participation over time (Halfaker et al., 2013), expanding the scale of user contributions is important during both the initial and later stages of a platform’s growth (Tiwana et al., 2010; Healy and Schussman, 2003). This creates a fundamental governance challenge: What kinds of content should platforms expose users to in order to ensure sustainable user participation and long-term value creation on a platform? While addressing this question is key for platform managers that aim to maximize revenue and stay competitive, it is also important to understand platforms’ incentives to self-regulate.

In this paper, I conduct the first field experiment to causally disentangle the effects of consuming different types of content on users’ follow-on contributions on a major digital platform. Using generative AI agents to randomly post comments of different categories on users’ posts, I show that receiving informative and agreeable responses stimulates future contributions by new users in the longer run. However, controversial comments receive the greatest amount of short-term engagement: users engage more, and in longer discussions, in response to controversial comments. These results indicate that platforms optimizing for content production in the short run may not retain user contributions in the longer run. In other words, there is an alignment problem between content production in the short run, which AI algorithms tend to optimize for, and content production in the longer run, which platforms need in order to sustainably grow user contributions.
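
To make the design concrete, here is a minimal sketch of how such an agent-randomized intervention could be operationalized with an OpenAI-style chat API; the category labels, prompts, and model name are illustrative assumptions, not the study’s actual materials.

```python
# Illustrative sketch: randomize the comment category for each post, then have
# an LLM agent draft a reply in that style. Categories, prompts, and the model
# name are hypothetical, not the study's materials.
import random
from openai import OpenAI  # assumes the OpenAI Python client

client = OpenAI()

CATEGORIES = {
    "informative": "Reply with a factual, source-oriented comment on the post.",
    "agreeable": "Reply with a supportive comment that affirms the poster's point.",
    "controversial": "Reply with a civil but clearly opposing viewpoint.",
}

def assign_and_generate(post_text: str, seed: int):
    """Randomly assign a treatment arm (or control) and draft the agent's comment."""
    rng = random.Random(seed)  # deterministic per-post assignment
    arm = rng.choice(list(CATEGORIES) + ["control"])
    if arm == "control":
        return arm, None  # control posts receive no agent comment
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": CATEGORIES[arm]},
            {"role": "user", "content": post_text},
        ],
    )
    return arm, resp.choices[0].message.content
```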

This paper makes three key contributions to prior research. First, this research relates to a strand of work studying the impact of AI-based recommender systems on user behavior. Prior work argues that ad-driven platforms can find it profitable to display harmful content even if users do not wish to see it because users engage more with such content, resulting in more time spent, more content generated, and more ad revenue for the platform (Beknazar-Yuzbashev et al., 2024). Empirically, previous research shows that platforms face a trade-off between maximizing engagement and content credibility (Moehring, 2024). While these papers examine the impact of specific types of content on users’ short-term engagement, they ignore longer-run impacts on content production, which is vital for platforms’ ability to both retain users beyond their immediate engagement with content and ultimately generate revenue. Anecdotal evidence suggests that digital platforms typically choose shorter-run metrics to optimize for since such metrics have tighter feedback loops, which helps machine learning algorithms improve faster (Athey, 2018). Building on prior work suggesting that short-term engagement signals are not always aligned with users’ actual preferences (Lu et al., 2018; Milli et al., 2025), this research is the first to empirically demonstrate that optimizing for short-term metrics can impede the growth of user contributions in the longer run.

Second, a large literature on platform growth examines how platforms use a range of methods to promote user participation and growth. While prior work has examined how the volume of initial content begets further user contributions (Aaltonen and Seiler, 2015; Nagaraj, 2021) or examined the impact of a specific type of content, e.g., content that is toxic (Beknazar-Yuzbashev et al., 2022) or sparks outrage (McLoughlin et al., 2024), it ignores how different categories of content supplied on digital platforms can differentially affect future contributions by users. Given that the type of content and the way in which information is shared online can vary widely, this question is of great importance to platform managers and designers. By introducing exogenous variation in the type of content people are exposed to, this work contributes to the literature on platform growth by unveiling the value of informative, agreeable comments for growing new users’ contributions in the long term.

Finally, this work also advances the use of AI agents for research. Previous research has demonstrated the use of AI in the lab to predict treatment effects (Ashokkumar et al., 2024), and simulate research participants (Horton, 2023), human behavior (Park et al., 2023) and strategic decision-making (Mirzayev et al., 2025). Prior work has also studied the deployment of AI in the field to provide advice and improve business processes by assisting human workers (Brynjolfsson et al., 2025; Otis et al., 2024). Building on these streams, this work is among the first to deploy AI agents in the field to both autonomously interact with the websites of large platforms and provide contextual, scalable interventions.

Ada Aka

Assistant Professor of Marketing, Stanford Graduate School of Business

"Better Together: Quantifying the Benefits of AI-Assisted Recruitment"

Artificial intelligence (AI) is increasingly used in recruitment, yet empirical evidence quantifying its impact on hiring efficiency and candidate selection remains limited. We randomly assign 37,000 applicants for a junior-developer position to either a traditional recruitment process (resume screening followed by human selection) or an AI-assisted recruitment pipeline incorporating an initial AI-driven structured video interview before human evaluation.

Candidates advancing from either track faced the same final-stage human interview, with interviewers blind to the earlier selection method. In the AI-assisted pipeline, 54% of candidates passed the final interview compared with 34% from the traditional pipeline, yielding an average treatment effect of 20 percentage points (SE 12 pp.). Five months later, we collected LinkedIn profiles of top applicants from both groups and found that 18% (SE 1.1%) of applicants from the traditional track found new jobs compared with 23% (SE 2.3%) from the AI group, resulting in a 5.9 pp. (SE 2.6 pp.) difference in the probability of finding new employment between groups. The AI system tended to select younger applicants with less experience and fewer advanced credentials. We analyze AI-generated interview transcripts to examine the selection criteria and conversational dynamics. Our findings contribute to understanding how AI technologies affect decision making in recruitment and talent acquisition while highlighting some of their potential implications.
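
As a rough illustration of the arithmetic behind such comparisons, the sketch below computes a difference in pass rates and the standard two-proportion standard error; only the 54% and 34% pass rates come from the abstract, while the arm sizes are hypothetical.

```python
# Back-of-the-envelope difference in proportions with its standard error.
# Only the pass rates (0.54 vs. 0.34) come from the abstract; arm sizes are
# hypothetical placeholders.
from math import sqrt

def diff_in_proportions(p1, n1, p0, n0):
    ate = p1 - p0
    se = sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    return ate, se

ate, se = diff_in_proportions(p1=0.54, n1=150, p0=0.34, n0=150)  # n's assumed
print(f"ATE = {ate:.0%}, SE = {se:.1%}")  # 20 pp gap; the SE depends on n
```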

Ravi Bapna

Professor, University of Minnesota

"Agentic AI and Managers’ Analytics Capabilities: An Exploration"

As firms across industries adopt data-driven decision-making practices, the need for managers fluent in advanced analytics has grown increasingly urgent. Business schools have responded by expanding advanced analytics offerings in MBA programs, but the wide range of student backgrounds—spanning “poets and quants”—poses challenges for uniformly developing these capabilities. Can agentic-AI tools help bridge this gap? We investigate this question through a field deployment of a no-code, agentic-AI tool—functioning as a virtual data scientist—in a core analytics course at a top-30 global MBA program. This tool allows students to solve machine learning problems using natural language across a range of tasks, including segmentation, classification, prediction, and text analytics. Using detailed logs of student-agent interactions and closed-book final exam scores, we examine (1) whether use of the agentic-AI agent improves learning outcomes, (2) whether usage patterns differ across students, and whether these differences affect learning, and (3) what mechanisms link agentic-AI usage with learning outcomes. We find that doubling the intensity of agentic-AI usage increases a student’s final exam score by an average of 0.624 points (or about 1.56 percentage points)—exceeding the typical gap between an ‘A’ and an ‘A-.’ Lower-performing students are less likely to use the agent, but when they do engage, they experience greater marginal returns. These performance gains are concentrated in the applied business case portion of the exam, rather than the analytics concepts section. Notably, students benefit most when they adopt a “going-to-the-gym” usage mode—actively working through MBA-level case exercises with the tool—rather than relying on it for exam preparation or passively observing teammates.

Anouk Bergner

Assistant Professor of Marketing, University of Geneva

"Managing Online Toxicity: How AI-Enabled Empathic Support Transforms Consumer Coping Behaviors"

This research demonstrates how AI-enabled empathic support systems outperform human supporters in helping victims of online harassment. Through two studies, we show that LLM-based AI models exhibit superior emotional resonance, perspective taking, and action orientation, making people feel more genuinely heard and improving their coping self-efficacy, especially in severe cases.

Hemant Bhargava

Distinguished Professor, UC Davis

"AI Pricing: The Value-Revenue Paradox"

Seat-based pricing has long dominated the software-as-a-service (SaaS) industry and quickly became popular for AI tools (e.g., ChatGPT’s $20-per-user-per-month plan). But with AI’s promise to increase productivity and reduce headcount, seat-based pricing can create a paradox in which the firms that enjoy the greatest value and savings pay the least; a paradox that does not bedevil traditional software.
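
A stylized numeric illustration of the paradox (all figures are hypothetical, not from the talk): if an AI tool lets half the headcount produce the same output, the customer’s value per dollar rises while the vendor’s per-seat revenue falls.

```python
# Stylized illustration of the value-revenue paradox under per-seat pricing.
# All numbers are hypothetical.
price_per_seat = 20 * 12                      # $20 per user per month, annualized
seats_before, seats_after = 100, 50           # AI lets half the headcount do the same work
output_per_seat_before, output_per_seat_after = 1.0, 2.0  # relative productivity

vendor_revenue_before = price_per_seat * seats_before   # 24,000
vendor_revenue_after = price_per_seat * seats_after     # 12,000
customer_output_before = seats_before * output_per_seat_before  # 100 units
customer_output_after = seats_after * output_per_seat_after     # 100 units, at half the spend

print(vendor_revenue_before, vendor_revenue_after)      # vendor revenue halves
print(customer_output_before, customer_output_after)    # customer value per dollar doubles
```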

Nazli Bhatia

Associate Practice Professor of Behavioral Science, University of Pennsylvania

"Predicting Partner Satisfaction in Negotiation"

In this paper, we examine whether negotiators can accurately predict their partners’ satisfaction and explore how language used during negotiations can offer cues to this process. In Study 1, participants negotiated via online chat over a used car. Results showed that while negotiators achieved some accuracy in predicting their partners’ satisfaction, generative AI tools, such as ChatGPT, estimated negotiator satisfaction much better. Negotiators failed to maximize accuracy because they over-relied on their own satisfaction and economic outcomes to estimate partner satisfaction while neglecting the language their partners used, which can offer cues to satisfaction. Study 2 examined this phenomenon in an in-person negotiation but found that negotiators were more accurate than ChatGPT in predicting satisfaction in this context.

Behnaz Bojd

Assistant Professor, UC Irvine

"Information-Seeking from AI Chatbots: Tradeoff between Judgment and Misinformation Concerns under Stigma"

Timely access to information is critical for navigating challenging circumstances. However, stigma presents a significant barrier to information-seeking. This study examines how stigma shapes individuals’ preference for conversational information-seeking from AI chatbots compared to human experts. Through a series of randomized controlled experiments and a field experiment, we find that stigma consistently increases the preference for AI chatbots relative to human experts, effectively reducing algorithm aversion. We identify two key mechanisms underlying this effect: judgment concern and misinformation concern. Under high stigma, individuals are more concerned about social judgment, making chatbots a more appealing, nonjudgmental alternative. Simultaneously, stigma reduces concern about misinformation, further lowering resistance to AI chatbots. We discuss the theoretical contributions and practical implications of these findings for organizations aiming to enhance access to critical information in stigmatized domains.

Leonard Boussioux

Assistant Professor, University of Washington (Foster); Laboratory for Innovation Science at Harvard

"Narrative AI and the Human-AI Oversight Paradox in High-Stakes Evaluation"

Do AI-generated explanations enhance human judgment or undermine it? Through two complementary studies—a field experiment with 228 evaluators screening 48 innovations and a lab experiment on real-world ethical dilemmas—we uncover a consistent paradox: AI explanations increase human reliance on algorithmic recommendations rather than strengthening independent judgment.

In both contexts, users weighted AI input significantly more than their personal judgment when reasoning was provided. Innovation evaluators were 19 percentage points more likely to align with AI recommendations when explanations were present, particularly for rejection decisions. In ethical dilemmas, decisive AI recommendations with justifications similarly increased user alignment and confidence. This effect intensified under high cognitive load or controversy, where AI narratives appeared to substitute for, rather than supplement, human deliberation.

While AI assistance improved overall decision quality compared to human-only conditions, explanatory narratives paradoxically diminished meaningful human oversight. In innovation screening, this led to increased rejection of potentially transformative solutions deviating from standard frameworks. In ethical scenarios, it manifested as over-reliance on AI in precisely those ambiguous situations requiring human values and contextual understanding. These findings reveal the double-edged nature of explainable AI and highlight critical tensions in designing decision-support systems that preserve human agency while leveraging algorithmic capabilities.

Noah Castelo

Associate Professor, University of Alberta

"AI Assistance Can Decrease Motivation to Improve Cognitive Skills"

Popular AI products like ChatGPT can improve human performance at a range of knowledge-based tasks. However, there is a growing concern that increasing reliance on such programs could lead to “deskilling,” a phenomenon in which users who offload tasks to AI eventually become worse at performing such tasks independently. We document one way in which this deskilling might occur. Specifically, in four incentive-aligned experiments, we find that people who use AI to help with various cognitive tasks show less subsequent motivation to improve at such tasks compared to those who complete the tasks alone. We find that this effect occurs when it seems futile to improve a skill that AI can perform. The effect does not occur when it does not seem futile to improve one’s skills (e.g., when AI will not be available for subsequent tasks, or when someone intrinsically enjoys the task). We also document an important downstream consequence of this decreased motivation: poorer performance on a subsequent, similar task when AI assistance is no longer available.

Zhaoqi Cheng

Assistant Professor, Worcester Polytechnic Institute

"Navigating the Frontier of Artificial Intelligence via Agentic Taxonomy Induction"

We propose an agentic framework for building an adaptive taxonomy of generative AI that dynamically characterizes the ecosystem of AI technologies. Our system leverages large language models (LLMs) and an agentic architecture to annotate multimodal data streams, particularly from open-source software repositories. The result is a fine-grained, evolving taxonomy that offers greater interpretability and resolution, providing a near real-time view of the AI landscape.
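
As a minimal sketch of the kind of LLM annotation step such a pipeline might rely on, the snippet below classifies a repository’s README into a draft taxonomy node; the schema fields, prompt, and model name are assumptions rather than the authors’ implementation.

```python
# Sketch of LLM-based repository annotation for taxonomy induction. The schema
# (subfield, technique, modality), the prompt, and the model are illustrative.
import json
from openai import OpenAI  # assumes the OpenAI Python client

client = OpenAI()

def annotate_repo(readme_text: str) -> dict:
    """Ask an LLM to place one open-source repository into a draft taxonomy node."""
    prompt = (
        "Classify this open-source AI repository. Return JSON with keys "
        "'subfield', 'technique', and 'modality'.\n\n" + readme_text[:4000]
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)
```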

Our application of this framework to over 267,000 open-source repositories reveals several key insights into the AI ecosystem’s evolution. First, we document concentration and entry dynamics across AI sub-fields, showing that while new firm entry has declined after peaking between 2014 and 2017, market concentration has remained relatively low, suggesting persistent fragmentation. Second, we find that large technology firms (GAFAM) play an outsized role in setting technical direction and in the accumulation of repositories, even as a long tail of other firms contributes meaningfully to the ecosystem. Finally, by visualizing firms’ movements across the AI technological space, we find a notable convergence as most firms increasingly focus on similar domains, while observing distinct patterns for firms like Apple, which maintains a specialized focus, and Microsoft, which executed a significant strategic pivot toward foundational AI capabilities.

Andrea Contigiani

Assistant Professor, Ohio State University

"Experimentation in the Age of Generative AI: Evidence from ChatGPT"

We explore the role of Generative AI in reshaping entrepreneurial experimentation. More precisely, we study whether and how the emergence of GenAI has changed the way early-stage firms engage in experimental activities. To study this question, we build a comprehensive dataset of U.S.-based ventures founded between January 2020 and November 2022, capturing the relevant population prior to the emergence of ChatGPT. Our preliminary analysis points to a negative relationship. This evidence appears to be consistent with the view that the emergence of GenAI “crowds out” the use of experimentation.

Amit Dhanda

Senior Applied Scientist, Amazon

"Multi-Dimensional Summarization Agents with Context-Aware Reasoning over Enterprise Tables"

We propose a novel framework for summarizing structured enterprise data across multiple dimensions using large language model (LLM)-based agents. Traditional table-to-text models often lack the capacity to reason across hierarchical structures and context-aware deltas, which are essential in business reporting tasks. Our method introduces a multi-agent pipeline that extracts, analyzes, and summarizes multi-dimensional data using agents for slicing, variance detection, context construction, and LLM-based generation. Our results show that the proposed framework outperforms traditional approaches, achieving 83% faithfulness to underlying data, superior coverage of significant changes, and high relevance scores (4.4/5) for decision-critical insights. The improvements are especially pronounced in categories involving subtle trade-offs, such as increased revenue due to price changes amid declining unit volumes, which competing methods either overlook or address with limited specificity. We evaluate the framework on Kaggle datasets and demonstrate significant improvements in faithfulness, relevance, and insight quality over baseline table summarization approaches.
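
The sketch below shows how the slicing and variance-detection stages described above might be wired together before a final LLM agent drafts the narrative; the column handling, 5% threshold, and function boundaries are assumptions, not the paper’s code.

```python
# Sketch of a slice -> variance-detection -> context pipeline over a tidy table.
# Thresholds and column conventions are illustrative assumptions.
import pandas as pd

def slice_by(df: pd.DataFrame, dims: list, metric: str) -> pd.DataFrame:
    """Slicing agent: aggregate the metric along the requested dimensions."""
    return df.groupby(dims, as_index=False)[metric].sum()

def detect_deltas(curr, prev, dims, metric, min_change=0.05) -> pd.DataFrame:
    """Variance agent: flag slices whose metric moved more than min_change."""
    merged = curr.merge(prev, on=dims, suffixes=("_curr", "_prev"))
    merged["pct_change"] = merged[f"{metric}_curr"] / merged[f"{metric}_prev"] - 1
    return merged[merged["pct_change"].abs() >= min_change]

def build_context(deltas: pd.DataFrame, dims, metric) -> str:
    """Context agent: turn flagged deltas into bullets for an LLM writer agent."""
    return "\n".join(
        f"- {' / '.join(str(row[d]) for d in dims)}: {metric} moved {row['pct_change']:+.1%}"
        for _, row in deltas.iterrows()
    )
# A final LLM-based generation agent would consume build_context() output plus
# the reporting question to draft the summary narrative.
```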

Darima Fotheringham

Assistant Professor, Marketing, Rawls College of Business, Texas Tech University

"More Than Just a Coach: The Psychological Limits of AI in Driving Human Performance"

Yi Gao

Assistant Professor, Rawls College of Business, Texas Tech University

"When Data Can’t Flow Upstream: Implications of Restricting API Data for Model Training"

The rapid proliferation of generative AI (GenAI) has positioned foundation models, such as OpenAI’s GPT series, as essential infrastructure across domains including law, healthcare, and software development. These models are typically accessed through two channels: consumer-facing subscriptions and developer-facing APIs. The latter enables third-party developers to build specialized applications by licensing model access, forming a modular innovation ecosystem. This structure contributes significantly to AI providers’ revenue. OpenAI, for instance, earned $1.4 billion in API-related revenue in 2023. However, it also creates friction over data governance, value appropriation, and competition between model providers and downstream developers.

Amid the growing scarcity of high-quality training data, providers have turned to API-generated data submitted via developer applications as a critical input for model refinement. Historically, many providers, including OpenAI, permitted such data usage by default unless developers opted out. While this practice enhances model performance, it has raised concerns over privacy and fairness. Developers argue that their proprietary workflows are being exploited to improve a competing product without adequate consent, while users face increased risk of privacy violations.

In response to these concerns, OpenAI revised its policy in 2023 to prohibit API data usage for training without explicit opt-in. However, policies remain inconsistent across the industry. DeepSeek, for instance, continues to use API data for training—and uncertainty persists regarding actual practices and associated risks. These developments pose a broader strategic dilemma: do privacy-preserving data policies impair the provider’s ability to sustain model quality, and how do such policies affect the dynamics of innovation and competition in the GenAI ecosystem?

Traditional privacy research emphasizes a tradeoff between personalization and privacy. Yet in the GenAI context, the interdependence between model providers and developers introduces a distinct dynamic. Developers rely on the provider’s model as a technological backbone, while providers benefit from data generated via developers’ applications. This mutual reliance creates a feedback loop in which user interactions fuel continuous model improvement. When data flows from developers to the provider are restricted, a condition we refer to as data constriction, this loop can break, weakening the provider’s core model and undermining ecosystem performance. For developers, the ability to differentiate their services through fine-tuning reduces direct competition with the provider but incurs customization costs. Their incentive to invest in differentiation depends on both privacy preferences and whether the provider can reuse their API-generated data. Thus, while restricting data reuse may enhance privacy, the downstream implications for model quality, pricing, innovation incentives, and consumer welfare remain theoretically ambiguous.

To examine these tradeoffs, we develop an analytical model featuring three key actors: an AI provider, a developer, and a continuum of consumers with heterogeneous preferences. The provider offers access to its foundation model through direct subscriptions and API licensing. The developer builds a differentiated service using the API, pays a usage fee, and sets its own price. Consumers choose between the two services, or neither, based on pricing, customization, and privacy concerns. We compare two regimes: one where the provider can use API-generated data for training, and one where such usage is restricted.

Our analysis identifies two countervailing effects. The first, the data constriction effect, reduces the provider’s access to training data, potentially diminishing model quality. The second, the differentiation enhancement effect, arises because restrictions on data usage encourage the developer to further differentiate its service to attract privacy-conscious consumers. This, in turn, expands the provider’s own direct user base whose data remains accessible and may offset or even surpass the loss of API data.

These effects yield nuanced outcomes. When consumers place a high value on privacy, the differentiation enhancement effect can dominate. In such cases, training data restrictions may improve the provider’s service (model) quality by expanding its direct data pool. Additionally, the softened competition between provider and developer leads to higher subscription prices and greater willingness to pay, raising both the API fee and profit margins. Consequently, both the provider and the developer may benefit from adopting privacy-sensitive data policies. However, consumer surplus does not always improve. Although privacy concerns are alleviated and service quality may increase, consumers often face higher prices. In other cases, when privacy is less salient and the data constriction effect dominates, service quality suffers, which further erodes consumer welfare. Thus, while restrictive data policies can realign incentives and reinforce ecosystem sustainability, they do not guarantee improved outcomes for consumers.

Overall, this study highlights the complex tradeoffs involved in API data governance within GenAI ecosystems. Restricting the reuse of API-generated data can paradoxically enhance model performance and the developer’s profitability under certain conditions but may reduce consumer surplus. These findings underscore the importance of aligning privacy protection with innovation incentives through thoughtful platform governance and regulatory frameworks.

Goodman Gu

Director of AI, Adobe Inc.

"AKAP: A Multi-Agent Framework for Enterprise Strategic Knowledge Asset Management"

In today’s generative AI-driven business landscape, strategic knowledge assets—the deeply embedded capabilities, intellectual property, and organizational know-how that underpin competitive advantage—remain elusive and underutilized. Despite decades of effort in knowledge management, firms continue to face what we term the Strategic Knowledge Asset Paradox: they possess knowledge of immense strategic value that is fragmented, tacit, and disconnected from value-generating activity. This paper introduces the Agentic Knowledge Asset Platform (AKAP), a multi-agent, AI-powered system designed to resolve this paradox. AKAP transforms passive, static knowledge repositories into dynamic, coordinated systems of intelligent agents that proactively identify, structure, disseminate, and activate enterprise knowledge assets—turning them into operational and strategic leverage points.

Alok Gupta

Curtis L. Carlson Schoolwide Chair, University of Minnesota

"Roles of AI in Collaboration with Humans: Automation, Augmentation and the Future of Work"

Humans will see significant changes in the future of work as collaboration with artificial intelligence (AI) becomes commonplace. This work explores the benefits of AI in the setting of judgment tasks when it replaces humans (automation) and when it works with humans (augmentation). Through an analytical modeling framework, we show that the optimal use of AI for automation or augmentation depends on different types of human–AI complementarity. Our analysis demonstrates that the use of automation increases with higher levels of between-task complementarity. In contrast, the use of augmentation increases with higher levels of within-task complementarity. We integrate both automation and augmentation AI roles into our task allocation framework, in which an AI and humans work on a set of judgment tasks to optimize performance with a given level of available human resources. We validate our framework with an empirical study based on experimental data in which humans classify images with and without AI support. When between-task and within-task complementarity exist, we see a consistent pattern of work distribution in optimal work configurations: AI automates relatively easy tasks, AI augments humans on tasks with similar human and AI performance, and humans work without AI on relatively difficult tasks. Our work provides several contributions to theory and practice. The findings on the effects of complementarity provide a nuanced view regarding the benefits of automation and augmentation. Our task allocation framework highlights potential job designs for the future of work, especially by considering the often ignored, critical role of human resource reallocation in improving organizational performance.
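
A toy allocation rule consistent with the reported pattern (automate tasks that are easy for the AI, augment where human and AI accuracy are comparable, and leave the hardest tasks to humans alone); the accuracies and thresholds below are invented for illustration, not the paper’s estimates.

```python
# Toy allocation rule mirroring the reported pattern: automate easy tasks,
# augment where human and AI accuracy are close, assign hard tasks to humans.
# Accuracy values and thresholds are illustrative, not the paper's estimates.
def allocate(ai_acc: float, human_acc: float) -> str:
    if ai_acc >= 0.95 and ai_acc >= human_acc:   # relatively easy for the AI
        return "automation"
    if abs(ai_acc - human_acc) <= 0.05:          # comparable performance
        return "augmentation"
    return "human_only"                          # relatively difficult for the AI

tasks = {
    "clear image": (0.98, 0.90),
    "borderline image": (0.82, 0.80),
    "ambiguous image": (0.55, 0.75),
}
print({name: allocate(ai, human) for name, (ai, human) in tasks.items()})
```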

Manuel Hoffmann

Assistant Professor, University of California, Irvine

"The Generative AI - Gender Puzzle"

 

 

Xiyang Hu

Assistant Professor, Arizona State University

"DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration"

Recent progress in Large Language Models (LLMs) has drawn attention to their potential for accelerating drug discovery. However, a central problem remains: translating theoretical ideas into robust implementations in the highly specialized context of pharmaceutical research. This limitation prevents practitioners from making full use of the latest AI developments in drug discovery. To address this challenge, we introduce DrugAgent, a multi-agent framework that automates machine learning (ML) programming for drug discovery tasks. DrugAgent employs an LLM Planner that formulates high-level ideas and an LLM Instructor that identifies and integrates domain knowledge when implementing those ideas. We present case studies on three representative drug discovery tasks. Our results show that DrugAgent consistently outperforms leading baselines, including a relative improvement of 4.92% in ROC-AUC compared to ReAct for drug-target interaction (DTI).
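
A minimal sketch of the Planner/Instructor division of labor described above, assuming an OpenAI-style chat API; the prompts and model name are illustrative and are not DrugAgent’s actual implementation.

```python
# Illustrative two-agent loop: a Planner drafts an ML approach for a drug
# discovery task, and an Instructor revises it with domain knowledge.
# Prompts and the model name are assumptions, not DrugAgent's code.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder

def ask(role_instructions: str, content: str) -> str:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": role_instructions},
            {"role": "user", "content": content},
        ],
    )
    return resp.choices[0].message.content

def plan_and_refine(task: str) -> str:
    plan = ask("You are an ML planner. Outline a modeling approach step by step.", task)
    return ask(
        "You are a pharmaceutical domain instructor. Flag domain-specific pitfalls "
        "(featurization, data leakage, assay noise) and revise the plan.",
        f"Task: {task}\n\nDraft plan:\n{plan}",
    )

# Example (requires an API key):
# print(plan_and_refine("Predict drug-target interaction from SMILES and protein sequences."))
```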

Michael G. Jacobides

Sir Donald Gordon Professor of Entrepreneurship & Innovation and Professor of Strategy, London Business School

Rachit Kamdar

PhD Candidate, University of Maryland

"Modeling the Interviewer: Leveraging LLMs to Uncover Personality Mismatch Effects in Interview Assessment"

This talk explores how interviewer personality traits influence job interview dynamics, a topic traditionally constrained by limited access to human interviewers. Leveraging large language models (LLMs), this study developed an automated interview system that embeds personality traits into virtual interviewers. In a randomized experiment, this study finds that personality alignment between interviewer and interviewee enhances the accuracy of personality assessments. This research offers new insights into personality-driven AI-human interactions and presents a scalable framework for studying employment interviews.
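
One simple way to embed a personality trait in a virtual interviewer is through the system prompt, as in the sketch below; the trait wording and model name are assumptions, not the study’s implementation.

```python
# Sketch: condition a virtual interviewer on a personality profile via the
# system prompt. Trait descriptions and the model name are illustrative.
from openai import OpenAI

client = OpenAI()

TRAIT_STYLES = {
    "high_extraversion": "Be warm, talkative, and energetic; use encouraging follow-ups.",
    "low_extraversion": "Be reserved and concise; ask brief, direct questions.",
}

def interviewer_reply(trait: str, transcript: list) -> str:
    """transcript: prior chat messages in {'role', 'content'} form."""
    system = (
        "You are a job interviewer for a software engineering role. "
        + TRAIT_STYLES[trait]
        + " Ask one question at a time."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "system", "content": system}, *transcript],
    )
    return resp.choices[0].message.content
```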

Sophia Kazinnik

PhD Candidate, Wilfrid Laurier University

"Understanding effects of LLM information framing: The Impact of Persona-based Framing on Persuasive Effectiveness"

As more shoppers rely on conversational AI for product research, LLMs are the final touchpoint before consumers make purchase decisions. For example, Perplexity added one-click shopping within its chat interface (Sigalos, 2025), and OpenAI integrated Shopify into ChatGPT (Stambor, 2025). In response, marketers are reallocating budgets from SEO to “LLM optimization,” to improve brand visibility in LLM output (Gupta, 2025), indicating an urgent need for understanding how LLMs generate and present product information to users and their effects on consumer behavior.

Comparative analysis of eight LLMs revealed a distinct linguistic framing pattern: brief persona tags (e.g., “Top pick for fitness enthusiasts”) appended to product listings. Early Perplexity models matched every item to a persona, while newer models applied persona tags selectively. Persona framing—long used in gifting magazines and lifestyle blogs to cast products through archetypal users (“For the home chef…”)—is now automated by LLMs, which blend official and third-party content into composite descriptions that can subtly sway consumer perceptions. This pervasive use of persona-based framing raises our central question: How does LLM-generated persona framing affect consumer shopping behavior?

Our studies demonstrate that persona-based framing of product information significantly enhances Consumer Felt Understanding (CFU). This shows that linguistic framing can validate consumers’ self-identities and improve LLM–user interaction quality. We differentiate between two CFU levels: collective and individual. Collective CFU drives persuasion and confidence, while individual CFU mainly affects confidence.

These findings matter for two reasons. First, persona framing enables consumers to feel understood without sharing personal data, reducing privacy risks inherent in true personalization. Second, despite this privacy protection, persona framing markedly boosts the persuasive power of LLM recommendations—a development that raises ethical considerations for AI researchers. Finally, this research answers a pressing managerial question and offers actionable guidance for digital marketers on optimizing LLM-driven content strategies.

Sakshi Korde

PhD Candidate, Wilfrid Laurier University

"Understanding effects of LLM information framing: The Impact of Persona-based Framing on Persuasive Effectiveness"

As more shoppers rely on conversational AI for product research, LLMs are the final touchpoint before consumers make purchase decisions. For example, Perplexity added one-click shopping within its chat interface (Sigalos, 2025), and OpenAI integrated Shopify into ChatGPT (Stambor, 2025). In response, marketers are reallocating budgets from SEO to “LLM optimization,” to improve brand visibility in LLM output (Gupta, 2025), indicating an urgent need for understanding how LLMs generate and present product information to users and their effects on consumer behavior.

Comparative analysis of eight LLMs revealed a distinct linguistic framing pattern: brief persona tags (e.g., “Top pick for fitness enthusiasts”) appended to product listings. Early Perplexity models matched every item to a persona, while newer models applied persona tags selectively. Persona framing—long used in gifting magazines and lifestyle blogs to cast products through archetypal users (“For the home chef…”)—is now automated by LLMs, which blend official and third-party content into composite descriptions that can subtly sway consumer perceptions. This pervasive use of persona-based framing raises our central question: How does LLM-generated persona framing affect consumer shopping behavior?

Our studies demonstrate that persona-based framing of product information significantly enhances Consumer Felt Understanding (CFU). This shows that linguistic framing can validate consumers’ self-identities and improve LLM–user interaction quality. We differentiate between two CFU levels: collective and individual. Collective CFU drives persuasion and confidence, while individual CFU mainly affects confidence.

These findings matter for two reasons. First, persona framing enables consumers to feel understood without sharing personal data, reducing privacy risks inherent in true personalization. Second, despite this privacy protection, persona framing markedly boosts the persuasive power of LLM recommendations—a development that raises ethical considerations for AI researchers. Finally, this research answers a pressing managerial question and offers actionable guidance for digital marketers on optimizing LLM-driven content strategies.

YoungJin Kwon

PhD Candidate, University of Minnesota

"Large Language Models in Academia: Boosting Productivity but Reinforcing Inequality"

Large language models (LLMs) have garnered significant attention for their potential to enhance knowledge worker productivity. In this study, we provide the first large-scale empirical evaluations of LLMs’ impact on academic research productivity. Leveraging a comprehensive dataset of 4,582 computer science scholars across 194 top U.S. universities and analyzing 251,124 research papers published between 2018 and 2024, we find that the introduction of LLMs is associated with about 8% increase in publication output—a gap that persists across alternative measures, including the first-author publications and top-tier conference papers. Our regression discontinuity in time (RDiT) analysis further reveals that LLMs not only shifted the average publication level but also accelerated the growth rate of productivity, rising to 3.2% in 2023 and 12.8% in 2024. Notably, junior scholars realize stronger gains than their senior counterparts, with the productivity benefit diminishing by roughly 1% for each additional year of experience. Recognizing that LLMs’ benefits may not be uniformly distributed, we also investigate their impact on non-native English-speaking (NNES) researchers, who have historically faced disadvantages in academic writing (Liao et al., 2024). Difference-in-differences and generalized synthetic control analyses indicate that, following LLM adoption, native English-speaking (NES) researchers produced more papers than their NNES counterparts. Overall, our findings indicate that while LLMs significantly boost scholarly productivity, they also exhibit dual effects, lowering barriers for junior scholars while potentially reinforcing linguistic inequities.
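
For readers unfamiliar with the regression discontinuity in time (RDiT) design, a minimal specification of the kind described might look like the sketch below; the column names, the 2022 cutoff placement, and the clustering choice are assumptions about the setup, not the authors’ code.

```python
# Minimal RDiT-style specification: scholar-year publication counts regressed
# on a post-LLM indicator, a running time trend, and their interaction, with
# scholar fixed effects. Column names and the cutoff year are assumptions.
import pandas as pd
import statsmodels.formula.api as smf

def fit_rdit(df: pd.DataFrame):
    """df: one row per scholar-year with columns 'pubs', 'year', 'scholar_id'."""
    df = df.copy()
    df["t"] = df["year"] - 2022                    # running variable centered at the cutoff
    df["post"] = (df["year"] > 2022).astype(int)   # LLM-era indicator
    model = smf.ols("pubs ~ post + t + post:t + C(scholar_id)", data=df)
    return model.fit(cov_type="cluster", cov_kwds={"groups": df["scholar_id"]})
```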

Mario Leccese

Assistant Professor, Boston University

"Navigating the Frontier of Artificial Intelligence via Agentic Taxonomy Induction"

We propose an agentic framework for building an adaptive taxonomy of generative AI that dynamically characterizes the ecosystem of AI technologies. Our system leverages large language models (LLMs) and an agentic architecture to annotate multimodal data streams, particularly from open-source software repositories. The result is a fine-grained, evolving taxonomy that offers greater interpretability and resolution, providing a near real-time view of the AI landscape.

Our application of this framework to over 267,000 open-source repositories reveals several key insights into the AI ecosystem’s evolution. First, we document concentration and entry dynamics across AI sub-fields, showing that while new firm entry has declined after peaking between 2014 and 2017, market concentration has remained relatively low, suggesting persistent fragmentation. Second, we find that large technology firms (GAFAM) play an outsized role in setting technical direction and in the accumulation of repositories, even as a long tail of other firms contributes meaningfully to the ecosystem. Finally, by visualizing firms’ movements across the AI technological space, we find a notable convergence as most firms increasingly focus on similar domains, while observing distinct patterns for firms like Apple, which maintains a specialized focus, and Microsoft, which executed a significant strategic pivot toward foundational AI capabilities.

Dokyun (DK) Lee

Associate Professor in IS & Computing and Data Science, Boston University

"Breaking News and Filter Bubbles: Generative AI Search and the Future of News Consumption"

The proliferation of Gen-AI systems has transformed how digital content platforms support user search and retrieval. By synthesizing natural language responses tailored to user queries, Gen-AI search tools aim to streamline information discovery and reduce cognitive effort. However, while their operational potential has been widely recognized, relatively little is known about how such systems influence user behavior in algorithmically curated, information-rich news environments. In particular, their effects on the breadth of information exposure—often framed as “filter bubble” (Pariser 2011, Bakshy et al. 2015)—remain theoretically and empirically ambiguous. Some prior studies suggest that algorithmic curation may inadvertently narrow the content landscape to which users are exposed, while others posit that well-designed AI systems can promote exploratory information access (Dubois and Blank 2018). This study seeks to examine whether Gen-AI-based search interfaces increase efficiency at the cost of informational breadth, or whether they can enable both simultaneously. We address this question through a large-scale randomized field experiment conducted in collaboration with The Washington Post. Between October 1 and November 7, 2024, more than 14 million users were randomly assigned—via persistent cookie identifiers and client-side execution logic—to either a treatment or a control group. The control group accessed the platform’s standard search experience, which relied on editorially promoted search banners and a keyword-based engine powered by Google Vertex. In contrast, users in the treatment group engaged with a Gen-AI search interface featuring a natural language prompt inviting open-ended queries.
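
Persistent-identifier randomization of this kind is typically implemented by hashing the cookie ID, so each browser lands in the same arm on every visit; the sketch below is a generic illustration with an assumed salt and split, not The Washington Post’s code.

```python
# Deterministic assignment from a persistent cookie identifier: hashing the ID
# with a fixed experiment salt keeps each browser in the same arm across visits.
# The salt and the 50/50 split are illustrative assumptions.
import hashlib

def assign_arm(cookie_id: str, salt: str = "genai-search-expt") -> str:
    digest = hashlib.sha256(f"{salt}:{cookie_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100            # stable bucket in [0, 100)
    return "treatment" if bucket < 50 else "control"

print(assign_arm("abc123"))  # the same ID always maps to the same arm
```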

Heeseung Andrew Lee

Assistant Professor, University of Texas at Dallas

"Breaking News and Filter Bubbles: Generative AI Search and the Future of News Consumption"

The proliferation of Gen-AI systems has transformed how digital content platforms support user search and retrieval. By synthesizing natural language responses tailored to user queries, Gen-AI search tools aim to streamline information discovery and reduce cognitive effort. However, while their operational potential has been widely recognized, relatively little is known about how such systems influence user behavior in algorithmically curated, information-rich news environments. In particular, their effects on the breadth of information exposure—often framed as “filter bubble” (Pariser 2011, Bakshy et al. 2015)—remain theoretically and empirically ambiguous. Some prior studies suggest that algorithmic curation may inadvertently narrow the content landscape to which users are exposed, while others posit that well-designed AI systems can promote exploratory information access (Dubois and Blank 2018). This study seeks to examine whether Gen-AI-based search interfaces increase efficiency at the cost of informational breadth, or whether they can enable both simultaneously. We address this question through a large-scale randomized field experiment conducted in collaboration with The Washington Post. Between October 1 and November 7, 2024, more than 14 million users were randomly assigned—via persistent cookie identifiers and client-side execution logic—to either a treatment or a control group. The control group accessed the platform’s standard search experience, which relied on editorially promoted search banners and a keyword-based engine powered by Google Vertex. In contrast, users in the treatment group engaged with a Gen-AI search interface featuring a natural language prompt inviting open-ended queries.

Jiasun Li

Robert Johnston Endowed Professorship and Associate Professor of Finance, George Mason University

"Will Generative AI Replace Human Creatives? Insights from Financial Economics"

With the rise of generative AI models, concerns grow about the future of human creatives. Will genAI replace all creative jobs? We argue no, even if genAI reaches its theoretical limit. Our theory parallels the classic concept in financial economics of Grossman & Stiglitz (1980): if genAI can produce all content at lower costs, humans will have little incentive to create, as they cannot profit. However, if no humans create, then genAI will have no human content to learn from (or only learn from outdated information) to further generate relevant, up-to-date content reflecting real-world happenings. This creates a paradox.

J. Frank Li

Assistant Professor, University of British Columbia

"What Can Robots Do? Using Large Language Models to Understand How Embodied Computation May Affect Occupations and the Economy"

The rapid advancement of artificial intelligence (AI) has spurred significant growth in robotics, particularly in embodied intelligence. Robotics technology has progressed from industrial automation to sophisticated humanoid and warehouse robots, such as Tesla Optimus, Sanctuary AI, and Pickle Robot. This technological evolution is reshaping industries, altering labor markets, and raising important questions about the suitability of robotic solutions for various occupational tasks. Despite extensive research on the economic and labor impacts of automation and AI, existing studies lack a comprehensive framework for assessing robotic capabilities at the task level. This is critical because the economy (country, industry, or firm) consists of a collection of occupations organized in a certain way and occupations are bundles of tasks to be performed. Thus, a granular approach to mapping robotic capabilities to occupational tasks is necessary to understand the broader implications for industries, firms, and labor markets. This paper introduces a Robotics Rubric, a structured evaluation system leveraging large language models (LLMs) to systematically assess the feasibility of robotics across various occupations and household production tasks. By labeling and aggregating occupational task suitability, this study aims to provide a data-driven foundation for analyzing the economic impact of embodied AI.
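
The aggregation step such a rubric implies can be sketched as rolling task-level feasibility labels up to an occupation-level exposure score, weighted by task importance; the column names and 0-1 scale below are assumptions about how LLM-assigned scores might be stored, not the paper’s data.

```python
# Sketch: roll task-level robot-feasibility scores up to occupations, weighting
# by task importance. Columns and the 0-1 scale are illustrative assumptions.
import pandas as pd

def occupation_exposure(task_scores: pd.DataFrame) -> pd.Series:
    """Importance-weighted mean feasibility per occupation."""
    g = task_scores.assign(weighted=task_scores["feasibility"] * task_scores["task_importance"])
    return g.groupby("occupation")["weighted"].sum() / g.groupby("occupation")["task_importance"].sum()

tasks = pd.DataFrame({
    "occupation": ["warehouse worker", "warehouse worker", "electrician", "electrician"],
    "task_importance": [0.7, 0.3, 0.6, 0.4],
    "feasibility": [0.9, 0.4, 0.3, 0.1],   # e.g., LLM rubric scores per task
})
print(occupation_exposure(tasks))
```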

Wei Lu

Assistant Professor of Marketing, CUNY Baruch College

"Aligning Large Language Model Agents with Rational and Moral Preferences: A Supervised Fine-Tuning Approach"

Understanding how large language model (LLM) agents behave in strategic interactions is essential as these systems increasingly participate autonomously in economically and morally consequential decisions. We show that baseline LLMs exhibit diverse strategic behaviors in canonical economic games, with models like GPT-4o showing excessive cooperation and limited incentive sensitivity, while reasoning models, such as o3-mini, align more consistently with payoff-maximizing strategies. To address this, we develop a supervised fine-tuning pipeline using synthetic data generated from stylized utility functions grounded in economic theory. This approach aligns LLM behavior with two interpretable preference types: rational (homo economicus) and moral (homo moralis) agents. We illustrate how preference structures shape behavior across both economic games and policy-relevant applications: moral dilemmas involving autonomous vehicles and algorithmic pricing in competitive markets. The aligned agents exhibit more consistent and structured behavior, and their choices reveal how different normative objectives can influence market and moral outcomes such as algorithmic collusion. This work establishes a replicable, cost-efficient, and economically grounded pipeline for AI preference alignment, offering a foundation for both empirical analysis and the principled design of autonomous agents.
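
As a sketch of how synthetic fine-tuning examples might be derived from a stylized utility function, the snippet below generates dictator-game training pairs in a chat format; the utility forms, the kappa weight, and the file format are illustrative assumptions, not the authors’ pipeline.

```python
# Sketch: derive supervised fine-tuning targets from stylized preferences in a
# simple dictator game. Utility forms, kappa, and the chat format are
# illustrative assumptions.
import json
from math import log

def utility(own: int, other: int, kind: str, kappa: float = 0.5) -> float:
    if kind == "rational":                          # homo economicus: own payoff only
        return own
    return log(1 + own) + kappa * log(1 + other)    # stylized other-regarding preference

def make_example(endowment: int, kind: str) -> dict:
    splits = [(endowment - g, g) for g in range(endowment + 1)]  # (own, other)
    best_own, best_other = max(splits, key=lambda s: utility(s[0], s[1], kind))
    return {
        "messages": [
            {"role": "user",
             "content": f"You can split {endowment} tokens with a stranger. How many do you give away?"},
            {"role": "assistant",
             "content": f"I give away {best_other} tokens and keep {best_own}."},
        ]
    }

with open("sft_moral_agent.jsonl", "w") as f:
    for endowment in range(2, 11):
        f.write(json.dumps(make_example(endowment, kind="moral")) + "\n")
```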

Xueming Luo

Charles Gilliland Distinguished Chair Professor of Marketing, Strategy, and MIS, Temple University

"Designing AI-Human Supervision to Improve Worker Performance and Reduce Gender Gap"

We study the impact of various human-AI supervisory assemblages on workers’ task performance. Despite the promises of artificial intelligence (AI), there are concerns from both workers and managers about adopting AI at workplaces. In particular, we examine how firms can integrate AI into supervising worker performance. We utilize data from a field experiment on customer service agents in a fintech company who are randomly assigned to receive job performance feedback from human managers only, an AI bot only, or human-AI supervisory assemblages. The AI bot monitors and analyzes agent call data to generate performance feedback, a task traditionally performed by human managers. A unique feature of our experiment is that the assemblages encompass a dual human-and-AI configuration (where agents receive supervisory feedback from both human managers and an AI bot in parallel) and a shadow-AI-human-face configuration (where agents receive supervisory feedback that is generated by an AI bot but delivered by human managers).

We find that AI-based supervision (AI-solo) positively impacts agent task performance on loan repayment collections by 3.3%, relative to conventional human supervision (human-solo). However, relative to AI-solo, a dual human-and-AI design negatively impacts agent performance by 4.7%-4.8%, whereas a shadow-AI-human-face design positively impacts agent task performance by 4.9%-5.0%. Additional mechanism analyses reveal that, relative to AI-solo supervision, agents under the shadow-AI (dual) configuration tend to make significantly more (fewer) corrections to improve their job skills and achieve better (poorer) customer reaction outcomes as measured by customer anger. The dual human-and-AI configuration leads agents to perceive greater confusion in feedback, discourages proactive feedback-seeking, and ultimately harms task performance in a vicious cycle. In contrast, the shadow-AI design increases agents’ willingness to proactively seek feedback, fostering a virtuous cycle of improvement. Finally, while AI-solo supervision widens gender performance disparities relative to human-solo, the dual configuration reduces performance for both genders and undesirably reverses the gap. In contrast, the shadow-AI configuration narrows the gender gap in a desirable way by uplifting male workers slightly more while also enhancing female workers’ performance. These findings suggest that firms should prudently design human-AI supervisory assemblages. As a double-edged sword, AI-based supervision should be deployed in the shadows to empower human managers, rather than to displace or compete with them, in order to achieve higher worker productivity, healthier worker-manager relationships, and more equitable service operations.

Raveesh Mayya

Assistant Professor of Technology, NYU Stern School of Business

"Mitigating Spoken Language Barriers in AI-Assisted Programming: Evidence from a Field Experiment"

In this talk, we document the existence of spoken language barriers in the adoption of GenAI tools for tasks where the output of the GenAI model is not English sentences. We then propose a solution using an intermediate representation (akin to a Controlled Natural Language). In our context, the intermediate representation is metadata prompting, which programmers understand quite well.

Amit Mehra

Professor, UT Dallas

"Mitigating Spoken Language Barriers in AI-Assisted Programming: Evidence from a Field Experiment"

This talk examines the emerging “AI supply chain,” where upstream providers offer fine-tuning and inference services of foundation models to downstream firms that develop specialized AI applications and market to consumers. Using a game-theoretic framework, we analyze the provider’s optimal pricing strategies and the effectiveness of regulatory policies within the AI supply chain. We show that providers may use loss-leader pricing for fine-tuning. Furthermore, intensified downstream price competition may reduce provider profits while benefiting firms. We also find declining compute costs can reduce downstream firms’ profits. Finally, we analyze the impact of different regulatory tools. While pro-price-competition policies and compute subsidies are not always effective, they are complementary under different cost conditions. In contrast, pro-quality-competition policies consistently enhance consumer surplus. These findings offer practical implications for pricing, investment, and regulatory design in the evolving AI market.

Lily Poursoltan

PhD Candidate, University of California, San Diego

"Cognitive Load in Human-in-the-Loop AI System: Real-World Evidence from AI-Assisted In-Basket Messaging in EHR"

Propelled by rapid technological innovation and the growing demand for efficient, high-quality care, the healthcare sector is undergoing profound change. The widespread adoption of Electronic Healthcare Record (EHR) systems—initially envisioned to achieve the Institute for Healthcare Improvement’s ‘Triple Aim’—has fundamentally transformed clinical workflows, enabling advances in information sharing, documentation, and patient safety. However, this transformation has also introduced substantial new challenges. Most notably, the surge of electronic patient “in-basket” messaging, accelerated by telemedicine and virtual care during the COVID-19 pandemic, has emerged as a critical source of physician cognitive load and burnout, with wide-ranging implications for care quality, safety, and provider well-being.

Recent advances in artificial intelligence (AI), particularly large language models (LLMs), promise to alleviate this burden by automatically drafting replies to patient messages—streamlining workflows and reducing clinician workload. In these “human-in-the-loop” AI systems, AI-generated drafts are reviewed and edited by clinicians prior to sending. Early studies highlight the potential benefits of this approach, showing that evaluators often prefer LLM-generated responses for their quality and empathy. Yet, emerging evidence—including recent findings by Tai-Seale et al. (2024)—suggests that LLM integration can inadvertently increase physician review and editing time and does not necessarily reduce overall turnaround time. These nuanced outcomes reveal that, despite their promise, human-in-the-loop AI solutions may introduce unanticipated complexities, potentially increasing cognitive burden for clinicians rather than relieving it.

Critically, there remains a significant gap in the literature: while prior research has focused on the accuracy and perceived empathy of AI-generated content, few studies have systematically examined the real-world integration of human-in-the-loop AI within production EHR platforms or rigorously quantified its impact on physician cognitive load and workflow. Addressing this gap, the present study offers the first comprehensive empirical investigation of human-in-the-loop agentic AI for in-basket message management within the Epic EHR system, focusing on the patterns and consequences of clinician editing and the cognitive loads introduced by these new technologies. Specifically, we address two central research questions: (RQ1) What categories of modifications do clinicians make when editing AI-generated clinical communications, and to what extent do they modify the AI-generated content? (RQ2) How are different categories of clinical modifications associated with cognitive burden, as measured through editing time patterns?

Leveraging real-world deployment of AI integrated into the Epic EHR for patient-provider messaging at UC San Diego Health, our study encompasses 994 unique clinicians across a broad array of clinical specialties. In this workflow, patients initiate communication by selecting from four message categories—General, Results, Medications, and Paperwork—triggering the AI system to generate drafts based on category-specific prompts, demographics, and medical history. Our analytical focus centers on the editing modifications clinicians make to these AI-generated drafts before sending final responses, representing the core cognitive work in human-in-the-loop agentic AI systems.

We analyzed over 7,900 instances where clinicians edited AI-generated drafts prior to sending, capturing detailed data on both the original AI output and the final, clinician-approved message. Our two-stage analytical framework combines LLM-based edit categorization with mixed-effects regression modeling. In Stage I, we employ Llama 3.1 to systematically categorize and quantify editing patterns, producing a 15-category taxonomy validated by clinical experts across four domains: Medication Management, Diagnostic and Testing Procedures, Care Coordination, and Patient Education and Support. Editing intensity is scored on a four-point scale, capturing the depth and clinical significance of each modification. In Stage II, we use linear mixed-effects regression to analyze the relationship between edit categories and editing time—a proxy for cognitive burden—while controlling for message complexity, temporal factors, and provider-level heterogeneity.
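
As a rough illustration of the Stage II specification described above, the sketch below fits a linear mixed-effects model of (log) editing time on edit-category indicators with a clinician-level random intercept, using statsmodels. The synthetic data, column names, and log transform are assumptions, not the study's actual variables.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data; real inputs would be per-message edit records.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "edit_category": rng.choice(
        ["admin", "med_management", "result_interpretation"], size=n),
    "clinician_id": rng.integers(0, 40, size=n),
    "draft_words": rng.integers(30, 200, size=n),       # message-complexity proxy
})
base = {"admin": 60, "med_management": 90, "result_interpretation": 120}
df["edit_seconds"] = (df["edit_category"].map(base)
                      + 0.3 * df["draft_words"]
                      + rng.normal(0, 20, size=n)).clip(lower=5)

# Random intercept per clinician captures provider-level heterogeneity;
# category coefficients proxy the relative cognitive burden of each edit type.
model = smf.mixedlm("np.log(edit_seconds) ~ C(edit_category) + draft_words",
                    data=df, groups=df["clinician_id"])
print(model.fit().summary())
```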

Our empirical analysis uncovers a clear and systematic hierarchy of cognitive burden across 15 distinct categories of clinician modifications, revealing up to 53.9% increases in editing time for the most complex tasks. This yields the first empirically grounded taxonomy of cognitive demands in AI-assisted patient-clinician communication, establishing four distinct cognitive load tiers. Results highlight that interpretation tasks—such as radiology result explanations, specialist referrals, and medication side effect counseling—impose the highest cognitive burden, while administrative tasks and supportive communications represent the lowest. These findings provide actionable guidance for optimizing human-in-the-loop AI system workflows, indicating that generic AI solutions are insufficient for high-burden categories, and that targeted automation or delegation could effectively reduce workload for lower-burden tasks.
Beyond practical implications, this study makes several contributions to the human-AI and information systems literature by (1) establishing a robust modification taxonomy for analyzing human-AI collaboration, (2) positioning cognitive burden as a critical metric for evaluating human-in-the-loop AI systems, and (3) empirically evaluating the cognitive load of agentic AI systems in real-world patient-clinician communication. The observed hierarchy of modification intensity, particularly when clinicians must override or correct AI outputs, underscores the need for continuous model refinement and specialized AI development. While our findings are based on data from a single academic medical center, our validated framework lays the foundation for future research and implementation efforts aiming to optimize human-in-the-loop agentic AI systems, improve provider well-being, and foster trust in clinical AI technologies.

Ruben R. Salas

PhD Student, The Wharton School, University of Pennsylvania

"Beyond Fluency: A Process-Centric Analysis of Human–AI Collaboration in Creative Industries"

As generative AI becomes increasingly integrated into creative workflows, a key question remains: how and when should it be used to support human creativity? This study adopts a process-centric lens to examine human–AI collaboration across two critical stages of creativity—idea generation (divergence) and idea evaluation (convergence). Using a field experiment modeled on the New Yorker Caption Contest, we randomly assign participants to one of several collaboration designs, varying the presence of AI support during each phase.

Our results reveal that AI enhances both the quantity and quality of creative output, with the greatest gains occurring when it is used in both stages. Crucially, we find that the two stages are interdependent: AI support during evaluation influences user behavior during generation, increasing idea fluency, while AI-generated content narrows semantic diversity—an effect compounded when AI also selects the output. We also uncover significant heterogeneity in AI’s value across users and outcomes. Full AI assistance benefits less creative individuals by elevating baseline performance, but can suppress the originality of more skilled users. At the same time, AI’s impact varies across the distribution of outputs: partial AI support reduces low-quality submissions, while full support boosts performance at both extremes. Together, these findings provide a nuanced perspective on AI integration that accounts for stage-specific dynamics and individual differences.

Benedikt Roder

Doctoral Researcher, Technical University of Munich

"Music Marketing with Generative AI"

Whether used in ancient rituals or included in modern playlists, music has always been integral to human expression. Nowadays, the multibillion-dollar music industry faces a significant paradox: while global music consumption is at an all-time high, most music releases struggle to gain visibility due to severe content oversupply and marketing bottlenecks. Over 99,000 tracks are uploaded daily to digital platforms, but only a fraction receive adequate promotion, especially long-tail catalog tracks released over 18 months ago. Traditional multimedia advertising, which is critical for fan engagement and streaming revenue, is also prohibitively expensive and labor-intensive, leaving smaller artists and legacy content underleveraged. These challenges are well-suited to generative AI, given its novel capabilities in multimedia content creation.

Addressing this gap, our research introduces the Marketing Asset Generation Engine (MAGE), a generative AI-based system that automates the creation of dynamic, multimodal video ads using only a track’s audio and album artwork as input. Through three preregistered field studies and one controlled lab study, we evaluate MAGE’s effectiveness across 72 campaigns promoting 33 real-world music tracks on TikTok and Meta, in collaboration with one of the world’s largest music labels. MAGE can generate over 700 million parameterized video ad variations reflecting each artist’s visual identity without requiring human intervention. For example, AI-generated assets can outperform human-designed ads by 17% in average click-through rate (CTR) while reducing production costs by 98%. Moreover, our results reveal that randomly sampled MAGE outputs perform comparably to expert-curated selections, challenging assumptions about the necessity of human oversight in creative AI workflows. Furthermore, video ads can be personalized with AI by adapting generation parameters for certain target audience demographics or platforms. Study 1 compares AI-generated and human-created ads across 50 campaigns and finds that MAGE consistently outperforms traditional methods in CTR and cost-effectiveness on TikTok, a platform aligned with short-form, audio-visual content. Study 2 further explores the value of human-in-the-loop curation, showing no significant performance gains over randomly sampled video ads from MAGE. Study 3 introduces an evaluative AI that predicts which MAGE-generated ads will perform best for specific audience segments (based on age, gender, and platform) before launch, enabling automated content personalization at scale. These predictions were strongly correlated with actual ad performance, supporting the viability of AI-driven personalization. Finally, a lab study replicates the field study results and identifies engagement as a key mechanism driving the superior CTR performance of MAGE-generated ads, confirming that viewers found these assets more stimulating and were more inclined to click. Our findings offer critical implications for industry applications at the intersection of Generative AI and business innovation.

First, MAGE revolutionizes the economics of music marketing, allowing artists and labels to scale multimedia campaigns previously constrained by budget and labor: it reduces production costs by 98% (from $100 to $2 per video ad) and gives smaller artists and catalog content access to high-quality promotion. Second, MAGE introduces a paradigm shift in AI-enabled personalization: by using generation parameters and audience features as predictive signals, the system can adapt creative content without any manual customization. Third, we demonstrate that generative AI does not necessarily homogenize content; rather, it enables combinatorial creativity within artist-defined visual constraints, maintaining brand coherence while achieving large-scale creative variation. This work also contributes theoretically by offering a scalable framework for AI-driven, multimodal content generation that integrates seamlessly into existing marketing workflows. Beyond music, our methodology can apply to other industries where brand-consistent, personalized media assets are critical, such as retail, entertainment, or consumer goods. Finally, the research raises important questions about the evolving nature of creativity, labor, and control in AI-driven ecosystems, suggesting that, when generative systems are rigorously designed with creative constraints, asset selection may be less essential for certain applications than previously assumed. By bridging technical innovation, empirical validation, and practical deployment, our research demonstrates how generative AI can transform marketing strategy, productivity, and audience engagement, particularly in oversaturated, content-intensive industries like music. MAGE exemplifies how AI can broaden access to high-quality creative tools, unlock underutilized catalog value, and personalize content delivery at unprecedented scale.

Gregor Schubert

Assistant Professor of Finance, UCLA Anderson School of Management

"Organizational Technology Ladders: Remote Work and Generative AI Adoption"

What drives inequality in technology adoption among firms? In this study, I propose that there is an “organizational technology ladder”: firms’ human and organizational capital adapts when firms incorporate new technologies and this adaptation can make it easier to adopt later technologies, leading to path-dependent divergence in technological capabilities.

To provide empirical evidence, I exploit the sequential exposure of firms to two large changes in organizational technology: the rise of remote work in 2020-2021, and the proliferation of generative AI technology since November 2022. I use detailed job posting data to study how firms’ labor demand transformed in response to these changes in technology. I develop an IV approach to show that firms that had to adopt remote work later demand more generative AI-related skills. I also provide evidence from a synthetic difference-in-differences estimation that firms that were more exposed to generative AI technology reduced their demand for remote workers after ChatGPT was released. I explore the mechanism for these effects: When firms adopt remote work technology, they invest in skills that enable a more rapid generative AI adoption. Firms that are less able to accommodate remote work because they have lower managerial, communication, or decision-making capabilities are more likely to invest in generative AI to reduce their reliance on remote workers. I rationalize these results using a task-based model of firm investments in new technology.

João Sedoc

Assistant Professor, Stern School of Business, New York University

"To Err Is Human; To Annotate, SILICON? Reducing Measurement Error in LLM Annotation"

We introduce the SILICON workflow (Systematic Inference with LLMs for Information Classification and Notation) to guide reproducible, rigorous text annotation using Large Language Models (LLMs) in management research. SILICON integrates principles from human annotation with prompt optimization and model selection. Validated through seven case studies, it highlights the importance of expert baselines, refined guidelines, and testing multiple LLMs. LLMs show moderate to high agreement with experts in most tasks but struggle with complex multi-label classification. Reasoning-based models underperform chat-based ones. A regression-based method enables prompt/model comparisons. The workflow offers practical guidance for reducing measurement error when using LLMs in empirical research annotation.
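
One building block of such a workflow can be illustrated compactly: score each (prompt, model) configuration against the expert baseline with a chance-corrected agreement statistic so configurations are comparable on one scale. The labels and data below are placeholders, and the paper's regression-based comparison method is not reproduced here.

```python
from sklearn.metrics import cohen_kappa_score

# Placeholder annotations; in practice these come from the expert baseline
# and from each (prompt, model) configuration being compared.
expert_labels = ["pos", "neg", "neu", "pos", "neg", "pos", "neu", "neg"]
llm_runs = {
    "gpt-4o / prompt_v1":  ["pos", "neg", "neu", "pos", "pos", "pos", "neu", "neg"],
    "gpt-4o / prompt_v2":  ["pos", "neg", "pos", "pos", "neg", "pos", "neu", "neg"],
    "llama-3 / prompt_v1": ["neu", "neg", "neu", "pos", "neg", "neu", "neu", "neg"],
}

# Chance-corrected agreement with the expert baseline per configuration.
for config, labels in llm_runs.items():
    kappa = cohen_kappa_score(expert_labels, labels)
    print(f"{config}: kappa = {kappa:.2f}")
```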

Fangchen Song

PhD Student, The University of Texas at Austin

"Can AI Make Us Happier? A Randomized Experiment on AI’s Emotional Impact"

AI-based chatbots have emerged as a potential tool for enhancing emotional states in everyday life. In this study, we investigate whether and how AI chatbots provide emotional support, and whether their effects are beneficial for all users. We conducted controlled experiments to isolate the emotional effects of interacting with AI chatbots and identify the psychological processes involved. Results show that such interactions significantly reduce negative emotions, including sadness, anger, disgust, and fear.

To explain these effects, we test two competing mechanisms: social presence, the perception of human-like companionship, and cognitive reappraisal, the reinterpretation of emotionally salient experiences to regulate affect. Mediation analyses reveal that the emotional benefits of AI chatbot interactions are primarily driven by cognitive reappraisal, rather than perceived social presence. However, these emotional benefits are not evenly distributed across different age groups. Older adults exhibit only modest reductions in negative emotions, highlighting age-related disparities in the emotional effectiveness of AI chatbot interactions. To investigate this gap, we analyze linguistic and conversational patterns in AI responses. When responding to older adults, compared with younger users, AI chatbots use more formal, analytical language and emphasize informational over emotional support, differences that may account for the reduced emotional benefits. These findings highlight both the promise and the limitations of AI chatbots for emotional support. While such AI chatbots can potentially improve emotional states, their design must account for user heterogeneity to ensure they are effective for everyone.

He Sun

Postdoc, Yale School of Management

“A Digital Twin for Mall Consumer Behavior: Generative Modeling of Shopper Trajectories from Mall Foot-Traffic Data”

Shopping malls remain among the most data-rich yet under-optimized physical retail environments. Strategic decisions regarding store placement, tenant mix, and spatial layout are often made with limited ability to anticipate their behavioral consequences. While managers can observe what shoppers do, they are rarely equipped to ask “what if” questions: What if a store were moved closer to an anchor? If a competitor opened next door? If a new entrance shifted traffic patterns? Traditional analytics, however granular, are fundamentally backward-looking and bound to the observed world.

We propose a generative modeling framework: a forward-looking, layout-sensitive simulation that enables the design and testing of counterfactual spatial interventions on shopper behavior. Our goal is to construct a digital twin of mall consumer dynamics: a virtual, behaviorally realistic environment in which operators can experiment with spatial design, simulate counterfactual configurations, and evaluate predicted outcomes. By generating complete individual trajectories under alternative layouts, this framework opens a new frontier for data-driven retail strategy.

Achieving this goal requires answering a central question: Can we realistically simulate how consumers would navigate alternative retail layouts, and use these simulations to inform spatial design decisions? This question is both practically relevant and methodologically novel. While prior work has modeled shopping behavior using transaction data, surveys, or partial path data, none have enabled scalable, counterfactual generation of full shopper trajectories at the individual level. Yet this capability is essential to optimize layout before making costly changes.

We leverage a large-scale, high-resolution dataset of anonymized foot-traffic trajectories from four major shopping malls, collected via AI-enabled sensors. These sensors capture detailed, time-stamped sequences of store visits, dwell durations, entry points, demographic attributes, and visit context (e.g., time of day, day of week, weather, and whether the shopper arrived alone or in a group). The dataset includes tens of millions of individual shopping trips across varied malls, calendar periods, and shopper types, enabling granular modeling of consumer decision-making in physical space.

To simulate behavior under alternative mall configurations, we develop a generative model composed of two core components. First, a graph neural network (GNN) embeds the mall’s spatial structure, with nodes representing stores and edges reflecting walkable adjacency. Second, a conditional sequence generator produces shopper trajectories step by step, conditioned on demographic and temporal inputs such as entry time, weather, calendar effects, and group status. This architecture allows the model to learn fine-grained transition patterns across stores and generate realistic, spatially coherent shopping sequences.
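
A compressed, hypothetical sketch of this two-component architecture in plain PyTorch is shown below: one adjacency-normalized graph-convolution step embeds the store graph, and a GRU decodes next-store logits conditioned on shopper context. Dimensions, normalization, and the context encoding are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class TrajectoryGenerator(nn.Module):
    """Sketch: store embeddings refined by one adjacency-normalized graph
    convolution, then a GRU decodes a next-store distribution step by step,
    conditioned on shopper context (demographics, time, weather, group)."""
    def __init__(self, n_stores, ctx_dim, emb_dim=32, hid_dim=64):
        super().__init__()
        self.store_emb = nn.Embedding(n_stores, emb_dim)
        self.gcn = nn.Linear(emb_dim, emb_dim)          # shared graph-conv weight
        self.ctx_proj = nn.Linear(ctx_dim, hid_dim)     # context -> initial state
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.head = nn.Linear(hid_dim, n_stores)        # next-store logits

    def forward(self, adj, visited, ctx):
        # adj: (n_stores, n_stores) walkable-adjacency matrix with self-loops
        # visited: (batch, seq_len) store indices; ctx: (batch, ctx_dim)
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        spatial = torch.relu(self.gcn((adj / deg) @ self.store_emb.weight))
        seq = spatial[visited]                           # spatially-aware inputs
        h0 = torch.tanh(self.ctx_proj(ctx)).unsqueeze(0)
        out, _ = self.gru(seq, h0)
        return self.head(out)                            # logits over next stores

# Toy usage with 5 stores and random context features.
adj = torch.eye(5) + torch.tensor([[0,1,0,0,1],[1,0,1,0,0],[0,1,0,1,0],
                                   [0,0,1,0,1],[1,0,0,1,0]], dtype=torch.float)
model = TrajectoryGenerator(n_stores=5, ctx_dim=4)
logits = model(adj, visited=torch.tensor([[0, 1, 2]]), ctx=torch.rand(1, 4))
print(logits.shape)  # (1, 3, 5): next-store distribution after each visited store
```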

We validate the model in two ways. First, we compare generated data to empirical hold-out trajectories across key statistics (e.g., number of stores visited, total dwell time, walking distance, and unique zones passed). The generated data closely replicate empirical distributions of those key target metrics, demonstrating strong internal validity. Second, we evaluate predictive accuracy in response to real-world layout changes, such as store openings and closures that occurred during the observation period, and find high predictive accuracy relative to post-change data. In both validation settings, the model accurately reproduces mall-level and path-level metrics, demonstrating its robustness and predictive realism and supporting its credibility as a digital twin of mall dynamics.
This project makes both methodological and practical contributions. Methodologically, we introduce a generative simulation framework for modeling individual-level shopping paths, integrating spatial layout with rich temporal and contextual features. Unlike predictive models bound to historical configurations, our model supports behavioral counterfactuals—enabling simulation of unobserved layouts and their effects on aggregate outcomes.

Practically, the framework operationalizes the concept of a digital twin in retail. Because the model accurately reconstructs observed patterns and anticipates responses to real-world layout changes, it offers a scalable platform for virtual experimentation. Retail planners can use it to simulate interventions (such as relocating stores, modifying entrance flows, or testing adjacency configurations) and evaluate derived outcomes of interest (e.g., dwell time, store-level exposure, customer throughput) in this generated digital twin world before implementation.

Importantly, such simulation is not feasible with traditional methods, which, grounded in static historical data, are inherently limited in their ability to evaluate how shopper behavior would respond to unobserved spatial changes. In contrast, our generative framework enables the simulation of entire shopper trajectories under counterfactual layouts, making it possible to assess the behavioral implications of design interventions without actual implementation. Our digital twin framework thus offers a novel and actionable approach for retail strategy grounded in machine-learned realism and operational applicability.

Alp Sungu

Assistant Professor, The Wharton School, University of Pennsylvania

"Can Generative AI Harm Teaching?"

In a large-scale randomized field experiment, we examine the impact of generative AI on K-12 teaching.

Anjana Susarla

Omura Saxena Professor of Responsible AI, Michigan State University

"Do Large Language Models’ (LLMs) Generative Capabilities Boost Creativity? Assessing AI-Augmented Creativity with LLMs"

Large Language Models (LLMs) have been increasingly used to facilitate human endeavors in complex creative tasks ranging from product ideation to digital art. Such novel capabilities of LLMs have ushered in a new era of collaboration between humans and Artificial Intelligence (AI). In this work, we explicitly observe the generative capability of generative AI (GenAI) by manipulating a dimension of LLMs – randomness – through a quasi-experiment. We then assess the perceived creativity of work through an online survey. By focusing on the human-AI collaboration process, given varying LLMs’ generative capabilities and algorithm aversion among human creators, we highlight the nature of interaction patterns in human-AI collaboration processes. We find that collaborating with an LLM with high randomness that generates more diverse advice does not necessarily lead to increased perceived creativity of work, as the role of humans matters. Moreover, we explore how the characteristics of human evaluators and their perceived extent of AI use influence their assessments of creativity. We find that compared to people with low familiarity with LLMs, human evaluators who are more familiar with LLMs tend to perceive more creativity. However, external evaluators who suspect high AI use tend to evaluate text as less creative, despite the source being undisclosed.

Prasanna (Sonny) Tambe

Professor, The Wharton School, University of Pennsylvania

"Does Organization Theory Matter for Organizing AI Agents?"

A core premise in organization, strategy, and information systems research is that firms that organize more effectively perform better, given equivalent inputs. The literature has shown that how firms organize human-human and human-technology interaction drives differential performance. The growing adoption of Generative AI technologies, capable of complex reasoning, raises the question of whether firms that better structure the interactions of AI agents can also achieve superior performance. In this paper, we explore whether foundational concepts from organization theory remain relevant when the task is to develop high-performing organizations of autonomous, interacting AI agents. We first test how division of labor affects the performance of multi-agent structures. We then test how different design choices of how to organize LLM interactions — related to team composition and leadership style — affect outcomes.

Artem Timoshenko

Associate Professor, Northwestern University

"GenAI for Identifying Customer Needs"

Voice-of-the-Customer (VOC) studies traditionally rely on the manual analysis of interview transcripts and online reviews to understand customer experiences and concisely formulate “jobs to be done.” Can LLMs formulate customer needs as well as professional analysts? We fine-tuned LLMs using VOC data and conducted a blind study with market research experts to answer this question.

Kiran Tomlinson

Senior Researcher, Microsoft Research

"Measuring the Impact of Generative AI on Work Activities and Occupations"

Given the rapid adoption of generative AI and its potential to impact a wide range of tasks, understanding the effects of AI on the economy is one of society’s most important questions. In this work, we take a step toward that goal by analyzing the work activities people do with AI and how successfully and broadly those activities are done, and combining that with data on which occupations perform those activities. We analyze a dataset of 200k anonymized and privacy-scrubbed conversations between users and Microsoft Bing Copilot, a publicly available generative AI system.

We find the most common work activities people seek AI assistance for involve gathering information and writing, while the most common activities that AI itself is performing are providing information and assistance, writing, teaching, and advising. Combining these activity classifications with measurements of task success and scope of impact, we compute an AI applicability score for each occupation. We find the highest AI applicability scores for knowledge work occupation groups such as computer and mathematical, and office and administrative support, as well as occupations such as sales whose work activities involve providing and communicating information. Additionally, we characterize the types of work activities performed most successfully, how wage and education correlate with AI applicability, and how real-world usage compares to predictions of occupational AI impact.
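
One way such an occupation-level score could be assembled is sketched below under assumed column names, weights, and toy numbers (the paper's exact formula may differ): combine each activity's AI success and scope into an activity score, then aggregate over an occupation's activities weighted by their importance.

```python
import pandas as pd

# Activity-level measurements from AI conversations (toy numbers).
activities = pd.DataFrame({
    "activity": ["gather information", "write documents", "repair equipment"],
    "ai_success": [0.80, 0.75, 0.05],   # how successfully AI assists or performs it
    "ai_scope":   [0.70, 0.65, 0.10],   # how broadly within the activity
})
activities["activity_score"] = activities["ai_success"] * activities["ai_scope"]

# Occupation-to-activity mapping with importance weights (O*NET-style data assumed).
occ_acts = pd.DataFrame({
    "occupation": ["technical writer", "technical writer",
                   "maintenance tech", "maintenance tech"],
    "activity": ["write documents", "gather information",
                 "repair equipment", "gather information"],
    "importance": [0.7, 0.3, 0.8, 0.2],
})

# Occupation-level applicability: importance-weighted sum of activity scores.
merged = occ_acts.merge(activities, on="activity")
applicability = (merged.assign(w=lambda d: d["importance"] * d["activity_score"])
                       .groupby("occupation")["w"].sum())
print(applicability.sort_values(ascending=False))
```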

Canberk Ucel

Assistant Professor, ESSEC Business School

"Improving Farm Productivity and Technology Adoption through LLM-Powered Digital Advisory: A Randomized Controlled Trial Among 1,500 Smallholder Rice Farmers"

Smallholder farmers are central to global food security and climate mitigation—but they face persistent barriers to adopting proven, sustainable technologies. Digital advisory tools, supported by growing mobile phone use and rural internet access, offer significant promise. Yet many industry-led efforts have struggled with low engagement and limited impact. This talk presents early insights from an ongoing randomized controlled field trial in the Philippines and India, which tests whether a generative AI-powered advisor bot can help overcome these challenges by delivering timely, localized, and interactive farm advice to 2,000 smallholder rice farmers via common messaging platforms.

Delivered via Facebook Messenger and WhatsApp, the expert AI bot provides localized, context-specific advice on practices like Direct-Seeded Rice (DSR), pest management, and seed selection, as well as broader business practices. The study compares outcomes across three groups: a control group using a general-purpose daily chatbot without the farming and business features; a treatment group with access to the expert AI bot and regular reminders; and a second treatment group that also receives short-term incentives for intensive early use of the expert bot. The evaluation leverages three rounds of online surveys that capture rich information on farming and management practices, partly inspired by the World Management Survey, as well as various socio-economic outcomes and detailed farm and household characteristics, alongside detailed user interaction data.

The bot was developed with input from agronomy experts and tailored for low-resource settings through extensive user testing—addressing safety, bandwidth, local language, and device constraints. Early data show high engagement, especially among younger farmers, and indicate demand for actionable, detailed technical guidance, alongside challenges with connectivity and device limitations.

This research contributes new evidence on how generative AI can support practice change in complex, low-bandwidth production environments—and is part of a broader agenda exploring AI’s role in advancing inclusive productivity across smallholder and microenterprise settings.

Shai Vardi

Assistant Professor, University of South Florida

"Fragile Preferences: A Deep Dive Into Order Effects in Large Language Models"

Large language models (LLMs) are increasingly used in decision-support systems across high-stakes domains such as hiring and university admissions, where decisions often involve selecting among competing alternatives. While prior work has noted positional order biases in LLM-driven comparisons, these biases have not been systematically dissected or linked to underlying preference structures. We provide the first comprehensive investigation of positional biases across multiple LLM architectures and domains, uncovering strong and consistent order effects—including a novel centrality bias not previously documented in human or machine decision-making. We also find a quality-dependent shift: when options are high quality, models exhibit primacy bias, but favor latter options when option quality is low. We further identify a previously undocumented bias favoring certain names over others. To distinguish superficial tie-breaking from true distortions of judgment, we introduce a framework that classifies pairwise preferences as robust, fragile, or indifferent. We show that order effects can lead models to select strictly inferior options, and that positional biases are typically stronger than gender biases. These findings suggest that LLMs are not merely inheriting human-like biases, but exhibit distinct failure modes not seen in human decision-making. We propose targeted mitigation strategies, including a novel use of the temperature parameter, to reduce order-driven distortions.
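
The robust/fragile/indifferent distinction can be illustrated with a small sketch: query the model with both presentation orders of the same pair and compare the winners. The `ask_model` interface below is a hypothetical stand-in for the actual LLM call and answer parsing, and repeated sampling and temperature settings are omitted.

```python
def classify_pair(ask_model, option_a, option_b, prompt_template):
    """Query the model with both orderings of the same two options and
    classify the preference:
      - robust:      the same option wins regardless of position
      - fragile:     the winner flips when the order is swapped
      - indifferent: the model declines to choose (or ties) in either order
    `ask_model(prompt) -> "first" | "second" | "tie"` is an assumed interface."""
    winner_ab = ask_model(prompt_template.format(first=option_a, second=option_b))
    winner_ba = ask_model(prompt_template.format(first=option_b, second=option_a))

    pick_ab = {"first": option_a, "second": option_b}.get(winner_ab)
    pick_ba = {"first": option_b, "second": option_a}.get(winner_ba)

    if pick_ab is None or pick_ba is None:
        return "indifferent"
    if pick_ab == pick_ba:
        return "robust", pick_ab
    return "fragile", (pick_ab, pick_ba)

# Toy usage with a stub that always prefers whichever option is listed first,
# i.e., a pure primacy bias; every pair is then classified as fragile.
template = "Candidate 1: {first}\nCandidate 2: {second}\nWhich is stronger?"
print(classify_pair(lambda p: "first", "resume A", "resume B", template))
```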

Gavin Wang

Assistant Professor, University of Texas at Dallas

"Large Language Models Polarize Ideologically but Moderate Affectively in Online Political Discourse"

In this paper, we examine how the release of ChatGPT influenced political discourse on social media. Using data from Reddit’s largest political discussion subreddit, we measure the political slant of each comment to trace users’ evolving ideological positions. We find that ChatGPT’s release significantly amplified political polarization: liberal users became more liberal, and conservative users more conservative. This shift was primarily driven by previously inactive users who began posting more extreme content using ChatGPT, which in turn stimulated greater engagement from active users. Notably, same-partisan ChatGPT-generated comments significantly reinforced ideological polarization, while cross-partisan comments often prompted moderation—although the scale of these cross-partisan effects was comparatively smaller, making their depolarizing effects insufficient to counterbalance the broader polarization trend. Despite this, we observe no rise in incivility or hostility; rather, expressions of anger and toxicity declined. These findings indicate that while ChatGPT heightened political polarization, it simultaneously reduced affective polarization, potentially improving the quality of online political discourse.

Shane Wang

Professor of Marketing, Virginia Tech

“Predicting Behaviors with Large Language Model (LLM)-Powered Digital Twins of Customers”

Digital twins of customers (DToC) have emerged as a promising approach to simulate consumer thinking, feeling, and decision-making in marketing contexts. This research proposes and empirically tests a methodological framework that combines fine-tuning and retrieval-augmented generation (RAG) to construct LLM-based customer digital twins. Fine-tuning on user-generated content allows the model to internalize individual traits, preferences, and behaviors, while RAG equips the twin with real-time access to contextual product information. We demonstrate the framework using Amazon e-commerce data, constructing 306 personified digital twins and evaluating their performance in predicting both purchase decisions and review contents. The resulting digital twins achieve high accuracy in predicting future purchases (83%) and generate product reviews with strong semantic alignment to actual customer content (cosine similarity above 0.94). This method opens new possibilities for personalized marketing, pre-deployment campaign testing, and privacy-compliant consumer modeling. The findings contribute to emerging literature on generative AI and synthetic agents in marketing, advancing the conceptual and technical foundation for predictive, interactive, and individualized customer simulation.
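
The semantic-alignment check mentioned above can be sketched in a few lines: embed the digital twin's generated review and the customer's actual review and compute their cosine similarity. The sentence-transformers encoder named below is an assumed choice; the paper's embedding pipeline may differ.

```python
from sentence_transformers import SentenceTransformer, util

# Assumed embedding model; any sentence-level encoder would work for this check.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

generated_review = "Great battery life and the speaker is surprisingly loud."
actual_review = "Battery lasts all day and the sound is louder than I expected."

emb = encoder.encode([generated_review, actual_review], convert_to_tensor=True)
similarity = util.cos_sim(emb[0], emb[1]).item()
print(f"cosine similarity = {similarity:.2f}")  # alignment of twin vs. customer text
```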

Yunfei Wang

PhD Candidate, Robert H. Smith School of Business, University of Maryland

"Authenticity in the Age of AI Music: The Effect of GPT-4 on Digital Music Consumption"

This study investigates the impact of authenticity demonstration on digital listenership for artists following the release of GPT-4. We measure artists’ authenticity with two components: originality and contagion. Combining online streaming and live events data collected from various sources, we identify the effect of artists’ authenticity on their online streaming after GPT-4’s release using a staggered differences-in-differences (DID) design. Our findings show that, on average, artists who demonstrated authenticity post-GPT-4 experienced a 6.34% increase in digital streaming compared to those who did not. Additionally, the authenticity effect was 8.68% greater for artists affiliated with indie labels than those with major labels. We conduct several robustness checks to confirm the consistency of our findings. These findings underscore the growing importance of authenticity in the AI-influenced music industry, where consumers increasingly value artists’ authenticity. Our study adapts theoretical frameworks on authenticity in digital music and offers practical insights for music industry stakeholders as they face the disruption of generative AI.
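
For intuition, a simplified two-way fixed-effects version of the identification strategy is sketched below: regress log streams on a post-treatment indicator with artist and week fixed effects. Staggered designs typically call for heterogeneity-robust estimators (e.g., Callaway and Sant'Anna), and the data and column names here are placeholders.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder panel: artist-week streams with staggered authenticity "treatment".
rng = np.random.default_rng(1)
panel = pd.DataFrame([
    {"artist": a, "week": w,
     "treated_post": int(a % 2 == 0 and w >= 10 + a),   # staggered adoption
     "log_streams": 8 + 0.06 * int(a % 2 == 0 and w >= 10 + a)
                      + rng.normal(0, 0.1)}
    for a in range(20) for w in range(30)
])

# Two-way fixed effects regression with standard errors clustered by artist.
twfe = smf.ols("log_streams ~ treated_post + C(artist) + C(week)", data=panel)
res = twfe.fit(cov_type="cluster", cov_kwds={"groups": panel["artist"]})
print(res.params["treated_post"])  # approximate DID estimate of the authenticity effect
```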

Jiannan Xu

PhD Candidate, Robert H. Smith School of Business, University of Maryland

"AI Self-preferencing in Algorithmic Hiring: Empirical Evidence and Insights"

Generative Artificial Intelligence (AI) is transforming the hiring landscape, with job applicants increasingly using Large Language Models (LLMs) to craft resumes and employers integrating LLMs into their recruitment pipelines. This dual adoption of AI raises urgent ethical concerns, particularly around AI self-preferencing bias — the tendency of an algorithm to favor its own generated content over content written by humans or generated by other models. This study empirically investigates the presence and extent of AI self-preference bias in algorithmic hiring. We distinguish between two forms of bias: (i) an LLM preferring its own generated resume over human-written ones, and (ii) an LLM preferring its own generated resume over those generated by alternative LLMs. Using a series of controlled resume correspondence experiments, we find strong and systematic evidence of both types of self-preferencing across widely used commercial and open-source LLMs. The bias against human-written resumes is more substantial and pervasive, with self-preference bias ranging from 68% to 92% across major models. These biases persist even after controlling for content quality through human annotations. Our findings reveal significant risks for fairness in the labor market, particularly for candidates who do not have access to AI tools. We conclude with policy and design recommendations to mitigate AI self-preferencing and promote more equitable and transparent hiring practices.

Kan Xu

Assistant Professor, Arizona State University, W. P. Carey School of Business

“Learning User Engagement on Gaming Platform using Deep Matching Transformers”

In the competitive landscape of online gaming platforms, predicting short-term user engagement is critical for retaining users and maintaining platform success. Existing methods that rely on demographic data and gameplay statistics often fail to capture the complexity of user behaviors and interactions, limiting their ability to predict disengagement accurately. To address this, we propose a deep learning framework designed to model the gaming process and provide interpretable insights into why users may disengage.

Our approach creates user embeddings by integrating diverse match-specific data, including game outcomes, in-game dialogue, and paralinguistic cues like emojis, to comprehensively capture user experiences. The model incorporates the game structure and user interactions, allowing it to explain potential causes for short-term user dropout. Using a multi-task learning strategy, the framework predicts engagement while considering behavioral and emotional signals, offering a nuanced view of factors influencing user retention.

Crucially, our framework balances interpretability and predictive performance. It provides actionable insights that enable platforms to develop targeted retention strategies informed by a deeper understanding of user disengagement. Empirical evaluations using real-world gaming data show that our method significantly outperforms traditional benchmarks, demonstrating its effectiveness in predicting engagement and supporting policy decisions.

This research advances user engagement analytics and policy learning by introducing an interpretable, data-driven approach for predicting short-term engagement. The proposed framework serves as a decision-support tool that enables online gaming platforms to optimize business strategies, improve user retention, and drive revenue growth by proactively addressing the challenges of user disengagement.

Haonan Yin

PhD Candidate, University of California - Irvine

"Exposure, Readiness, and Firm Valuation of Generative AI"

Generative AI (GenAI) is a powerful technology with broad potential, but its impact on firm value depends on how it is implemented. While firms with high GenAI Exposure experience positive returns, real value emerges only when paired with strong AI Readiness—strategic alignment and technology investment. Firms with both show significant value creation, while misalignment leads to limited or negative outcomes. Success with GenAI requires high exposure to new technology, synchronized strategy, and strong alignment between the two.

Pearl (Peiyan) Yu

Assistant Professor, Terry College of Business, University of Georgia

"The Impact of Generative AI on Freelancer Job Preferences and Bidding Behaviors: Evidence from a Randomized Field Experiment"

Generative AI has recently experienced rapid breakthroughs, offering powerful new capabilities in knowledge work. Yet how these tools might reshape worker behavior in real-world labor markets remains largely unknown. This study investigates the impact of a generative AI tool, Microsoft 365 Copilot, on freelancers’ job application and completion behaviors through a field experiment on an online labor marketplace. Half of the freelancers in our study were randomly granted access to Copilot, a GPT-powered tool integrated into Microsoft Office. The results illustrate strategic behavior changes in bidding patterns. Across contract types, high earners with Copilot access applied to a greater proportion of fixed-price contracts and proposed shorter completion times, suggesting that fixed-price contracts offer stronger incentives to use Copilot to save time and maximize earnings. When applying for hourly jobs, treated high-earning freelancers applied to lower-value hourly posts but placed higher bids relative to the median bid, while low-earning freelancers placed lower bids on similar posts. Regarding job performance, treated high earners received a higher share of ratings, indicating greater client satisfaction. Post-experiment surveys corroborate these findings, reporting improved ratings, time savings, and increased project capacity. These patterns suggest high earners leveraged Copilot to enhance quality and justify higher rates, while low earners used the tool to reduce effort and improve competitiveness through lower bids. Our findings highlight the differential impacts of generative AI on freelancers, shaping strategic job selection and bidding, and offer insights into digital labor markets, future of work and implications for platform design in AI-assisted work environments.

Shubin Yu

Associate Professor of Marketing, HEC Paris

"Step Further Towards Automated Consumer Research: Developing and Validating an AI-Powered Interview Platform"

This talk introduces MimiTalk, an AI-powered automated interview platform designed to address the challenges of traditional qualitative research methods in social sciences. By leveraging Large Language Models and advanced system architecture, the platform aims to reduce resource requirements while maintaining high-quality data collection standards. We conducted comprehensive experiments with 20 participants, evaluating the platform’s performance through both quantitative metrics (information entropy, NLP analysis, and semantic coherence) and qualitative feedback. Results demonstrate that AI-conducted interviews achieve comparable information entropy to human-led interviews, with high semantic coherence scores and positive user experience feedback. While the platform successfully addresses geographical constraints and interviewer bias, limitations in emotional cue detection and cultural sensitivity were identified. This research contributes to the evolving landscape of automated qualitative research methods, suggesting that while AI cannot fully replace human interviewers, it can significantly enhance research efficiency and scalability. The findings lay groundwork for future developments in AI-assisted qualitative research methodologies.

Zhe Yuan

Assistant Professor, Zhejiang University

"Where and How Generative AI Boosts Firm Productivity: Field Experiments in Online Retail"

We study the impact of Generative Artificial Intelligence (GenAI) on firm productivity through a series of large-scale randomized field experiments involving millions of users conducted at a leading cross-border online retail platform. Over six months, GenAI-based enhancements were integrated into seven customer-facing business workflows. We find that GenAI improves productivity in a statistically and economically significant manner, but the effects are highly workflow-specific, with gains in Gross Merchandise Volume (GMV) ranging from 0% to 16.3%. The improvements stem not from reductions in input cost—which were held constant across experimental arms—but from higher conversion rates, indicating a genuine increase in customer value. We also document heterogeneity in treatment effects: smaller and less experienced sellers benefit most from GenAI deployment, while consumer-side effects vary by context. These findings provide rare causal evidence on the effects of GenAI applications in the retail industry, and highlight the importance of process-specific insights and firm heterogeneity in evaluating emerging technologies.

Luyang Zhang

Ph.D. student, Carnegie Mellon University

"Fairshare Data Pricing for Large Language Models via Data Valuation"

Training data is the backbone of large language models (LLMs), yet today’s data markets often operate under exploitative pricing — sourcing data from marginalized groups with little pay or recognition. This paper introduces a theoretical framework for LLM data markets, modeling the strategic interactions between buyers (LLM builders) and sellers (human annotators). We begin with theoretical and empirical analysis showing how exploitative pricing drives high-quality sellers out of the market, degrading data quality and long-term model performance. We then introduce fairshare, a pricing mechanism grounded in data valuation that quantifies each data point’s contribution. It aligns incentives by sustaining seller participation and optimizing utility for both buyers and sellers. Theoretically, we show that fairshare yields mutually optimal outcomes: maximizing long-term buyer utility and seller profit while sustaining market participation. Empirically, when training open-source LLMs on complex NLP tasks, including math problems, medical diagnosis, and physical reasoning, fairshare boosts seller earnings and ensures a stable supply of high-quality data, while improving buyers’ performance-per-dollar and long-term welfare. Our findings offer a concrete path toward fair, transparent, and economically sustainable data markets for LLMs.
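
The data-valuation idea can be illustrated with a deliberately simple leave-one-out sketch: value each seller's batch by how much validation performance drops without it, then split a payment pool in proportion to those contributions. The paper's fairshare mechanism is game-theoretic and richer than this; the model, data, and proportional rule below are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(2)

# Toy setup: three sellers each contribute a batch of labeled examples.
def make_batch(n, noise):
    X = rng.normal(size=(n, 5))
    y = (X[:, 0] + noise * rng.normal(size=n) > 0).astype(int)
    return X, y

sellers = {"seller_a": make_batch(200, 0.1),   # high-quality labels
           "seller_b": make_batch(200, 0.5),
           "seller_c": make_batch(200, 2.0)}   # noisy labels
X_val, y_val = make_batch(500, 0.1)

def score(batches):
    """Validation accuracy of a model trained on the given batches."""
    X = np.vstack([b[0] for b in batches])
    y = np.concatenate([b[1] for b in batches])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return accuracy_score(y_val, clf.predict(X_val))

full = score(list(sellers.values()))
contrib = {name: max(full - score([b for n, b in sellers.items() if n != name]), 0.0)
           for name in sellers}

budget = 1000.0  # total payment pool; proportional-to-contribution split (assumed rule)
total = sum(contrib.values()) or 1.0
for name, c in contrib.items():
    print(f"{name}: contribution={c:.3f}, payment=${budget * c / total:.2f}")
```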

Posters

Yue (Katherine) Feng

Associate Professor, Hong Kong Polytechnic University

“The Impact of AI-Generated Summaries on User Engagement in Video-Sharing Platforms: Evidence from a Randomized Field Experiment”

Generative AI is increasingly adopted for content generation on online platforms. A prominent application is to produce concise summaries of video content, which is typically lengthy and complex. However, the impact of such AI-generated summaries (AIGS) on user engagement with original videos remains underexplored. In this study, we conduct a randomized field experiment on a major video-sharing platform in China. AIGS are embedded in the review sections of videos in the treatment group, and we examine their effects on multiple engagement behaviors, including likes, shares, virtual tips, and review generation. Our results show that the inclusion of AIGS significantly enhances all forms of engagement. However, these effects diminish when user-generated reviews have higher volume, greater neutrality, or more divergent opinions. Additionally, the positive effects of AIGS are more pronounced for cognitively demanding content, such as utilitarian or longer videos. Further analysis reveals that the perceived credibility of AIGS, rather than their placement, is associated with AIGS effectiveness. These findings suggest that AIGS serve as a valuable informational aid, reducing users’ cognitive load and fostering greater engagement. This research enriches our understanding of AI’s role in digital content ecosystems and offers nuanced insights for researchers, content creators, and platform managers.

Angad Jasuja

Research Assistant, Yale University

"Development of Alternative Weight-Based Implementations of Shapley Values for Feature Importance"

Increasing use of artificial intelligence (AI) and predictive models has highlighted transparency and explainability as key issues in the development and adoption of these technologies. Shapley values, derived from cooperative game theory, provide a solution to quantify the contribution of each feature in a model’s prediction. The current best practice, SHAP (SHapley Additive exPlanations), aims to solve for feature importance by removing each feature in a sequence and measuring the difference in the final prediction.

However, the current implementation of Shapley values in Python has several deficiencies, including a lack of directional information and limited consideration of all possible removal orderings. On the first deficiency, while SHAP provides information about the difference in prediction attributed to each feature, it does not provide information on whether a feature helps the model’s predictive power. On the second deficiency, if there are n features, there are n! ways of removing them, but SHAP only looks at one specific ordering, prioritizing the features with the lowest prediction difference. This paper proposes alternative methods to calculate Shapley values using varying prediction difference metrics, such as absolute prediction difference and ROC AUC score, and weighted feature removal processes. While the absolute prediction difference mimics SHAP, the difference in ROC AUC score allows for a quantitative measure of whether feature removal affects the model’s predictive power. Additionally, while SHAP assumes removal of features starting with those with the lowest difference or smallest impact, additional methods were developed to implement uniform and weighted probability removal techniques. The developed algorithms were then validated on an open-source Iranian telecommunications customer churn dataset using a random forest classifier model. A case study sourced from Datacamp was used as a reference point against which to compare the results of the alternative algorithms and to ensure consistency with published literature on how to use SHAP. Initially, exploratory data analysis was performed to develop hypotheses about the predictive power of the features and to identify dataset-based issues such as imbalance and missing values, which were addressed in model building and training.
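
In the spirit of the alternative algorithms described above, the sketch below approximates each feature's importance as its average marginal contribution to ROC AUC over randomly sampled removal orderings, with "removal" proxied by mean-imputing the feature's column. The imputation proxy, the number of orderings, and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

# Synthetic stand-in for a churn dataset (the paper uses an Iranian telecom set).
X = rng.normal(size=(1500, 4))
y = ((X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 1500)) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

def auc_with_removed(removed):
    """ROC AUC after 'removing' features by replacing them with training means."""
    X_mod = X_te.copy()
    for j in removed:
        X_mod[:, j] = X_tr[:, j].mean()
    return roc_auc_score(y_te, clf.predict_proba(X_mod)[:, 1])

n_feat, n_orderings = X.shape[1], 50
phi = np.zeros(n_feat)
for _ in range(n_orderings):                          # Monte Carlo over removal orders
    order, removed = rng.permutation(n_feat), set()
    for j in order:
        before = auc_with_removed(removed)
        removed.add(j)
        phi[j] += before - auc_with_removed(removed)  # marginal AUC contribution
phi /= n_orderings
print({f"feature_{j}": round(v, 4) for j, v in enumerate(phi)})
```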

The results show that the alternative algorithms provided different feature importance rankings and relative importance proportions compared to SHAP. While SHAP ranks features in a set order, the other developed algorithms highlight that this ordering is subject to change, and the ROC AUC methods produced contrasting feature importance rankings, showing that a feature’s prediction difference may differ from its contribution to the model’s predictive power. Additionally, because SHAP only breaks down one such ordering of feature removal, the variation in feature removal for a particular variable, “Complains,” was also analyzed, revealing that removal order produces variations in the importance attributed to it. These disparities highlight the interpretability and subjectivity concerns relating to transparency in AI systems and the acute need for accurate feature importance frameworks to improve understanding and trust in predictive models. This also shows that there may be potential improvements needed for SHAP and that it is not a “one size fits all” approach when it comes to feature importance and explainability.

Tai Lam

Assistant Professor, UCLA Anderson

"Generative AI in Equilibrium: Evidence from a Creative Goods Marketplace"

We study the implications of generative artificial intelligence (GenAI) for the production and consumption of creative goods such as images, music, and writing. We start with a simple model of technology adoption, production and consumer search to highlight diverse equilibrium implications of GenAI. The combination of cost and quality advantages of GenAI determine congestion, crowd-out and match rates. Then, using a difference-in-differences design, we causally estimate the impact of GenAI on product production, entry, sales, quality and variety.

We find that GenAI is a substitute for non-GenAI products and crowds out the production of non-GenAI content. Still, substantial GenAI firm entry leads to an increase in the quality and variety of produced and sold goods, expanding sales. Thus, our results imply that unregulated GenAI poses a substantial threat to non-GenAI production but is likely beneficial for some consumers. Market heterogeneity suggests that legal and labeling policies may help mitigate concerns of non-GenAI crowd-out, but smaller niche markets are particularly at risk. Our findings add empirical evidence to an ongoing debate over the use of copyrighted material in GenAI training.

Christine Lee

Assistant Professor, California Polytechnic State University

"Competitor or Collaborator? UX Designers’ Experiences, Emotions, and Identity Negotiation with AI"

As artificial intelligence (AI) becomes increasingly embedded in creative and technical workflows, UX design professionals are navigating a time of rapid transformation. This research investigates how UX designers are thinking and feeling about AI as it enters their workspaces, workflows, and professional identities.

The study used a qualitative phenomenological approach to surface the lived experiences of twenty UX professionals across a range of industries and company sizes. Participants were interviewed using semi-structured and unmoderated interviews with open-ended questions about how AI has affected their role, creative process, feelings about job security, and expectations for the future. The research design was grounded in constant comparative analysis, allowing key themes to emerge directly from the data.

Five primary themes emerged from the analysis: AI as a collaborative partner, anxiety around job relevance, compression and role overload, evolving professional identity, and the emotional tension between control and uncertainty. These themes reflect not just how designers are using AI but how they are making sense of its presence, interpreting its implications, and integrating it into their thinking about career, creativity, and capability. Using Albert Bandura’s self-efficacy theory as a conceptual framework, the study explores how designers interpret AI’s role as well as their role when working with AI, what they perceive as its risks and rewards, and how those perceptions shape their engagement, mindset, and emotional state.

The most common theme was that of AI as a collaborative partner. Many designers described AI tools as companions that support ideation, validate workflows, or streamline time-consuming tasks. They use AI to synthesize research notes, brainstorm design directions, or act as a neutral sounding board for early-stage ideas. Rather than viewing AI as a rival, many participants spoke about it as a tool that helps them think more clearly, work more efficiently, and explore a broader set of creative options. This theme aligns closely with Bandura’s concept of verbal persuasion, as designers reported that AI feedback often provided reassurance, guidance, and momentum in otherwise ambiguous and potentially lonely phases of the design process.

A second major theme was anxiety about future job security and future prospects. Participants voiced concern that AI might erode the need for foundational design tasks, especially tasks that rely on speed, pattern recognition, or standardization. Several junior designers expressed fear that the value of human design work could be diminished or misunderstood by managers who see AI as a cheaper or faster alternative. Designers who saw AI as more of a threat than an opportunity described feeling disoriented, doubtful, or creatively blocked. Others who interpreted AI as a tool described feeling motivated to learn, adapt, and lead.

The third theme was compression and role overload. Participants repeatedly mentioned that AI’s efficiency could paradoxically lead to increased expectations as teams shrink and each designer is asked to do more. Designers described feeling that they had to take on broader responsibilities across strategy, execution, development, and sometimes even marketing. Many anticipated a future where UX designers would need to be more hybrid, capable of integrating insights from multiple functions while also guiding and interpreting AI output. This sense of role expansion created both excitement and concern, and surfaced questions about boundaries, quality control, and the sustainability of design work in the age of automation.

The fourth theme was evolving professional identity. Designers discussed the challenge of staying relevant as their own value proposition shifts from tactical execution to strategic orchestration. They spoke about needing to reframe their skills, reposition their contributions, and make the case that human-centered design is about far more than layout and interaction. For many, AI prompted an identity shift from “designer” to “curator,” “facilitator,” or “translator.” These shifts were not only cognitive but emotional, as they involved reimagining professional purpose and redefining what success looks like in an AI-enhanced environment.

The final theme was the tension between control and uncertainty. Designers described moments of wonder and inspiration when AI performed at or above their expectations, but also discomfort when outputs felt unpredictable, generic, or out of sync with human nuance. They toggled between curiosity and skepticism, empowerment and vulnerability. This tension maps directly onto Bandura’s framework, where self-efficacy is shaped not just by what people can do, but by how they interpret what happens when they try.

Taken together, these themes suggest that the impact of AI on UX design is not primarily about job elimination but about emotional adaptation, identity shifts, and changing definitions of value. Designers are not only learning how to use new tools, they are also learning how to feel competent, confident, and creative in a new context. Those with high self-efficacy described more adaptive responses, such as experimenting with AI tools, seeking training, or initiating conversations about integration strategies. Those with lower self-efficacy expressed more hesitation, avoidance, and self-doubt.

This research contributes to ongoing conversations about AI and the future of work by emphasizing the psychological and emotional dimensions of technological change. Rather than treating designers as passive recipients of AI, it centers their voices as active meaning-makers who are interpreting, shaping, and negotiating what this transformation means. For organizations seeking to implement AI in design teams, the findings point to the need for emotional intelligence, mentorship, and investment in self-efficacy as opposed to only technical training. When designers feel confident in their ability to adapt and lead, they are more likely to collaborate with AI in productive, ethical, and creatively expansive ways.

Yi Liu

Assistant Professor, University of Wisconsin-Madison

"Generative AI, Open Source, and Application-Layer Product Development"

The rise of foundation models in generative AI (GenAI), driven by the development of large language models (LLMs), is reshaping firms’ product development strategies through open-source technologies. Unlike traditional open-source environments characterized by collaboration and standardization, LLM-based open-source technologies present unique complexities, such as fine-tuning uncertainty and limited inter-firm collaboration, which influence both horizontal and vertical innovation efforts of firms. This paper develops an analytical model to examine how these complexities shape firms’ innovation strategies in product development, contrasting scenarios involving traditional open-source technologies with those built on LLMs. Our results reveal that fine-tuning uncertainty acts as a double-edged sword: while it disrupts predictable product development outcomes for firms, it can also introduce artificial vertical differentiation, mitigating direct competition with other firms and potentially boosting profitability.

Yizhi Liu

PhD Candidate, University of Maryland, College Park

"Deepfakes for Good: An Agentic AI Framework for Mitigating Bias"

Measuring bias in decisions involving visual cues presents significant methodological challenges, as traditional approaches struggle to isolate the causal effect of bias-sensitive attributes while maintaining high controllability. We address this challenge by proposing a novel framework: calibration deepfakes. Through systematic analysis of 826 academic papers, we identify that while deepfakes are commonly characterized as malicious tools, they have substantial yet unexplored potential for bias measurement by creating controlled visual variations.

Our framework precisely modifies bias-sensitive attributes (e.g., race, age) in images while preserving decision-relevant ones, enabling causal identification of bias in decision-making. We demonstrate this approach in pain assessment, where 3,802 assessors evaluated deepfake-modified facial images displaying identical pain expressions across demographic variations. Results reveal systematic bias patterns across demographic variables, particularly a consistent own-race effect in pain assessment. Notably, these biases manifest in distinct facial regions: White assessors focus primarily on upper facial features, while Black assessors attend more to lower facial areas. These findings suggest that culturally shaped attentional mechanisms drive bias in pain assessments. Given that human biases are persistent and correcting them is costly and time-consuming, we further develop Multi-Agent Debiasing Systems (MADS) that automate the entire bias measurement and correction pipeline, demonstrating effective bias mitigation in decision-making. Our work establishes a rigorous framework for repurposing controversial technologies for societal benefit and provides actionable insights for bias mitigation in healthcare and other high-stakes decision contexts.

Karthik Babu Nattamai Kannan

Assistant Professor of IT and Operations Management, Southern Methodist University

"The Impact of Generative AI on Open-Source Community Engagement"

Emergent literature has shown significant productivity gains from generative AI tools, such as GitHub Copilot, in software development. However, the mechanisms driving these gains, particularly in distributed software development contexts, remain largely unexplored. In this study, we exploit GitHub’s trial of “Copilot for Pull Requests (PRs)” to examine the impact on PR latency — the time taken to close PRs. We analyze more than 100 open-source repositories with around 200,000 PRs using Propensity Score Matching and panel data analysis and find that the use of Copilot for PRs reduces latency by as much as 4 days on average. This reduction stems from a decreased number of required reviewers, saving substantial review time. Repository-level analysis further reveals that higher Copilot adoption correlates with cumulative latency reductions, demonstrating its scalable impact on workflow efficiency. Further analysis of the underlying mechanism suggests that Copilot-generated summaries improve collaboration by enabling precise and comprehensive documentation at the PR creation stage, reducing review iterations. These findings provide critical insights into the role of AI in optimizing collaborative software development workflows. Copilot not only enhances individual productivity but also addresses systemic inefficiencies within the pull-based development model. The results underscore the importance of integrating AI tools into the software development lifecycle while addressing potential challenges, such as workflow friction and information asymmetries, to fully realize their benefits. This research contributes to the growing literature on AI-driven productivity and offers actionable recommendations for practitioners seeking to enhance project performance through AI-enabled tools.
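
As a rough illustration of the matching step, the sketch below estimates propensity scores and performs nearest-neighbour matching on synthetic repository-level data; the covariates, treatment assignment, and outcome are invented for the example and do not reflect the study’s data or specification.

```python
# Sketch: propensity score matching on synthetic repository-level data.
# Covariates, treatment, and outcomes are invented for illustration.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 1000
df = pd.DataFrame({
    "stars": rng.lognormal(5, 1, n),
    "contributors": rng.poisson(30, n),
    "age_years": rng.uniform(1, 10, n),
})
df["copilot_pr"] = (rng.random(n) < 0.2 + 0.3 * (df["contributors"] > 30)).astype(int)
df["latency_days"] = (10 + 0.05 * df["contributors"] - 3 * df["copilot_pr"]
                      + rng.normal(0, 1, n))

# 1. Estimate propensity scores from observed covariates.
X = df[["stars", "contributors", "age_years"]]
ps_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
df["ps"] = ps_model.fit(X, df["copilot_pr"]).predict_proba(X)[:, 1]

# 2. Match each treated repository to the control with the closest score (with replacement).
treated = df[df["copilot_pr"] == 1]
control = df[df["copilot_pr"] == 0]
matches = control.loc[[(control["ps"] - p).abs().idxmin() for p in treated["ps"]]]

# 3. Average treatment effect on the treated: difference in mean PR latency.
att = treated["latency_days"].mean() - matches["latency_days"].mean()
print(f"Estimated change in PR latency with Copilot for PRs: {att:.2f} days")
```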

Jiaxin Pei

Postdoc, Stanford University

"The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets"

AI agents are increasingly used in consumer-facing applications to assist with tasks such as product search, negotiation, and transaction execution. In this paper, we explore a future scenario where both consumers and merchants authorize AI agents to fully automate negotiations and transactions. We aim to answer two key questions: (1) Do different LLM agents vary in their ability to secure favorable deals for users? (2) What risks arise from fully automating deal-making with AI agents in consumer markets? To address these questions, we develop an experimental framework that evaluates the performance of various LLM agents in real-world negotiation and transaction settings. Our findings reveal that AI-mediated deal-making is an inherently imbalanced game — different agents achieve significantly different outcomes for their users. Moreover, behavioral anomalies in LLMs can result in financial losses for both consumers and merchants, such as overspending or accepting unreasonable deals. These results underscore that while automation can improve efficiency, it also introduces substantial risks. Users should exercise caution when delegating business decisions to AI agents.
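
To make the setup concrete, here is a hypothetical sketch of an agent-to-agent alternating-offers loop. The `ask_llm` callable is a placeholder for whichever LLM backend an agent would call (here replaced by scripted policies), so nothing below reflects the paper’s actual framework, prompts, or models.

```python
# Hypothetical sketch of agent-to-agent price negotiation; `ask_llm` stands in for an
# LLM call that would parse the model's reply into a numeric offer. Not the paper's code.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Agent:
    role: str                        # "buyer" or "seller"
    limit: float                     # buyer's max willingness to pay / seller's floor
    ask_llm: Callable[[str], float]  # prompt in, numeric offer out

def negotiate(buyer: Agent, seller: Agent, max_rounds: int = 10) -> Optional[float]:
    """Alternating offers until one side accepts or the round limit is hit."""
    ask = seller.ask_llm(f"Open with an asking price no lower than {seller.limit}")
    for _ in range(max_rounds):
        counter = buyer.ask_llm(f"Seller asks {ask}; counter at or below {buyer.limit}")
        if counter >= ask:           # buyer is willing to pay the standing ask
            return ask
        ask = seller.ask_llm(f"Buyer counters {counter}; reply, no lower than {seller.limit}")
        if ask <= counter:           # seller's new ask meets the buyer's counter
            return counter
    return None                      # no deal within the round limit

# Toy usage with scripted stand-ins for the LLM policies.
seller = Agent("seller", 60.0, lambda p: 100.0 if "Open" in p else 80.0)
buyer = Agent("buyer", 90.0, lambda p: 85.0)
print(negotiate(buyer, seller))      # -> 85.0
```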

Tianyi Peng

Assistant Professor, Columbia University

Twin-2K-500: A dataset for building digital twins of over 2,000 people based on their answers to over 500 questions.

Katsiaryna Siamionava

Assistant Professor of Information Systems, Arizona State University

"Do You See My Side? Field Evidence on How Stance and Perspective Shape Engagement with LLM-powered Chatbots"

In this study, we examine the effects of language framing and stance alignment of LLM-powered chatbots on user engagement during conversations about socially sensitive topics such as climate change and abortion. Through a field experiment involving over 1,000 chat sessions on the social media summarization platform KOAT, we manipulate two key chatbot design features — user-chatbot stance alignment and the chatbot’s perspective-taking — and quantify their impact on user responses. Findings show that misaligned stances reduce user engagement, but contextual perspective-taking mitigates negative reactions and fosters curiosity toward opposing views. The results highlight how conversational framing in LLMs can shape feedback, sharing behavior, and openness across ideological divides, offering practical guidance for building more inclusive and balanced AI systems.

Michal Strahilevitz

Professor, Saint Mary's College of California

"Competitor or Collaborator? UX Designers’ Experiences, Emotions, and Identity Negotiation with AI"

Wendao Xue

Postdoc, The University of Texas at Austin

"The Spread of the Political Video DeepFakes on Social Media"

Social media platforms face growing challenges in combating misinformation, particularly with the rise of AI-generated deepfakes. Video deepfakes are often considered more dangerous than text-based misinformation due to their media richness. However, empirical evidence on users’ sharing behavior and perceived believability of video misinformation remains mixed.

We conducted a preregistered experiment using a Facebook-like platform to examine how video versus text formats influence misinformation sharing and believability. Participants were randomly assigned to one of four post types: (1) authentic political news videos, (2) authentic text news, (3) AI-generated deepfake videos, or (4) fake text news. To assess network-level spread, we simulated diffusion on a Facebook social graph, parameterized by experimentally derived sharing probabilities.

Findings:

  • Lower Sharing of Video Misinformation: Users shared significantly less misinformation in video format than in text.
  • Mechanism – Enhanced Detection: Users were more likely to believe authentic video news and less likely to believe fake video news compared to text format.
  • Reduced Network Spread: Video format reduced misinformation diffusion by ≈60 fewer sharers and ≈114 fewer exposures in a Facebook social graph (4,039 nodes; 88,234 edges).
  • Sharing remained higher when misinformation aligned with participants’ political beliefs.
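
The network-level figures above come from simulating cascades over the social graph; a minimal independent-cascade-style sketch of such a simulation is shown below. The random graph, seed set, and sharing probabilities are placeholders, not the experimentally derived parameters used in the study.

```python
# Sketch: independent-cascade-style diffusion with per-exposure sharing probabilities.
# A random graph stands in for the Facebook graph (4,039 nodes, 88,234 edges); the
# seed set and probabilities are placeholders, not the experimentally derived values.
import networkx as nx
import numpy as np

def simulate_spread(G, seeds, p_share, rng):
    """Return (#sharers, #exposures) for one cascade starting from `seeds`."""
    shared = set(seeds)            # users who have shared the post
    exposed = set(seeds)           # users who have seen it
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in G.neighbors(u):
                if v not in exposed:
                    exposed.add(v)
                    if rng.random() < p_share:
                        shared.add(v)
                        nxt.append(v)
        frontier = nxt
    return len(shared), len(exposed)

rng = np.random.default_rng(0)
G = nx.gnm_random_graph(4039, 88234, seed=0)
seeds = rng.choice(G.number_of_nodes(), size=10, replace=False)

# Compare diffusion under hypothetical text vs. video sharing probabilities.
for label, p in [("text misinformation", 0.05), ("video misinformation", 0.03)]:
    sharers, exposures = simulate_spread(G, seeds, p, rng)
    print(f"{label}: {sharers} sharers, {exposures} exposures")
```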

Maggie Zhang

Postdoctoral Research Associate, University of Virginia, McIntire School of Commerce

"When Influencers Delegate Replies: How Social AI Agents Shape User Engagement"

As social media platforms deploy LLM-powered agents to help influencers manage social relationships with users, it remains unclear how this delegation impacts user engagement. Automating interactions provides scalability and efficiency for influencers, but it may weaken the influencer-user relationship if the agents fail to serve as effective social delegates. To explore this question, we empirically investigate the impact on user engagement when influencers delegate social interaction tasks, such as replying to comments, to a Social AI Agent—an LLM-powered proxy that responds on behalf of an influencer. Leveraging the rollout of a Social AI Agent feature on a major social media platform, we use a staggered difference-in-differences design to compare engagement behaviors between users who received an AI reply (i.e., a reply from an influencer’s Social AI Agent) and those who did not. Our results show that receiving an AI reply significantly increases user commenting on subsequent influencer posts, particularly when AI replies amplify an influencer’s social presence, as reflected in content relevance, stylistic alignment, and reply timeliness. We also find heterogeneous effects based on influencer-user relationships: engagement gains are stronger among fans and weaker for commercialized influencers. Additionally, an engagement boost is observed for both sponsored and non-sponsored posts from influencers and extends to user reposting behavior. This study contributes to the literature on AI delegation and influencer engagement by highlighting when and how delegating social relationship management to Social AI Agents can enhance user engagement.

Echo Zhou

PhD Candidate, Stanford Graduate School of Business