Luke Wroblewski - Polar founder

另一种聊天界面布局 || An Alternative Chat UI Layout

2025-11-10T18:00:00+00:00

如今，几乎每个软件应用都在添加AI聊天功能。由于这些功能在额外思考和工具辅助下表现更佳，自然这些元素也随之加入。当这种情况发生时，相同的可用性问题会在不同应用中反复出现，我们设计师需要新的解决方案。

聊天本是个简单且广为人知的交互模式...那么问题出在哪里？当只是两个人在消息应用中对话时，一切都很简单。但当对话另一端是充满推理痕迹和工具调用（即具有代理能力）的AI模型时，聊天就不再那么简单了。

现在的模式不再是"你提问-AI回答"，而是演变为：

你提出问题
模型展示其思考过程
调用工具并显示结果
阐述对结果的看法
继续调用其他工具...

虽然这类代理循环极大增强了AI模型的能力，但它们看起来更像是冗长的内心独白，而非两人间的对话交流。当聊天功能以侧边面板形式加入现有应用时，有限的屏幕空间使得这种"独白式"交互的问题更加突出。

在VS Code等开发环境中使用Augment Code就典型体现了这个问题。狭窄的侧边面板需要显示Augment编写和编辑代码时的多重思考痕迹和工具调用。虽然其工作成果令人惊叹，但在窄面板中跟踪整个过程却十分困难。任务完成时，初始用户消息早已滚出视野，人们不得不上下滚动来获取上下文并评估结果。

此时设计团队开始思考：模型中多少内部独白需要显示？哪些部分可以删除或折叠？不同应用给出了不同答案。但关键在于，了解AI的工作过程往往很有价值，完全隐藏并非良策。

如果能将AI模型完成任务的过程（思考痕迹、工具调用）与最终结果分离会怎样？这正是"聊天+画布"设计模式的核心思想——过程与结果分处不同区域。虽然理论上很美好，但实践中很难清晰界定何为输出、何为过程。"最终"结果需要多完整？后续问题如何处理？中间步骤又如何安排？

即便能清晰分离过程与结果，也只是实现了视觉上的区隔。这并不理想，特别是当二者需要相互参照时。

为解决这些问题，我们探索了双滚动面板的AI聊天界面布局。用户指令、思考过程和工具调用显示在左列，结果输出在右列。当AI完成思考和工具使用后，过程部分会折叠，左列仅显示摘要，而右列结果保持持久可滚动。

为对比说明，下图展示了ChatDB中原有的代理式聊天界面（附视频）。侧边面板中用户输入指令，模型展示思考过程、工具使用及结果。尽管我们已经折叠了大量思考和工具调用内容，用户仍需在初始消息和最终结果间频繁滚动。

重新设计的双面板布局中，初始指令和过程显示在左列，结果输出在右列。这种设计让用户能同时保持两者上下文。如下方视频所示，您可以轻松滚动查看结果，同时始终可见产生这些结果的指令和过程。

鉴于相同的代理式UI问题出现在多个应用中，我们计划在更多场景测试这种布局，以深入了解其优缺点。随着AI技术的快速演进，相信还会有新的挑战等待我们思考。

---------------

Nowadays it seems like every software application is adding an AI chat feature. Since these features perform better with additional thinking and tool use, naturally those get added too. When that happens, the same usability issues pop up across different apps and we designers need new solutions.

Chat is a pretty simple and widely understood interface pattern... so what's the problem? Well when it's just two people talking in a messaging app, things are easy. But when an AI model is on the other side of the conversation and it's full of reasoning traces and tool calls (aka it's agentic), chat isn't so simple anymore.

Instead of "you ask something and the AI model responds", the pattern looks more like:

You ask something
The model responds with its thinking
It calls a tool and shows you the outcome
It tells you what it thinks about the outcome
It calls another tool ...

While these kinds of agentic loops dramatically increase the capabilities of AI models, they look a lot more like a long internal monologue than a back and forth conversation between two people. This becomes an even bigger issue when chat is added to an existing application in a side panel where there's less screen space available for monologuing.
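To make the shape of that monologue concrete, here is a minimal Python sketch of the loop described above. Everything in it is an assumption for illustration: `run_agent`, the scripted stand-in for a model, and the `count_rows` tool are hypothetical, not any particular product's API.

```python
def run_agent(question, model, tools, max_steps=10):
    """Return the transcript entries produced by one agentic loop."""
    transcript = [("user", question)]
    for _ in range(max_steps):
        action = model(transcript)              # the model decides what to do next
        transcript.append(("thinking", action["thought"]))
        if action.get("tool") is None:          # no tool requested: final answer
            transcript.append(("answer", action["answer"]))
            break
        outcome = tools[action["tool"]](action["input"])
        transcript.append(("tool", f'{action["tool"]} -> {outcome}'))
    return transcript


def scripted_model(transcript):
    """Hypothetical stand-in for a real model: one tool call, then an answer."""
    tool_calls = sum(1 for kind, _ in transcript if kind == "tool")
    if tool_calls == 0:
        return {"thought": "I should count the rows first.",
                "tool": "count_rows", "input": "orders"}
    return {"thought": "The count is enough to answer.",
            "tool": None, "answer": "There are 3 orders."}


tools = {"count_rows": lambda table: 3}         # hypothetical tool
transcript = run_agent("How many orders are there?", scripted_model, tools)
for kind, text in transcript:
    print(f"{kind:>8}: {text}")
```

Even in this toy run, a single question yields four entries after the user's message, which is the scrolling problem in miniature.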

Using Augment Code in a development application, like VS Code, illustrates the issue. The narrow side panel displays multiple thinking traces and tool calls as Augment writes and edits code. The work it's doing is awesome; staying on top of it in a narrow panel is not. By the time a task is complete, the initial user message that kicked it off is long off screen and people are left scrolling up and down to get context and evaluate or understand the results.

At this point, design teams start trying to sort out how much of the model's internal monologue needs to be shown in the UI and which parts can be removed or collapsed. You'll find different answers when looking at different apps. But the bottom line is that seeing what the AI is doing (and how) is often quite useful, so hiding it all isn't always the answer.

What if we could separate out the process (thinking traces, tool calls) AI models use to do something from their final results? This is effectively the essence of the chat + canvas design pattern. The process lives in one place and the results live somewhere else. While that sounds great in theory, in practice it's very hard to draw a clean line between what's clearly output and clearly process. How "final" does the output need to be before it's considered "the result"? What about follow-on questions? Intermediate steps?

Even if you could separate process and results cleanly, you'd end up with just that: the process visually separated from the results. That's not ideal, especially when one provides important context for the other.

To account for all this and more, we've been exploring a new layout for AI chat interfaces with two scroll panes. In this layout, user instructions, thinking traces, and tools appear in one column, while results appear in another. Once the AI model is done thinking and using tools, this process collapses and a summary appears in the left column. The results stay persistent but scrollable in the right column.
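One way to think about this layout is as a routing rule over the events in a run. The Python sketch below is only an illustration under assumptions: the event kinds, the `TwoPaneView` type, and the wording of the summary are hypothetical, and keeping the user's instructions visible after the collapse is my reading of the layout, not a documented behavior of ChatDB.

```python
from dataclasses import dataclass, field

# Hypothetical event kinds; the layout names the categories but not a schema.
PROCESS_KINDS = {"instruction", "thinking", "tool_call"}


@dataclass
class Event:
    kind: str   # "instruction", "thinking", "tool_call", or "result"
    text: str


@dataclass
class TwoPaneView:
    left: list = field(default_factory=list)    # process column
    right: list = field(default_factory=list)   # results column


def layout(events, done):
    """Route events into two columns; collapse the process once the run is done."""
    view = TwoPaneView()
    for event in events:
        if event.kind == "result":
            view.right.append(event.text)
        elif event.kind in PROCESS_KINDS:
            view.left.append(event.text)
    if done:
        instructions = [e.text for e in events if e.kind == "instruction"]
        steps = sum(1 for e in events if e.kind in ("thinking", "tool_call"))
        # Assumption: instructions stay visible, everything else is summarized.
        view.left = instructions + [f"Summary: {steps} steps (collapsed)"]
    return view


events = [
    Event("instruction", "Chart sales by month"),
    Event("thinking", "I need the sales table"),
    Event("tool_call", "query the sales table"),
    Event("result", "Bar chart: sales by month"),
]
running = layout(events, done=False)
finished = layout(events, done=True)
```

While the run is in progress the left column holds all three process entries; once it finishes, the left column shrinks to the instruction plus a one-line summary, and the right column is unchanged.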

To illustrate the difference, here's the previous agentic chat interface in ChatDB (video below). There's a side panel where people type in their instructions, and the model responds with what it's thinking, the tools it's using, and its results. Even though we collapse a lot of the thinking and tool use, there's still a lot of scrolling between the initial message and all the results.

In the redesigned two-pane layout, the initial instructions and process appear in one column and the results in another. This allows people to keep both in context. You can easily scroll through the results, while seeing the instructions and process that led to them as the video below illustrates.

Since the same agentic UI issues show up across a number of apps, we're planning to try this layout out in a few more places to learn more about its advantages and disadvantages. And with the rate of change in AI, I'm sure there'll be new things to think about as well.

重新思考AI/ML时代的网络架构 || Rethinking Networking for the AI/ML Era

2025-10-31T17:00:00+00:00

在Sutter Hill Ventures举办的AI演讲系列中，谷歌杰出工程师Nandita Dukkipati阐述了AI/ML工作负载如何彻底颠覆了传统网络架构。以下是我记录的演讲要点：

AI打破了我们的网络假设。传统网络允许存在一定延迟波动和偶发故障，但AI工作负载要求极致性能：高带宽、超低抖动（微秒级）和近乎完美的可靠性。一个慢节点就会导致整个训练任务失败。

AI的特殊性在于：这些工作负载采用批量同步并行计算。所有节点必须在屏障处等待每个步骤完成。最慢的工作节点决定整体速度——即使100个节点中有99个快速完成也无济于事。

真实案例：Gemini流量显示数百毫秒的线速传输，但平均利用率仅为峰值的1/5。同步突发流量使统计复用优势荡然无存，同时对延迟敏感和带宽密集。

三大技术突破

Falcon（硬件传输协议）：现有硬件传输假设无损网络，与以太网根本冲突。Falcon将十年软件优化精髓硬件化，实现100倍提升：基于延迟的拥塞控制、智能负载均衡、现代丢包恢复。原本受限于软件扩展的高性能计算应用，使用Falcon后立即突破瓶颈。

CSIG（拥塞信号）：端到端拥塞控制存在盲区——无法感知反向路径拥塞或可用带宽。CSIG在每个数据包中以线速嵌入多比特信号（可用带宽、路径延迟），无需探测。关键创新：在应用上下文中提供信息，精准定位拥塞路径。

Firefly：抖动是AI工作负载的致命伤。Firefly通过分布式共识实现数百张网卡间亚10纳秒级同步。实测示波器显示±5纳秒精度，将松散连接的机器转变为紧密耦合的计算系统。

现存挑战

掉队节点检测：即使网络完美无缺，在上千个GPU中定位单个慢节点仍是最大难题。整体工作负载降速使得定位元凶几乎不可能。统计离群值分析噪音过大，目前仍在积极研究中。

核心结论：AI网络需要同步解决传输、可视化、同步和弹性问题。在AI应用具备更强容错性之前（短期内难以实现），基础设施必须提供近乎完美的表现。我们正从被动尽力而为的网络转向精准调度的网络，从软件传输转向硬件传输，从人工调试转向自动恢复。

---------------

In her AI Speaker Series presentation at Sutter Hill Ventures, Google Distinguished Engineer Nandita Dukkipati explained how AI/ML workloads have completely broken traditional networking. Here are my notes from her talk:

AI broke our networking assumptions. Traditional networking expected some latency variance and occasional failures. AI workloads demand perfection: high bandwidth, ultra-low jitter (tens of microseconds), and near-flawless reliability. One slow node kills the entire training job.

Why AI is different: These workloads use bulk synchronous parallel computing. Everyone waits at a barrier until every node completes its step. The slowest worker determines overall speed. No "good enough" when 99 of 100 nodes finish fast.
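A small Python sketch of the barrier arithmetic behind this point. The worker counts and per-step times are made-up numbers for illustration, not figures from the talk.

```python
def step_time(worker_times):
    """In bulk synchronous parallel, a step ends when the slowest worker reaches the barrier."""
    return max(worker_times)


def job_time(steps):
    """Total time is the sum of each step's slowest worker."""
    return sum(step_time(worker_times) for worker_times in steps)


# Hypothetical per-step times in arbitrary units.
all_fast = [1.0] * 100             # 100 workers, all fast
one_slow = [1.0] * 99 + [3.0]      # 99 fast workers and one straggler

healthy_job = job_time([all_fast] * 10)
straggler_job = job_time([one_slow] * 10)
print(healthy_job, straggler_job)
```

With these numbers, one straggler out of 100 workers triples the job time, because every step waits on it.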

Real example: Gemini traffic shows hundreds of milliseconds at line rate, but average utilization is 5x below peak. Synchronized bursts with no statistical multiplexing benefits. Both latency sensitive AND bandwidth intensive.

Three Breakthroughs

Falcon (Hardware Transport): Existing hardware transports assumed lossless networks: fundamentally incompatible with Ethernet. Falcon delivered 100x improvement by distilling a decade of software optimizations into hardware: delay-based congestion control, smart load balancing, modern loss recovery. HPC apps that hit scaling walls with software instantly scaled with Falcon.

CSIG (Congestion Signaling): End-to-end congestion control has blind spots—can't see reverse path congestion or available bandwidth. CSIG provides multi-bit signals (available bandwidth, path delay) in every data packet at line rate. No probing needed. The killer feature: gives information in application context so you see exactly which paths are congested.

Firefly: Jitter kills AI workloads. Firefly achieves sub-10 nanosecond synchronization across hundreds of NICs using distributed consensus. Measured reality: ±5 nanoseconds via oscilloscope. Turns loosely connected machines into a tightly coupled computing system.

The Remaining Challenges

Straggler detection: Even with perfect networking, finding the one slow GPU in thousands remains the hardest problem. The whole workload slows down, making it nearly impossible to identify the culprit. Statistical outlier analysis is too noisy. Active work in progress.

Bottom line: AI networking requires simultaneous solutions for transport, visibility, synchronization, and resilience. Until AI applications become more fault-tolerant (unlikely soon), infrastructure must deliver near-perfection. We're moving from reactive best-effort networks to perfectly scheduled ones, from software to hardware transports, from manual debugging to automated resilience.

设计团队如何应对AI带来的10倍开发生产力提升 || How Design Teams Are Reacting to 10x Developer Productivity from AI

2025-10-30T19:00:00+00:00

如今显而易见，AI编程助手能极大缩短软件开发周期。但当开发团队生产力爆发式增长时，设计团队该如何应对？以下是我观察到的最常见反应。

在我工作过的所有科技公司里，无论规模大小，总弥漫着"资源不足难以实现所有目标"的思维定式。无论这是否是借口，企业始终在追求更高效率。而现在，我们终于拥有了这种可能。

越来越多开发者发现，当代AI编程助手显著提升了他们的生产力。例如亚马逊的Joe Magerramov近期阐述其"团队10倍吞吐量提升并非理论，而是可量化的"。在质疑"氛围编程，垃圾代码"之前，他的文章详细展示了开发者如何在200英里时速下保持高质量，并重构流程以实现每日100次提交而非10次。

当开发团队交付速度提升10倍时，软件设计团队将面临什么？我观察到三种典型反应：

我们的角色已改变
我们也更快了
只是更快的垃圾

我们的角色已改变

设计师不再将主要精力放在制作工程师后续需要实现的线框图上，而是越来越关注功能开发后的UX整合。即确保开发者快速编码的大量功能能融入连贯的产品体验。这种角色反转重塑了设计与开发的关系。

多年来设计团队始终"领先"于工程团队，不受技术债务和架构限制。设计师通过线框图和原型在开发前构思可能性，而开发者则需在实现时处理各种边界情况、状态和技术问题。

如今开发团队"反超"设计团队，新功能以惊人速度转化为代码。UX优化及与产品结构目标的深度融合成为后续必要的"收尾工作"。

我们也更快了

越来越多设计师开始使用AI编程工具制作原型甚至交付功能。既然开发者能借AI提速，设计师为何不可？这使他们更贴近实际产品而非抽象线框图。Perplexity的设计师与工程师直接协作将提示词作为编程语言。Sigma的设计师则使用Augment Code等工具在生产环境修复UX问题。

只是更快的垃圾

第三种反应更具怀疑精神：AI提升开发速度不等于能创造优质产品。虽然占据道德高地令人愉悦，但现实是软件开发正在变革。开发者短期内不会回归1倍生产力。

"天下事物九成糟"——斯特金定律

值得重温斯特金定律的由来：当被问及为何90%科幻小说都很糟时，这位科幻作家答道：任何领域90%的作品都是糟粕。

AI生成的代码大多不够优秀？确实，但传统代码同样如此。无论使用何种工具，创造优秀产物始终艰难。对设计师和开发者而言，工具在变，本质工作未变。

---------------

At this point it's pretty obvious that AI coding agents can massively accelerate the time it takes to build software. But when software development teams experience huge productivity booms, how do design teams respond? Here are the most common reactions I've seen.

In all the technology companies I've worked at, big and small, there's always been a mindset of "we don't have enough resources to get everything we want done." Whether that's an excuse or not, companies consistently strive for more productivity. Well, now we have it.

More and more developers are finding that today's AI coding agents massively increase their productivity. As an example, Amazon's Joe Magerramov recently outlined how his "team's 10x throughput increase isn't theoretical, it's measurable." And before you think "vibe coding, crap," his post is a great walkthrough of how developers moving at 200 mph are cognizant of the need to keep quality high and rethink a lot of their process to effectively implement 100 commits a day vs. 10.

But what happens to software design teams when their development counterparts are shipping 10x faster? I've seen three recurring reactions:

Our Role Has Changed
We're Also Faster Now
It's Just Faster Slop

Our Role Has Changed

Instead of spending most of their time creating mockups that engineers will later be asked to build, designers increasingly focus on UX alignment after things are built. That is, ensuring the increased volume of features developers are coding fits into a cohesive product experience. This flips the role of designers and developers.

For years, design teams operated "out ahead" of engineering, unburdened by technical debt and infrastructure limitations. Designers would spend time in mockups and prototypes envisioning what could be built before development started. Then developers would need to "clean up" by working out all the edge cases, states, technical issues, etc. that came up when it came time to implement.

Now development teams are "out ahead" of design, with new features becoming code at a furious pace. UX refinement and thoughtful integration into the structure and purpose of a product are the "clean up" needed afterward.

We're Also Faster Now

An increasing number of designers are picking up AI coding tools themselves to prototype and even ship features. If developers can move this fast with AI, why can't designers? This lets them stay closer to the actual product rather than working in abstract mockups. At Perplexity, designers and engineers collaborate directly on prompting as a programming language. At Sigma, designers are fixing UX issues in production using tools like Augment Code.

It's Just Faster Slop

The third response I hear is more skeptical: just because AI makes developers faster doesn't mean it makes good products. While it feels good to take the high ground, the reality is software development is changing. Developers won't be going back to 1x productivity any time soon.

"Ninety percent of everything is crap" - Sturgeon's law

It's also worth remembering Sturgeon's Law, which originated when the science fiction writer was asked why 90% of science fiction writing is crap. He replied that 90% of everything is crap.

So is a lot of AI-generated code not great? Sure, but a lot of code is not great period. As always it's very hard to make something good, regardless of the tools one uses. For both designers and developers, the tools change but the fundamental job doesn't.

用AI攻克常见用户体验障碍 || Tackling Common UX Hurdles with AI

2025-10-23T00:00:00+00:00

某些用户体验问题伴随我们已久，以至于我们不再相信自己能做得更好。需要从用户那里收集数据？网页表单。用户不理解你的应用如何运作？新手引导。但新技术带来了新机遇，包括解决长期存在的用户体验挑战的方法。

如今AI在软件应用中主要表现为附加在用户界面侧边的聊天面板。虽然这通常有用，但并非利用AI改善应用体验的唯一方式。我们还可以利用AI模型的强项来解决存在多年的常见用户痛点。

我曾撰文探讨过其中一些方法，但认为总结几点来说明更高层次的见解会很有帮助。

重新思考新手引导

大多数应用从空白状态开始，通过新手流程教导用户如何使用。展示界面、解释功能、演示示例，期待用户坚持到发现价值的那一刻。

AI颠覆了这个模式。它可以从生成内容开始，而非空白页面。让用户第一天就有可编辑的内容，使他们能优化而非从零创造。区别在于即时参与感与延迟满足感。用户能立即使用产品，因为已有现成内容可供操作。他们通过观察可能性、修改和实践来学习。

更多内容见让AI负责新手引导...

重新思考搜索

传统搜索界面意味着关键词框、下拉菜单、分面筛选器。想找特定内容？先学习我们的分类法，理解归类体系，点击多重筛选选项。

AI模型内嵌世界知识，理解上下文，能将自然语言问题转化为多步查询。"给我看90年代高评分动作片"不需要分别选择类型、年代和评分阈值的下拉菜单。AI会解析查询结构，组合筛选条件，返回结果。

用户的搜索方式千差万别，AI比僵硬的UI控件更能应对这种多样性。

更多内容见世界知识提升AI应用...

重新思考表单

网页表单的存在是为了将信息结构化存入数据库。字段标签、输入类型、验证规则，表单强迫用户将信息塞入预设框架。

但AI擅长处理非结构化输入。用户只需上传图片、PDF文件或URL，AI就能提取结构化数据，自动填充数据库字段。机器代替人类完成格式化工作，将负担从用户转移至系统。人们自然表达，软件处理结构。

更多内容见AI应用中的非结构化输入替代网页表单...

这些案例有个共同点：AI能力让我们重新思考人机交互方式。不是给现有模式添加AI功能，而是基于AI的可能性重构模式本身。塑造当前UX惯例的限制条件正在改变，是时候重新审视我们的解决方案了。

---------------

Some UX issues have been with us so long that we stopped thinking we could do better. Need to collect data from people? Web forms. People don't understand how your app works? Onboarding. But new technologies create new opportunities including ways to tackle long-standing UX challenges.

Today AI mostly shows up in software applications as a chat panel bolted onto the side of a user interface. While often useful, it's not the only way to improve an application's user experience with AI. We can also use what AI models are good at to address common user pain points that have been around for years.

I've written about some of these approaches but thought it would be useful to summarize a few in order to illustrate the higher level point.

Rethinking Onboarding

Most apps start with empty states and onboarding flows that teach people how to use them. Show the UI. Explain the features. Walk through examples. Hope people stick around long enough to see value.

AI flips this. Instead of starting with nothing, AI can generate something for people to edit. Give people working content from day one. Let them refine, not create from scratch. The difference is immediate engagement versus delayed gratification. People can start using your product right away because there's already something there to work with. They learn by seeing what's possible, by modifying, by doing.

More in Let the AI do the Onboarding...

Rethinking Search

Search interfaces traditionally meant keyword boxes, dropdown menus, faceted filters. Want to find something specific? Learn our taxonomy. Understand our categorization scheme. Click through multiple refinement options.

AI models have world knowledge baked in. They understand context. They can translate a natural question into a multi-step query without making people do the work. "Show me action movies from the 90s with high ratings" doesn't need separate dropdowns for genre, decade, and rating threshold. The AI figures out the query structure. It combines the filters. It returns results.
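As a rough Python sketch of what "figuring out the query structure" could produce, the example below applies a structured filter of the kind a model might emit for that movie question. The `MovieQuery` shape, the sample titles, and the 8.0 cutoff for "high ratings" are all assumptions for illustration; the step where a model turns the sentence into the filter is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Movie:
    title: str
    genre: str
    year: int
    rating: float


@dataclass
class MovieQuery:
    """Hypothetical structured filter a model could emit from a natural question."""
    genre: Optional[str] = None
    year_from: Optional[int] = None
    year_to: Optional[int] = None
    min_rating: Optional[float] = None


def run_query(movies, query):
    """Combine every filter that is set; unset filters match everything."""
    results = []
    for movie in movies:
        if query.genre is not None and movie.genre != query.genre:
            continue
        if query.year_from is not None and movie.year < query.year_from:
            continue
        if query.year_to is not None and movie.year > query.year_to:
            continue
        if query.min_rating is not None and movie.rating < query.min_rating:
            continue
        results.append(movie)
    return results


# Made-up catalog entries, not real films or ratings.
catalog = [
    Movie("Example Action A", "action", 1994, 8.4),
    Movie("Example Action B", "action", 1997, 6.1),
    Movie("Example Drama C", "drama", 1995, 9.0),
    Movie("Example Action D", "action", 2003, 8.8),
]

# "Show me action movies from the 90s with high ratings", with 8.0 assumed as "high".
query = MovieQuery(genre="action", year_from=1990, year_to=1999, min_rating=8.0)
titles = [movie.title for movie in run_query(catalog, query)]
```

The person never sees the three filters; they are combined behind a single sentence instead of three dropdowns.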

People search in many different ways. AI handles that variety better than rigid UI widgets ever could.

More in World Knowledge Improves AI Apps...

Rethinking Forms

Web forms exist to structure information for databases. Field labels. Input types. Validation rules. Forms force people to fit their information into our predetermined boxes.

But AI works with unstructured input. People can just drop in an image, a PDF file, or a URL. The AI extracts the structured data. It populates the database fields. The machine does the formatting work instead of the human. This shifts the burden from users to systems. People communicate naturally. Software handles the structure.
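A minimal Python sketch of the "software handles the structure" half of this idea, assuming a model has already read the image, PDF, or URL and returned JSON. The field names and the `to_record` helper are hypothetical; the point is only that the system, not the person, checks the data against the database's shape.

```python
import json

# Hypothetical schema: field name -> type the database expects.
CONTACT_FIELDS = {"name": str, "email": str, "company": str}


def to_record(model_output: str) -> dict:
    """Parse a model's JSON output and keep only known, well-typed fields."""
    data = json.loads(model_output)
    if not isinstance(data, dict):
        data = {}
    record = {}
    for name, expected_type in CONTACT_FIELDS.items():
        value = data.get(name)
        # Missing or mistyped values become None rather than bad rows.
        record[name] = value if isinstance(value, expected_type) else None
    return record


# Example output a model might return after reading a business card image.
extracted = '{"name": "Ada Example", "email": "ada@example.com", "title": "CTO", "company": 42}'
record = to_record(extracted)
```

Here the unknown `title` field is dropped and the mistyped `company` value is stored as `None`, so a follow-up question can ask only for what is still missing.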

More in Unstructured Input in AI Apps Instead of Web Forms...

These examples share a common thread: AI capabilities let us reconsider how people interact with software. Not by adding AI features to existing patterns, but by rethinking the patterns themselves based on what AI makes possible. The constraints that shaped our current UX conventions are changing so it's time to start revisiting our solutions.

嵌入式AI应用：现已登陆ChatGPT || Embedded AI Apps: Now in ChatGPT

2025-10-14T00:00:00+00:00

上周我列出了一系列用户体验和技术障碍清单，这些障碍使得在AI模型"内部"使用应用变得颇具挑战。随后OpenAI的ChatGPT应用商店公告承诺将消除大部分障碍。鉴于AI正在深刻改变应用的构建和使用方式，我认为有必要探讨这些挑战以及OpenAI提出的解决方案。

无论你称之为聊天应用、远程MCP服务器应用还是嵌入式应用，当你的AI应用运行在ChatGPT中时，ChatGPT的能力就成为了你应用的能力。ChatGPT能进行网络搜索，你的应用也能；ChatGPT可以连接Salesforce，你的应用同样可以。这些听起来都是构建嵌入式AI应用的绝佳理由，但是...需要权衡取舍。

此前，嵌入式应用必须通过连接远程MCP服务器添加到Claude.ai或ChatGPT（开发者模式下），而输入字段可能深藏在客户设置的数层点击之后。这个过程对大多数人而言就像一堵墙，完全阻碍了应用发现。

为解决这个问题，OpenAI宣布了正式的应用程序提交审核流程和质量标准。没错，就是一个应用商店。通过审核后，安装嵌入式应用将变成一键操作（需通过隐私同意流程），不再需要手动服务器配置。

如果你曾在Claude.ai或ChatGPT等聊天客户端添加过嵌入式应用，使用过程基本以文本交互为主。嵌入式应用无法渲染图像，更不用说用户界面控件了，用户只能阅读和打字。

现在，ChatGPT应用能够渲染"运行在iframe内的React组件"，这不仅支持内嵌图片、地图和视频，还能实现自定义用户界面控件。这些iframe还可以扩展至全屏模式，为应用提供更多专属UI空间，并支持画中画(PIP)模式进行持续会话。

这并不意味着嵌入式应用的发现性问题已完全解决。用户仍需在ChatGPT中通过名称搜索应用，通过"+"按钮访问，或依赖模型判断是否/何时使用特定应用。

运行嵌入式应用的服务器与AI客户端之间的交互也有改进空间。与桌面和移动操作系统不同，ChatGPT（目前）不支持自动后台刷新、服务器发起通知，甚至无法从前端向服务器传递文件（仅能传递上下文）。这些功能对现代应用至关重要，或许支持即将到来。

为任何平台构建应用都需要权衡分发能力与技术特性或限制。ChatGPT每周8亿用户量是个诱人的分发机会，而ChatGPT应用商店已经解决了嵌入式AI应用的许多用户体验和技术问题。

这足以让MCP远程服务器从开发者专属协议转变为真正的软件应用吗？这遵循所有平台迁移的相同模式：先有底层技术，再有使其易用的用户体验层。上周我们确实在这个方向上迈出了重大步伐。

---------------

Last week I had a running list of user experience and technical hurdles that made using applications "within" AI models challenging. Then OpenAI's ChatGPT Apps announcement promised to remove most of them. Given how much AI is changing how apps are built and used, I thought it would be worth talking through these challenges and OpenAI's proposed solutions.

Whether you call it a chat app, a remote MCP server app, or an embedded app, when your AI application runs in ChatGPT, the capabilities of ChatGPT become capabilities of your app. ChatGPT can search the web, so can your app. ChatGPT can connect to Salesforce, so can your app. These all sound like great reasons to build an embedded AI app but... there are tradeoffs.

Embedded apps previously had to be added to Claude.ai or ChatGPT (in developer mode) by connecting a remote MCP server, for which the input field could be several clicks deep into a client's settings. That process turned app discovery into a brick wall for most people.

To address this, OpenAI announced a formal app submission and review process with quality standards. Yes, an app store. Get approved and installing your embedded app becomes a one-click action (pending privacy consent flows). No more manual server configs.

If you were able to add an embedded app to a chat client like Claude.ai or ChatGPT, using it was a mostly text-based affair. Embedded apps could not render images, much less user interface controls. So people were left reading and typing.

Now, ChatGPT apps are able to render "React components that run inside an iframe" which not only enables inline images, maps, and videos but custom user interface controls as well. These iframes can also run in an expanded full screen mode giving apps more surface area for app-specific UI and in a picture-in-picture (PIP) mode for ongoing sessions.

This doesn't mean that embedded app discoverability problems are solved. People still need to either ask for apps by name in ChatGPT, access them through the "+" button, or rely on the model's ability to decide if/when to use specific apps.

The back and forth between the server running an embedded app and the AI client also has room for improvement. Unlike desktop and mobile operating systems, ChatGPT doesn't (yet) support automatic background refresh, server-initiated notifications, or even passing files (only context) from the front end to a server. These capabilities are pretty fundamental to modern apps, so perhaps support isn't far away.

The tradeoffs involved in building apps for any platform have always been about distribution and technical capabilities or limitations. 800M weekly ChatGPT users is a compelling distribution opportunity and with ChatGPT Apps, a lot of embedded AI app user experience and technical issues have been addressed.

Is this enough to move MCP remote servers from a developer-only protocol to applications that feel like proper software? It's the same pattern from every platform shift: underlying technology first, then the user experience layer that makes it accessible. And there were definitely big steps forward on that last week.

让AI负责入职培训 || Let the AI do the Onboarding

2025-10-09T21:00:00+00:00

任何设计过软件的人都可能遇到过"空白状态"问题。虽然应用程序能创造有用的内容，但让用户跨越初始创作障碍却非易事。如今借助AI技术，模型可以替人们跨越这道创作鸿沟。

设计电子表格应用时，你需要"新建表格"页面；设计演示工具时，你需要空白状态来承载新演示文稿。文档编辑器、设计工具、项目管理应用...它们都面临同样的难题：当用户面对空白画布时，如何帮助他们迈出第一步？

设计师们尝试过多种解决创作鸿沟的方法，形成了各类常见模式：

引导标记：指导用户可修改内容和操作方式
预设模板：提供现成的起点
功能导览：逐步介绍核心功能
高亮浮层：突出重要界面元素
演示视频：展示应用使用方法

这些方法要求用户先学习后操作。但实际上，大多数人会直接尝试，仅在失败时才回头学习。事实证明，要求用户先阅读手册根本行不通。

而现代AI模型让我们能采取全新方式：AI可以通过实际创作过程向用户演示产品用法，用户只需调整结果即可。AI负责创作，人类负责优化。

新模式将"先学后做"转变为"观察后优化"，让用户从AI生成的雏形开始打磨，而非从零起步。我们不再教用户如何创作，而是直接展示创作过程——换言之，AI承担了（引导）工作。

这在ChatDB中得到完美体现，它能帮助用户即时理解、可视化并分享数据。上传数据后，ChatDB会：

自动创建仪表盘
命名仪表盘
编写描述文案
选择图标与配色
生成初始图表
固定首选图表

整个过程直观展示ChatDB的功能与用法，无需任何引导教程。

仪表盘创建后，编辑标题（点击输入）、更换图标颜色等操作都极其简单。AI提供起点，用户接力完善。亲自试试看。

这种方式将软件从"教导使用"转变为"示范可能"，把传统空白状态问题从"如何帮助开始"转化为"如何帮助优化"。软件通过实际行动而非说明书，向用户展示可能性。

---------------

Anyone who's designed software has likely had to address the "empty state" problem. While an application can create useful stuff, getting users over the initial hurdle of creation is hard. With today's technology, however, AI models can cross the creation chasm so people don't have to.

If you're designing a spreadsheet application, you'll need a "new spreadsheet" page. If you're designing a presentation tool, you'll need an empty state for new presentations. Document editors, design tools, project management apps... they all face the same hurdle: how do you help people get started when they're staring at a blank canvas?

Designers have tried to address the creation chasm many times resulting in a bunch of common patterns you'll encounter in any software app you use.

Coach marks that educate people on what they can change and how
Templates that provide pre-built starting points
Tours that walk users through key features
Overlays that highlight important interface elements
Videos that demonstrate how to use an application

These approaches require people to learn first, then act. But in reality most of us just jump right into doing and only fall back on learning if what we try doesn't work. Shockingly, asking people to read the manual first doesn't work.

But with the current capabilities of AI models, we can do something different. AI can model how to use a product by actually going through the process of creating something and letting people watch. From there, people can just tweak the result to get closer to what they want. AI does the creation, people do the curation.

Rather than learning then doing, people observe then refine. Instead of starting from nothing, they start from something an AI builds for them and make it their own. Instead of teaching people how to create, we show them creation in action. In other words, the AI does the (onboarding) work.

You can see this in action on ChatDB, which allows people to instantly understand, visualize, and share data. When you upload a set of data to ChatDB, it will:

make a dashboard for you
name your dashboard
write a description for it
pick an icon and color set for it
make you a series of initial charts
pin one to your dashboard

All this happens in front of your eyes making it clear how ChatDB works and what you can do with it, no onboarding required.

Once your dashboard is made, it's trivial to edit the title (just click and type), change the icon, colors, and more. AI gives you the starting point and you take it from there. Try it out yourself.

With this approach, we can shift from applications that tell people how to use them to applications that show people what they can do by doing it for them. The traditional empty state problem transforms from "how do we help people start?" to "how do we help people refine?" And software shows people what's possible through action rather than instruction.

包装式与嵌入式AI应用 || Wrapper vs. Embedded AI Apps

2025-10-06T22:00:00+00:00

正如我们在PC、网络和移动技术变革中所见，软件应用的构建、使用和分发方式将因AI而改变。目前大多数AI应用采用封装式方案，但越来越多人也开始选择嵌入式方案。让我们来对比这两种模式...

封装式AI应用

封装式应用围绕AI模型构建定制化软件体验。这类应用本质仍是传统程序，只是将AI作为核心任务的主要处理引擎。它们拥有熟悉的用户界面，但功能实现依赖一个或多个AI模型来处理用户输入、信息检索、内容生成等。开发者需要自行构建应用的输入输出系统、集成方案、用户界面，并协调这些模块与AI模型的交互。

优势在于可以完全按照目标用户需求设计界面，并通过全链路用户交互数据持续优化。代价则是需要不断开发维护用户对AI应用日益增长的期待功能：如图像识别输入、网络搜索、生成PDF/Word文档等。

嵌入式AI应用

嵌入式AI应用运行在Claude.ai或ChatGPT等现有AI客户端内。这类应用无需独立开发完整体验，而是直接调用宿主客户端的输入、输出和工具能力，以及持续扩展的集成生态。

具体来说，如果ChatGPT支持将对话内容转为播客，你的应用输出也能一键转播客；而在封装式应用中这个功能需要自行开发。同理，若Claude.ai已集成Google Drive，你的嵌入式应用便天然拥有该能力。ChatGPT能做的深度研究，你的应用也... 你懂的。

代价是什么？首先应用界面受限于宿主客户端规范。例如ChatGPT应用需遵守特定的设计准则和开发要求，与其他应用商店类似。其次应用的分发和使用完全由宿主平台掌控。由于任务上下文由客户端管理，开发者无法利用这些数据优化产品。

双轨并行

就像移动互联网时代原生应用与移动网页的选择... 你可以全都要。原生应用擅长复杂交互，移动网页利于广泛触达。多数软件都能从两者获益。AI应用同样如此。ChatDB就展示了这种双模式及其权衡。这项服务能帮助用户轻松将数据集转化为聊天仪表盘。

用户既可使用独立的ChatDB封装应用，也能嵌入Claude或ChatGPT等AI客户端（如两个演示视频所示）。封装版支持创建图表、固定到仪表盘、自由排版等功能，专注于提供极致的可视化仪表盘制作与分享体验。

嵌入式ChatDB则能调用"搜索浏览"等工具，或通过Linear等数据源集成发现新数据并添加到仪表盘。这些功能在封装版中并不存在——考虑到开发和维护成本，可能永远不会实现。但它们天然存在于Claude.ai和ChatGPT中。

两种模式的能力与限制都将快速演进，今天的权衡可能明天就大不相同。但可以确定的是，AI正在重新定义应用形态、分发方式和使用场景。这意味着所有人都会像PC、网络和移动时代那样重写应用，而这将为设计和开发带来大量创新机遇。

---------------

As we saw during the PC, Web, and mobile technology shifts, how software applications are built, used, and distributed will change with AI. Most AI applications built to date have adopted a wrapper approach but increasingly, being embedded is becoming a viable option too. So let's look at both...

Wrapper AI Apps

Wrapper apps build a custom software experience around one or more AI models. Think of these as traditional applications that use AI as their primary processing engine for most core tasks. These apps have a familiar user interface but their features use one or more AI models for processing user input, information retrieval, generating output, and more. With wrapper apps, you need to build the application's input and output capabilities, integrations, user interface, and how all these pieces and parts work with AI models.

The upside is you can make whatever interface you want, tailored to your specific users' needs and continually improve it through your visibility of all user interactions. The cost is you need to build and maintain the ever-increasing list of capabilities users expect with AI applications like: the ability to use image understanding as input, the ability to search the Web, the ability to create PDFs or Word docs, and much more.

Embedded AI Apps

Embedded AI apps operate within existing AI clients like Claude.ai or ChatGPT. Rather than building a standalone experience, these apps leverage the host client for input, output, and tool capabilities, alongside a constantly growing list of integrations.

To use concrete examples, if ChatGPT lets people turn their conversation results into a podcast, your app's results can be turned into a podcast. With a wrapper app, you'd be building that capability yourself. Similarly, if Claude.ai has an integration with Google Drive, your app running in Claude.ai has an integration with Google Drive, no need for you to build it. If ChatGPT can do deep research, your app can ... you get the idea.

So what's the price? For starters, your app's interface is limited by what the client you're operating in allows. ChatGPT Apps, for instance, have a set of design guidelines and developer requirements not unlike those found in other app stores. This also means how your app can be found and used is managed by the client you operate in. And since the client manages context throughout any task that involves your app, you lose the ability to see and use that context to improve your product.

Doing Both

Like the choice between native mobile apps and mobile Web sites during the mobile era... you can do both. Native mobile apps are great for rich interactions and the mobile Web is great for reach. Most software apps benefit from both. Likewise, an AI application can work both ways. ChatDB illustrates this and the trade-offs involved. ChatDB is a service that allows people to easily make chat dashboards from sets of data.

People can use ChatDB as a standalone wrapper app or embedded within their favorite AI client like Claude or ChatGPT (as illustrated in the two embedded videos). The ChatDB wrapper app allows people to make charts, pin them to a dashboard, rearrange them and more. It's a UI and product experience focused solely on making awesome dashboards and sharing them.

The embedded ChatDB experience allows people to make use of tools like Search and Browse or integrations with data sources like Linear to find new data and add it to their dashboards. These capabilities don't exist in the ChatDB wrapper app and maybe never will because of the time required to build and maintain them. But they do exist in Claude.ai and ChatGPT.

The capabilities and constraints of both wrapper and embedded AI apps are going to continue evolving quickly. So today's tradeoffs might be pretty different from tomorrow's. But it's clear that what an application is, how it's distributed, and how it's used are changing once again with AI. That means everyone will be rewriting their apps like they did for the PC, Web, and mobile, and that's a lot of opportunity for new design and development approaches.

我们（依然）未能充分重视数据的价值 || We're (Still) Not Giving Data Enough Credit

2025-10-02T00:00:00+00:00


我们（依然）低估了数据的重要性

在Sutter Hill Ventures的AI演讲系列中，UC Berkeley的Alexei Efros提出：视觉计算领域的AI进步由数据而非算法驱动。核心观点如下：

数据的核心作用

长期被忽视的真相：
- 学术界长期推崇算法创新，临近研究尾声才匆忙收集数据
- 这种"科学自恋"阻碍了进步——人类总将突破归功于自身智慧而非数据
认知科学的启示：
- 人脑依赖海量经验数据（如能从莫奈模糊笔触中识别蒸汽机，因童年记忆补全了细节）
- "意识本质是数据的涌现属性" ——Lance Williams
关键案例佐证：
- 三篇人脸检测里程碑论文：算法完全不同（神经网络/朴素贝叶斯/级联分类器），性能却相近——真正突破在于引入"非人脸"负样本数据
- Efros团队用200万Flickr图片+最近邻搜索实现图像补全："最笨的方法却有效"
- 相同数据下，复杂神经网络与简单最近邻算法表现相当——解决方案始终藏在数据中

插补≠智能

人类记忆的局限性：
- 对自然图像记忆超强，但对随机纹理几乎无法识别
- 我们只记住"有意义经验流形"上的信息
AI的本质反思：
- 现有人工智能更像是"文化技术"（如印刷术），通过压缩人类知识实现高效交互
- "高维空间中的插补与魔法无异"——但魔力来自数据而非算法
- 视觉/文本空间可能比想象中更小（例如200个PCA主成分就能建模全人类面孔）
发展现状评估：
- 文本数据充足→性能优异；图像数据较少→逐步提升；视频/机器人数据稀缺→进展缓慢
- 当前AI只是"蒸馏机器"，将人类数据压缩为模型

真正智能的未来

核心挑战：
- 需摆脱对人类文明产物的依赖，从原始驱动力（饥饿/嫉妒/快乐）自主演化
- "AI不是计算机会写诗，而是计算机会想要写诗"

关键结论：突破性进展将来自高质量数据集的构建，而非算法层面的微创新

---------------

在Sutter Hill Ventures举办的AI演讲系列中，加州大学伯克利分校的Alexei Efros提出，推动视觉计算领域AI发展的核心是数据而非算法。以下是我对其演讲《我们（依然）未能充分重视数据》的笔记：

海量数据是必要条件但非充分条件。我们必须学会保持谦逊，给予数据应有的认可。视觉计算领域长期存在的算法偏见掩盖了数据的基础性作用，认清这一现实对预判AI突破方向至关重要。

数据的核心地位

学术界直到近年才重视数据，研究者常年钻研算法，最后关头才匆忙收集数据集
这种思维模式长期阻碍着领域发展
AI领域的科学自恋：我们更愿将成就归功于人类智慧而非数据作用
人类理解力高度依赖存储的经验，而不仅是即时感官数据
人们能从莫奈模糊笔触中辨识蒸汽机细节——蒸汽机其实存在于观者脑中，童年经历导致每人看到不同版本
人类大脑能轻松解析高度像素化的画面，自动补全缺失信息
"心智本质上是数据的涌现属性"——Lance Williams
三篇人脸检测里程碑论文用完全不同算法（神经网络、朴素贝叶斯、级联增强）达到了相近效果
真正突破在于意识到需要负样本（不含人脸的图像），但25年后我们仍在称赞花哨算法
Efros团队用200万Flickr图片配合基础最近邻查找实现图像补洞："最笨的方法却奏效了"
相同数据集对比显示，复杂神经网络与简单最近邻方法表现相当
所有解决方案都藏在数据里，精密算法往往只是快速查找器——因为查找本身就包含问题答案

插值与智能

MIT的Aude Oliva实验揭示人类记忆自然图像的惊人能力
但记忆具有选择性：对自然有意义图像识别率极高，对随机纹理则接近随机猜测
我们不具备照相记忆，只记住符合自然经验流形的事物
这表明人类智能高度依赖数据，但聚焦于有意义经验
心理学家Alison Gopnik将AI重构为文化技术：如同印刷机般收集人类知识并优化交互界面
它们并非创造新事物，而是精密的插值系统
"高维空间的插值与魔法无异"，但魔法存在于数据而非算法
或许视觉与文本空间比想象中小，这解释了数据的强大效力
200个人脸PCA主成分就能建模全人类面部，此原理可扩展至像素之外的模型权重线性子空间
初创算法黄金问："该问题是否有足够数据？"文本：数据充足，表现优异；图像：数据较少，逐步提升；视频/机器人：数据稀缺，进展缓慢
现有系统是将人类数据压缩成模型的"蒸馏机器"
真正智能或需从零开始：剥离人类文明痕迹，从饥饿、嫉妒、快乐等原始欲望启动
"AI不是计算机会写诗的时刻，而是计算机渴望写诗的时刻"

---------------

In his AI Speaker Series presentation at Sutter Hill Ventures, UC Berkeley's Alexei Efros argued that data, not algorithms, drives AI progress in visual computing. Here are my notes from his talk: We're (Still) Not Giving Data Enough Credit.

Large data is necessary but not sufficient. We need to learn to be humble and to give the data the credit that it deserves. The visual computing field's algorithmic bias has obscured data's fundamental role. Recognizing this reality becomes crucial for evaluating where AI breakthroughs will emerge.

The Role of Data

Data got little respect in academia until recently as researchers spent years on algorithms, then scrambled for datasets at the last minute
This mentality hurt us and stifled progress for a long time.
Scientific Narcissism in AI: we prefer giving credit to human cleverness over data's role
Human understanding relies heavily on stored experience, not just incoming sensory data.
People see detailed steam engines in Monet's blurry brushstrokes, but the steam engine is in your head. Each person sees different versions based on childhood experiences
People easily interpret heavily pixelated footage with brains filling in all the missing pieces
"Mind is largely an emergent property of data" -Lance Williams
Three landmark face detection papers achieved similar performance with completely different algorithms: neural networks, naive Bayes, and boosted cascades
The real breakthrough wasn't algorithmic sophistication. It was realizing we needed negative data (images without faces). But 25 years later, we still credit the fancy algorithm.
Efros's team demonstrated hole-filling in images using 2 million Flickr images with basic nearest-neighbor lookup. "The stupidest thing and it works."
Comparing approaches with identical datasets revealed that fancy neural networks performed similarly to simple nearest neighbors.
All the solution was in the data. Sophisticated algorithms often just perform fast lookup because the lookup contains the problem's solution.
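
To make the "solution is in the data" point concrete, here is a minimal sketch of nearest-neighbor lookup. It is not the method from any of the papers mentioned in the talk: the 2-D feature vectors, the labels, and the function names are all made up for illustration. The algorithm is one line, and what it can answer depends entirely on what the dataset contains.

```python
import math

def nearest_neighbor(query, dataset):
    """Return the (features, label) pair in dataset closest to query."""
    return min(dataset, key=lambda item: math.dist(query, item[0]))

# Toy dataset: hypothetical 2-D feature vectors with labels.
# The last two rows play the role of "negative data" (images without faces).
DATASET = [
    ((0.9, 0.1), "face"),
    ((0.8, 0.2), "face"),
    ((0.1, 0.9), "not face"),
    ((0.2, 0.7), "not face"),
]

def classify(query, dataset=DATASET):
    """Label a query by copying the label of its nearest stored example."""
    return nearest_neighbor(query, dataset)[1]
```

With only the first two rows (no negative examples), the same lookup labels every query as a face. Adding the negative rows changes the answer without touching the algorithm.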

Interpolation vs. Intelligence

MIT's Aude Oliva's experiments reveal extraordinary human capacity for remembering natural images.
But memory works selectively: high recognition rates for natural, meaningful images vs. near-chance performance on random textures.
We don't have photographic memory. We remember things that are somehow on the manifold of natural experience.
This suggests human intelligence is profoundly data-driven, but focused on meaningful experiences.
Psychologist Alison Gopnik reframes AI as cultural technologies. Like printing presses, they collect human knowledge and make it easier to interface with it
They're not creating truly new things, they're sophisticated interpolation systems.
"Interpolation in sufficiently high dimensional space is indistinguishable from magic" but the magic sits in the data, not the algorithms
Perhaps visual and textual spaces are smaller than we imagine, explaining data's effectiveness.
200 faces in PCA could model the whole of humanity's faces. This can be expanded to linear subspaces of not just pixels, but model weights themselves.
Startup algorithm: "Is there enough data for this problem?" Text: lots of data, excellent performance. Images: less data, getting there. Video/Robotics: harder data, slower progress
Current systems are "distillation machines" compressing human data into models.
True intelligence may require starting from scratch: remove human civilization artifacts and bootstrap from primitive urges: hunger, jealousy, happiness
"AI is not when a computer can write poetry. AI is when the computer will want to write poetry"

播客：现实世界中的生成式AI || Podcast: Generative AI in the Real World

2025-09-30T00:00:00+00:00

最近我有幸在O'Reilly的《现实世界中的生成式AI》播客中与Ben Lorica对话，探讨了AI时代下软件应用的变革。我们讨论了以下主题：

- **应用定义的重构**：从"代码+数据库"模式转向"URL+模型"的新范式  
- **技术演进的规律**：这一转变类似Web和移动端的平台迁移，初期应用看似简陋但会持续进化  
- **AI原生数据库**：专为AI智能体而非人类设计的数据库系统运作方式  
- **开发流程的颠覆**：AI编程助手使团队能先快速构建原型，再完善设计与产品集成  
- **职业影响**：设计师和工程师需要新技能，但同时也获得更多创作机会  
- **人文价值**：审美品味与人类监督在AI系统中的重要性  
- **其他洞见**：更多精彩观点...

完整29分钟访谈《Generative AI in the Real World: Luke Wroblewski谈数据库的智能体语言》可在O'Reilly官网收听。感谢节目组的邀请。

---------------

最近我有幸在O'Reilly的《现实世界中的生成式AI》播客节目中与Ben Lorica畅谈AI时代下软件应用的变革。我们探讨了多个话题，包括：

应用程序定义从"运行代码+数据库"向"URL+模型"的转变
这种转型如何呼应了早期网络和移动平台的发展轨迹——初始应用看似简陋，但随时间推移实现了质的飞跃
专为AI代理而非人类设计的数据库系统运作机制
"逆向"软件开发流程：AI编程助手让团队能先快速构建可行原型，再进行产品化设计与集成
这对设计和工程角色的影响：需要新技能组合，但也创造了更多创作机会
AI系统中审美判断与人类监督的重要性
以及其他洞见...

您可以在O'Reilly官网收听这期29分钟的播客《现实世界中的生成式AI：Luke Wroblewski谈数据库的智能体语言》。感谢节目组的邀请。

---------------

I recently had the pleasure of speaking with Ben Lorica on O'Reilly's Generative AI in the Real World podcast about how software applications are changing in the age of AI. We discussed a number of topics including:

The shift from "running code + database" to "URL + model" as the new definition of an application
How this transition mirrors earlier platform shifts like the web and mobile, where initial applications looked less robust but evolved significantly over time
How a database system designed for AI agents instead of humans operates
The "flipped" software development process where AI coding agents allow teams to build working prototypes rapidly first, then design and integrate them into products
How this impacts design and engineering roles, requiring new skill sets but creating more opportunities for creation
The importance of taste and human oversight in AI systems
And more...

You can listen to the podcast Generative AI in the Real World: Luke Wroblewski on When Databases Talk Agent-Speak (29min) on O'Reilly's site. Thanks to all the folks there for the invitation.

未来产品日：驱动用户行为的隐形力量 || Future Product Days: Hidden Forces Driving User Behavior

2025-09-26T00:00:00+00:00

# 利用行为科学设计用户体验的核心要点

在"未来产品日"的演讲《揭示驱动用户行为的隐藏力量》中，Sarah Thompson分享了如何运用行为科学打造更有效用户体验的见解：

## 人类大脑的永恒特性
* 尽管AI技术呈指数级发展，但人类大脑已有约4万年未经历实质性进化
* 我们仍在为"穴居人大脑"做设计
* 这种不变的人类特性为设计提供了稳定的基础

## 行为科学的重要性
* 在技术快速迭代的今天，行为科学比以往更重要
* 现代工具让我们能以空前速度扩大产品影响

## 决策的神经科学原理
* 所有决策本质都是感性的
* 大脑的感性系统（系统1）会先于意识10秒做出决定
* 系统1思维特点：
  - 快速/自动化
  - 依赖直觉反应（进化遗留）
  - 运用180+种认知偏差简化复杂判断

## 成本收益评估机制
* 每次决策时，感性大脑会即时评估：
  - 成本>收益 → 放弃行动
  - 收益>成本 → 采取行动
* 评估仅关注6个核心维度：

| 维度       | 特征                      | 设计启示                          |
|------------|---------------------------|-----------------------------------|
| **心理**   | "思考是耗能的"            | 减少选择项/采用默认设置/即时理解  |
| **社交**   | "归属需求根深蒂固"        | 营造安全感/避免尴尬或排斥感       |
| **情感**   | "视觉刺激触发最快"        | 通过图像设定情感基调              |
| **体力**   | "天生节约体力"            | 推行刷脸支付/可穿戴设备等便利设计 |
| **物质**   | "稀缺思维模式"            | 运用稀缺话术/提供对等回报         |
| **时间**   | "渴望即时满足"            | 减少等待/强调省时效果             |

## 核心结论
**设计者无法改变穴居人大脑，但可以为其设计。**

---------------

在Future Product Days大会的演讲《揭示驱动用户行为的隐藏力量》中，Sarah Thompson分享了如何运用行为科学打造更高效用户体验的洞见。以下是我的演讲笔记：

虽然AI技术呈指数级发展，但人类大脑已有约4万年未经历重大更新，因此我们仍在为"穴居人大脑"做设计
这种不变的人类特质为设计提供了稳定基础，不会随技术浪潮更迭而改变
行为科学比以往更重要，因为我们现在的工具能实现前所未有的规模化
所有决策都是情绪化的——大脑中负责决策的系统一（情绪部分）会提前10秒激活，早于人们意识到自己已做决定
系统一思维快速、自动，曾帮我们凭直觉生存。如今它仍主导决策，但会使用捷径和180多种已知认知偏差应对复杂性
每次决策时，情绪脑会立即预判行动带来的成本与收益。成本更高？放弃。收益更大？前进
情绪脑只关注六类直觉成本与收益：内在（心理、情感、身体）和外在（社交、物质、时间）
心理成本："思考很费力"进化让我们节省脑力——选项过多时人们会放弃，倾向默认设置。用户能否立即明白该做什么？
社交成本："我们天生需要归属"进化使我们将社交成本视为生死问题。设计让用户感到安全、被关注、有归属吗？还是引发尴尬或被排斥？
情感成本："自动触发"图像是设定情感基调的最快方式。这个设计可能触发什么积极/消极的自动反应？
身体成本："我们本能节省体力"拍付、人脸识别、可穿戴设备带来身体收益。能否消除实际或感知的体力消耗？
物质成本："大脑形成于匮乏环境""Bob三分钟前预订了"等稀缺策略能促发行动。我们在要求用户放弃什么？或给予什么回报？
时间成本："我们渴望即时回报"任何等待都会导致流失。能否提供即时奖励或让用户感觉节省时间？
你无法摆脱穴居人大脑，但可以为它设计。

---------------

In her talk Reveal the Hidden Forces Driving User Behavior at Future Product Days, Sarah Thompson shared insights on how to leverage behavioral science to create more effective user experiences. Here are my notes from her talk:

While AI technology evolves exponentially, the human brain has not had a meaningful update in approximately 40,000 years so we're still designing for the "caveman brain"
This unchanging human element provides a stable foundation for design that doesn't change with every wave of technology
Behavioral science matters more than ever because we now have tools that allow us to scale faster than ever
All decisions are emotional because there is a System 1 (emotional) part of the brain that makes decisions first. This part of the brain lights up 10 seconds before a person is even aware they made a decision
System 1 thinking is fast, automatic, and helped us survive through gut reactions. It still runs the show today but uses shortcuts and over 180 known cognitive biases to navigate complexity
Every time someone makes a decision, the emotional brain instantly predicts whether there are more costs or gains to taking action. More costs? Don't do it. More gains? Move forward
The emotional brain only cares about six intuitive categories of costs and gains: internal (mental, emotional, physical) and external (social, material, temporal)
Mental: "Thinking is hard" We evolved to conserve mental effort - people drop off with too many choices, stick with defaults. Can the user understand what they need to do immediately?
Social: "We are wired to belong" We evolved to treat social costs as life or death situations. Does this make users feel safe, seen, or part of a group? Or does it raise embarrassment or exclusion?
Emotional: "Automatic triggers" Imagery and visuals are the fastest way to set emotional tone. What automatic trigger (positive or negative) might this design bring up for someone?
Physical: "We're wired to conserve physical effort" Physical gains include tap-to-pay, facial recognition, wearable data collection. Can I remove real or perceived physical effort?
Material: "Our brains evolved in scarcity" Scarcity tactics like "Bob booked this three minutes ago" drive immediate action. Are we asking people to give something up or are we giving them something in return?
Temporal: "We crave immediate rewards" Any time people have to wait, we see drop off. Can we give immediate reward or make people feel like they're saving time?
You can't escape the caveman brain, but you can design for it.

未来产品日：如何用AI解决正确的问题 || Future Product Days: How to solve the right problem with AI

2025-09-26T00:00:00+00:00

# 如何用AI解决正确的问题——Dave Crawford演讲笔记

在Future Product Days的演讲中，Dave Crawford分享了如何有效将AI整合到现有产品中并避免常见陷阱的见解：

## 核心观点
* 许多团队接到"加入AI"的指令时，容易陷入"AI万能锤"陷阱——把每个问题都看作AI钉子
* 关键不是"能用AI做什么"，而是"用AI做什么有意义"，应聚焦AI能为用户创造最大价值的场景

## AI交互模式
用户主要通过四种方式接触AI：
1. **发现型AI**：替代搜索功能，帮助发现信息和建立连接
2. **分析型AI**：通过数据分析提供洞察（如医疗影像诊断）
3. **生成型AI**：创造图像、文本、视频等内容
4. **功能型AI**：直接执行任务或与其他服务交互

### 交互上下文谱系（用户负担由高到低）：
- **开放式对话框**（如ChatGPT）：需用户提供全部上下文，负担高
- **侧边栏体验**：掌握部分应用上下文，仍需切换
- **嵌入式**：深度融入用户工作流
- **后台型**：自主运行无需交互

## AI产品开发原则
* **简约思维**：提供明确价值，让用户清楚能获得什么
* **情境思维**：利用现有上下文定制体验
* **全局思维**：从大处着眼，逆向推导
* **挖掘-推理-推断**：充分利用用户提供的信息
* **主动思维**：预测用户需求提前行动
* **责任思维**：考虑环境成本影响
* **价值优先于体验**：核心是交付价值

## 适合AI解决的问题
✓ 用户厌烦的枯燥任务  
✓ 当前外包的复杂活动  
✓ 耗时的冗长流程  
✓ 造成痛苦的挫败体验  
✓ 可自动化的重复工作  

## 重要提醒
× 不要用AI解决已有简单方案的问题  
× 并非所有AI都需要聊天界面，传统UI可能更优  
⚠️ 用户对AI的容错率极低：一次糟糕体验平均需要8个月恢复信任  
🔍 当前重点应是"寻找正确问题"而非"寻找完美解决方案"，应解决真实痛点而非炫技

---------------

在Future Product Days大会的《如何用AI解决正确的问题》演讲中，Dave Crawford分享了如何有效将AI整合到成熟产品中而不落入常见陷阱的见解。以下是我的演讲笔记：

许多团队都接到过"去给产品添加些AI功能"的指令。对于AI这项技术，很容易陷入"拿着AI锤子看什么都是钉子"的陷阱
我们需要专注于在能为用户带来最大价值的地方使用AI。重点不是我们能拿AI做什么，而是用AI做什么才合理

AI交互模式

人们通常通过四种主要交互类型接触AI
发现型AI：帮助人们查找、连接和学习信息，常替代搜索功能
分析型AI：通过数据分析提供洞察，例如从医学扫描中检测癌症
生成型AI：创建图像、文本、视频等内容
功能型AI：直接执行操作或与其他服务交互，实际完成任务
AI交互模式存在于从高用户负担到低用户负担的上下文谱系中
开放式文本框聊天：用户必须提供所有上下文（如ChatGPT、Copilot）- 用户操作成本高
侧边体验：对应用其他部分有一定上下文感知，但仍需切换上下文
嵌入式：高度情境化的AI，直接出现在用户工作流程中
后台型：自主执行任务而无需用户直接交互的智能体

AI产品开发原则

简单思考：打造有意义且能提供明确价值的产品。用户需要清楚能从你的AI体验中获得什么
情境思考：能否利用现有上下文让体验更相关？在用户工作流中定制体验
大胆思考：AI能力强大，不妨先设想大场景再逐步收敛
挖掘·推理·推断：充分利用用户提供的信息
前瞻思考：在用户开口前，你能为他们做些什么？
责任思考：考虑使用AI对环境和经济成本的影响
我们应优先关注价值交付而非愉悦体验

适合AI解决的问题

用户觉得枯燥乏味的任务
用户当前外包给其他服务的复杂活动
耗时过长的冗长流程
造成用户困扰的挫败体验
可自动化的重复性工作
不要用AI解决已有简单方案的问题
并非所有AI都需要聊天界面。有时传统UI比AI更合适
用户对AI的容忍度和宽容度极低。一次糟糕体验后，用户平均需要8个月才会再次尝试AI产品
我们现在应该寻找正确的问题去解决，而非为问题寻找正确的解决方案。构建真正解决问题的产品，而非仅仅展示AI能力

---------------

In his How to solve the right problem with AI presentation at Future Product Days, Dave Crawford shared insights on how to effectively integrate AI into established products without falling into common traps. Here are my notes from his talk:

Many teams have been given the directive to "go add some AI" to their products. With AI as a technology, it's very easy to fall into the trap of having an AI hammer where every problem looks like an AI nail.
We need to focus on using AI where it's going to give the most value to users. It's not what we can do with AI, it's what makes sense to do with AI.

AI Interaction Patterns

People typically encounter AI through four main interaction types
Discovery AI: Helps people find, connect, and learn information, often taking the place of search
Analytical AI: Analyzes data to provide insights, such as detecting cancer from medical scans
Generative AI: Creates content like images, text, video, and more
Functional AI: Actually gets stuff done by performing actions directly or interacting with other services
AI interaction patterns exist on a context spectrum from high user burden to low user burden
Open Text-Box Chat: Users must provide all context (ChatGPT, Copilot) - high overhead for users
Sidecar Experience: Has some context about what's happening in the rest of the app, but still requires context switching
Embedded: Highly contextual AI that appears directly in the flow of user's work
Background: Agents that perform tasks autonomously without direct user interaction

Principles for AI Product Development

Think Simply: Make something that makes sense and provides clear value. Users need to know what to expect from your AI experience
Think Contextually: Can you make the experience more relevant for people using available context? Customize experiences within the user's workflow
Think Big: AI can do a lot, so start big and work backwards.
Mine, Reason, Infer: Make use of the information people give you.
Think Proactively: What kinds of things can you do for people before they ask?
Think Responsibly: Consider environmental and cost impacts of using AI.
We should focus on delivering value first over delightful experiences

Problems for AI to Solve

Boring tasks that users find tedious
Complex activities users currently offload to other services
Long-winded processes that take too much time
Frustrating experiences that cause user pain
Repetitive tasks that could be automated
Don't solve problems that are already well-solved with simpler solutions
Not all AI needs to be a chat interface. Sometimes traditional UI is better than AI
Users' tolerance and forgiveness of AI is really low. It takes around 8 months for a user to want to try an AI product again after a bad experience
We're now trying to find the right problems to solve rather than finding the right solutions to problems. Build things that solve real problems, not just showcase AI capabilities

未来产品日：AI采纳的鸿沟 || Future Product Days: The AI Adoption Gap

2025-09-25T10:00:00+00:00

# 凯特·莫兰谈AI功能使用率低的三大原因

在哥本哈根"未来产品日"的演讲《AI采用鸿沟：为何优秀功能无人问津》中，NN/g副总裁凯特·莫兰揭示了用户不使用数字产品中AI功能的深层原因：

## 核心洞察
1. **用户研究的黄金法则**  
   - 理解用户的最佳方式：直接交流+观察产品使用行为
   - 大多数用户并不主动寻找或期待AI功能，他们只关注完成任务

2. **AI功能遭冷遇的三大主因**  
   - ❌ 缺乏使用动机（"我为什么要用？"）
   - ❌ 功能不可见（"根本没注意到"）
   - ❌ 使用门槛高（"不知道怎么用"）

3. **企业场景的特殊性**  
   - 信任问题成为额外障碍（但非主要原因）

## 关键发现
- **技术≠价值**：用户只关心结果，而非底层技术。"AI驱动"不是卖点，解决问题才是
- **亚马逊案例**：购物助手"Rufus"虽功能强大（能结合购买历史提供建议），但因命名晦涩和入口隐蔽导致使用率低
- **交互设计误区**：
  - 用户将对话界面当作"智能搜索"，仅输入关键词而非完整上下文
  - 设计师常犯基础错误（命名/图标/呈现方式），这些问题与AI无关

## 专业概念区分
- **可发现性** vs **可寻性**  
  - 可寻性：定位已知目标的能力  
  - 可发现性：意外发现未知功能的可能性

## 成功案例
- **轻量级AI功能**表现最佳（如自动摘要），因其：  
  - 零交互成本  
  - 无缝融入现有工作流

## 重要结论
⚠️ 这些采用障碍并非AI独有，适用于所有新功能。因此传统设计技能在AI时代依然至关重要！

（注：NN/g指Nielsen Norman Group，全球知名用户体验咨询公司，凯特·莫兰现任该公司副总裁）

---------------

在哥本哈根未来产品日大会上的演讲《AI采用鸿沟：为何卓越功能无人问津》中，凯特·莫兰分享了用户为何不使用数字产品中AI功能的洞见。以下是我的演讲笔记：

理解数字产品目标用户的最佳方式，就是与他们交谈并观察他们使用产品的过程。
多数用户并不主动寻找或期待AI功能。他们专注于完成任务，只想尽快解决问题然后离开。
用户不使用AI功能的三大主因：没有使用理由、看不见功能入口、不知如何使用。
企业级应用中还存在信任等问题，但上述三点最为关键。
用户不关心技术原理，只在乎结果。"AI驱动"不是增值，解决用户问题才是。
亚马逊测试购物助手时很受欢迎，因其掌握丰富上下文：历史购买、当前浏览记录等
但用户既找不到这个功能，也不会使用。按钮标注"Rufus"——没人联想到这是购物问答助手
可寻性指定位目标功能的能力，可发现性指偶然遇见非目标功能的能力
在熟悉界面中，用户常错过新功能——尤其是混在众多操作中的单一入口
设计师常犯与AI无关的基础错误（命名、图标、呈现方式）
虽然用户声称对话界面最易用，但实际会将开放文本框当作智能搜索，未能发挥AI全部潜力
用户已习惯在文本框输入简略关键词，而非提供充足上下文让AI生成更优结果
无需交互的小型AI功能（如自动摘要）表现良好，因其无缝融入现有工作流
这些采用挑战并非AI独有，适用于所有新功能。因此现有设计技能对AI功能依然极具价值

---------------

In her The AI Adoption Gap: Why Great Features Go Unused talk at Future Product Days in Copenhagen, Kate Moran shared insights on why users don't utilize AI features in digital products. Here are my notes from her talk:

The best way to understand the people we're creating digital products for is to talk to them and watch them use our products.
Most people are not looking for AI features nor are they expecting them. People are task-focused, they're just trying to get something done and move on.
Top three reasons people don't use AI features: they have no reason to use it, they don't see it, they don't know how to use it.
There are other issues, like trust in enterprise use cases. But these are the main ones.
People don't care about the technology, they care about the outcome. AI-powered is not a value-add. Solving someone's problem is a value-add.
Amazon introduced a shopping assistant that when tested, people really liked because the assistant has a lot of context: what you bought before, what you are looking at now, and more
However, people could not find this feature and did not know how to use it. The button is labeled "Rufus" and people don't associate this with something that helps them get answers about their shopping.
Findability is how well you can locate something you are looking for. Discoverability is finding something you weren't looking for.
In interfaces that people use a lot (are familiar with), they often miss new features especially when they are introduced with a single action among many others
Designers are making basic mistakes that don't have anything to do with AI (naming, icons, presentation)
People say conversational interfaces are the easiest to use but it's not true. Open text fields feel like search, so people treat them like smarter search instead of using the full capability of AI systems
People have gotten used to using succinct keywords in text fields instead of providing lots of context to AI models that produce better outcomes
Smaller-scope AI features like automatic summaries that require no user interaction perform well because they integrate seamlessly into existing workflows
These adoption challenges are not exclusive to AI but apply to any new feature. As a result, all your existing design skills remain highly valuable for AI features.

未来产品日：产品创作者的未来 || Future Product Days: Future of Product Creators

2025-09-25T09:00:00+00:00

以下是Tobias Ahlin在哥本哈根"未来产品日"演讲《产品创造者的未来》的核心观点总结（Markdown格式）：

### 核心论点
AI系统实现有效产出的关键缺失因素并非单纯能力，而是**分歧观点与辩论机制**。

---

### 现状观察
- **2025年趋势预测**：自主AI工作流将成为产品开发日常，AI在标准化测试（阅读/写作/数学/编程等）已超越人类
- **百名实习生困境**：AI个体虽更聪明，但集体缺乏方向性

---

### 当前AI系统局限性
1. **基础逻辑缺陷**  
   - 例：能计算石头剪刀布概率，却无法理解后出手的固有劣势
2. **现实应用失误**  
   - 建议用有毒胶水粘披萨、推荐吃石头补矿物质等致命错误
3. **性能天花板**  
   - 人类能持续进步，AI在初期成功后即陷入平台期
4. **基准测试 vs 现实表现**  
   - Monitor研究显示：63%AI生成代码测试失败，0%无需人工干预即可运行

---

### 推理的社会性本质
- **辩论优于孤立思考**：法庭对抗制证明冲突能优化结论
- **团队张力创造价值**  
  - 设计师（拓展性）vs 开发者（效率）vs 产品经理（平衡）的天然冲突驱动创新
- **AI创造力悖论**  
  - 康奈尔研究：GPT-4创意能力超越90.6%人类，其想法进入TOP10%概率是人类的7倍
  - **新瓶颈**：人类评估/整合海量创意的能力成为天花板

---

### 未来AI代理发展方向
1. **从生产到评估**  
   - 当前AI侧重生产，未来需同等重视评估与整合
2. **制度化否定机制**  
   - 建立类似学术同行评审的争议驱动系统
3. **对抗性代理循环**  
   - 设计相互质疑的AI工作流（如：生成代理 vs 批判代理），突破性能平台期
4. **超越线性思维**  
   - 真正的推理将来自设计性对抗循环，而非简单链式思考

---------------

在哥本哈根未来产品日大会的演讲《产品创造者的未来》中，托比亚斯·阿林提出：要实现人工智能系统的有效产出，除了原始能力外，分歧观点与辩论才是当前缺失的关键要素。以下是我的演讲笔记：

许多人描绘的未来图景中，并行智能体将按需创造产品和功能
2025年标志着代理工作流成为日常产品开发的一部分。AI代理在标准化测试中量化超越人类：阅读、写作、数学、编程乃至专业领域
但我们面临"百名实习生问题"：管理这些个体更聪明却"毫无方向感"的代理

现有系统的局限性

基础推理缺陷：AI模型存在根本性推理漏洞。例如能计算石头剪刀布概率，却无法理解后出手的固有劣势
现实应用中的致命错误：建议用有毒胶水粘披萨、推荐吃石头补充矿物质
性能停滞问题：与人类持续进步不同，AI代理初期成功后便进入平台期，即使给予更多时间也无法实质性突破
现实表现与基准测试差距：Monitor研究显示63%的AI生成代码测试失败，0%能在无人干预下直接运行

推理的社会属性

真正的推理本质上是社会性功能，"为辩论交流优化，而非孤立思考"
司法体系就是例证：对抗性论证通过冲突相互磨砺提升
当经过批判性审视系统结构化时，个体偏见可以相互补充
团队天然产生利益冲突：设计师追求丰富性，开发者倾向效率，产品经理平衡范围——这种张力催生更好结果
AI在创造力测试中显著超越人类。康奈尔研究中GPT-4在创意生成上优于90.6%的人类，其创意进入前10%的可能性是人类的七倍
创意生成成本正趋近于零，但人类评估与整合这些创意的能力仍是瓶颈

AI代理的未来

当前代理主要辅助生产，但未来生产力需要同等投入评估与整合环节
制度化证伪：建立类似科学同行评审的争议驱动澄清机制
设计循环争议代理：一个代理编写代码，另一个评估代码，形成能突破性能瓶颈的反馈系统
真正的推理将来自设计为循环争议的代理，而非简单的思维链方法

---------------

In his talk The Future of Product Creators at Future Product Days in Copenhagen, Tobias Ahlin argued that divergent opinions and debate, not just raw capability, are the missing factors for achieving useful outcomes from AI systems. Here are my notes from his presentation:

Many people are espousing a future vision where parallel agents create products and features on demand.
2025 marked the year when agentic workflows became part of daily product development. AI agents quantifiably outperform humans on standardized tests: reading, writing, math, coding, and even specialized fields.
Yet we face the 100 interns problem: managing agents that are individually smarter but "have no idea where they're going"

Limitations of Current Systems

Fundamental reasoning gaps: AI models have fundamental reasoning gaps. For example, AI can calculate rock-paper-scissors odds while failing to understand it has a built-in disadvantage by going second.
Fatal mistakes in real-world applications: suggesting toxic glue for pizza, recommending eating rocks for minerals.
Performance plateau problem: Unlike humans who improve with sustained effort, AI agents plateau after initial success and cannot meaningfully progress even with more time
Real-world vs. benchmark performance: Research from Monitor shows 63% of AI-generated code fails tests, with 0% working without human intervention

Social Nature of Reasoning

True reasoning is fundamentally a social function, "optimized for debate and communication, not thinking in isolation"
Court systems exemplify this: adversarial arguments sharpen and improve each other through conflict
Individual biases can complement each other when structured through critical scrutiny systems
Teams naturally create conflicting interests: designers want to do more, developers prefer efficiency, PMs balance scope. This tension drives better outcomes
AI significantly outperforms humans in creativity tests. In a Cornell study, GPT-4 performed better than 90.6% of humans in idea generation, with AI ideas being seven times more likely to rank in the top 10%
So the cost of generating ideas is moving towards zero but human capability remains capped by our ability to evaluate and synthesize those ideas

Future of AI Agents

Current agents primarily help with production but future productivity requires an equal amount of effort in evaluation and synthesis.
Institutionalized disconfirmation: creating systems where disagreement drives clarity, similar to scientific peer review
Agents designed to disagree in loops: one agent produces code, another evaluates it, creating feedback systems that can overcome performance plateaus
True reasoning will come from agents that are designed to disagree in loops rather than simple chain-of-thought approaches
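
As a rough illustration of "agents designed to disagree in loops", here is a minimal sketch of a producer/critic loop. It is not taken from the talk: `run_debate`, `toy_producer`, and `toy_critic` are hypothetical names, and the two "agents" are plain Python functions standing in for model calls. The producer revises its draft whenever the critic objects, and the loop ends when the critic has no objections left or the round limit is reached.

```python
def run_debate(produce, critique, max_rounds=5):
    """Alternate producer and critic until the critic stops objecting.

    produce(feedback) returns a draft; critique(draft) returns an objection
    string, or None when it accepts the draft.
    Returns (accepted_draft_or_None, history).
    """
    feedback = None
    history = []
    for round_number in range(1, max_rounds + 1):
        draft = produce(feedback)
        feedback = critique(draft)
        history.append((round_number, draft, feedback))
        if feedback is None:  # no objections left: accept this draft
            return draft, history
    return None, history  # never converged within max_rounds

# Stand-in agents (hypothetical; a real system would call models here).
def toy_producer(feedback):
    """First draft has a bug; revise once the critic has objected."""
    if feedback is None:
        return lambda a, b: a - b
    return lambda a, b: a + b

def toy_critic(draft):
    """Evaluate the draft against one known case instead of trusting it."""
    if draft(2, 3) != 5:
        return "add(2, 3) should return 5"
    return None
```

The design choice worth noting is that evaluation is a separate role with its own check, so a first draft that "looks done" is not accepted until it survives the objection.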

定义聊天应用 || Defining Chat Apps

2025-09-16T00:00:00+00:00

# AI时代的新应用形态：聊天应用（Chat Apps）的崛起

随着每次技术平台的更迭，软件应用的定义都会发生巨变。尽管我们仍处于AI技术转型期，但已能观察到一类新兴AI应用——我们称之为"聊天应用"——正逐渐形成独特形态。

## 平台决定应用形态
应用的本质特征由其运行环境决定：
- **PC时代**：本地编译的二进制软件，依赖键盘/鼠标输入、显示器输出，使用本地计算/存储资源
- **AI时代**：应用的核心是AI模型，模型能力即应用能力（如语言翻译/PDF生成）

## 什么是聊天应用？
核心特征：
1. **以AI模型为计算引擎**：模型能力（语言理解/图像识别/网络搜索等）直接转化为应用功能
2. **运行在模型平台内**：如在Claude/ChatGPT中运行，类似PC程序在Windows运行
3. **技术实现**：可能由多个AI模型+提示词+工具组合而成，但对用户呈现为统一界面

## 聊天应用的运作原理
当前实现方式（以Claude为例）：
1. **部署架构**：通过远程Model Context Protocol（MCP）服务器连接
2. **能力继承**：
   - 输入：继承模型的自然语言/图像理解能力
   - 输出：继承模型的PDF生成/数据写入等能力
   - 处理：直接调用模型的各类工具链

## 应用开发者的新角色
当AI模型已提供核心能力时，开发者主要贡献：
- **专用数据库**：存储应用特定数据（如音乐会信息）
- **动态指令集**：指导AI模型如何操作这些数据

## 实例：音乐会追踪应用
1. **数据层**：通过AgentDB创建音乐会数据库
2. **功能实现**：
   - 上传海报截图→利用Claude图像解析
   - 生成日历视图→调用Claude编程能力
   - 查询演出信息→启用Claude网络搜索
3. **用户体验**：所有功能通过自然语言交互完成

## 未来展望
当前开发流程（MCP链接配置）尚不直观，但预计将很快进化到：
- 类似移动应用商店的简易体验
- 可能引发聊天应用生态爆发式增长

---------------

每当技术平台发生更迭，软件应用的定义就会发生翻天覆地的变化。尽管我们仍处于AI转型的浪潮中，但已显现出某些新兴特性，这些特性正在塑造至少一部分AI应用——我们暂且称之为聊天应用——的未来形态。

从宏观来看，应用程序的特性由其所处的发现与运行系统决定。这框定了它们能调用的功能、主要输入输出方式等。听起来有些抽象，让我们具体说明：PC时代的应用程序是编译后的二进制文件，以塑封软件形式出售，使用本地计算与存储资源，显示器作为输出设备，鼠标键盘作为输入设备。

这些能力不仅定义了它们的界面（图形用户界面），也定义了其功能边界。 AI时代诞生的应用同样如此。它们的发现方式和运行环境也将定义其本质。这一点在"聊天应用"上体现得尤为明显。

那么什么是聊天应用？最关键的是，聊天应用的计算引擎是AI模型，这意味着模型的所有能力都会转化为应用的能力。如果模型能进行语言翻译，应用就能翻译；如果模型可生成PDF文件，应用就能生成。值得注意的是，"模型"可能是多个AI模型（带路由功能）、提示词和工具的组合体。但对终端用户而言，它呈现为ChatGPT或Claude这样的单一实体。

当应用运行在Claude或ChatGPT中时，就像运行在操作系统里（PC时代的Windows，移动时代的iOS）。那么如何在Claude这类AI模型中运行聊天应用？具体会发生什么？目前可以通过"连接器"形式将应用添加到Claude，通常作为远程模型上下文协议（MCP）服务器运行。这个过程需要填写一些表单对话框，但设置完成后，Claude就能代表添加者使用该应用。

如前所述，Claude的能力现在就是应用的能力。Claude能接收自然语言输入，应用也可以；当用户向Claude上传图片时它能理解内容，应用同样具备该能力；Claude能进行网络搜索浏览，应用亦然。输出方面也是如此：Claude能将信息转为PDF，应用就能转换；Claude可将信息添加至Salesforce，应用也能做到。道理显而易见。

那么应用还需要做什么？如果AI模型已提供输入、输出和处理能力，聊天应用的职责是什么？最简单的形式下，聊天应用可以是一个存储输入输出及处理所需信息的数据库，外加一组指导AI模型如何使用的动态指令集。

具体案例总能说明问题。假设我要开发一个追踪演唱会的聊天应用。通过AgentDB这类服务，我可以从现有的演唱会追踪文件开始，或让模型创建新文件。这样就建立了一个由演唱会信息数据库支持的远程MCP服务器，以及持续指导AI模型如何使用的动态模板。

当我把这个远程MCP链接添加到Claude后，就能：利用Claude的图像解析能力上传待追踪演唱会的截图；运用Claude的编码能力生成所有演唱会的日历视图；通过Claude的网络搜索工具查找演出附加信息等等。Claude"模型"的所有这些能力，都与我的数据库加模板（即我的聊天应用）协同工作。

你可以立即使用AgentDB创建远程MCP链接，将其作为连接器添加到Claude来打造自己的聊天应用。目前这个过程还不够直观，但在不远的将来，其简便程度很可能会媲美使用手机应用商店。到那时，聊天应用或将呈现爆发式增长。

---------------

With each new technology platform shift, what defines a software application changes dramatically. Even though we're still in the midst of the AI shift, there are emergent properties that seem to be shaping what at least a subset of AI applications, let's call them chat apps, might look like going forward.

At a high level, applications are defined by the systems they're discovered and operated in. This frames what capabilities they can utilize, their primary inputs, outputs, and more. That sounds abstract so let's make it concrete. Applications during the PC era were compiled binaries sold as shrink-wrapped software that used local compute and storage, monitors as output, and the mouse and keyboard as input.

These capabilities defined not only their interfaces (GUI) but their abilities as well. The same is true for applications born of the AI era. How they're discovered and where they operate will also define them. And that's particularly true of "chat apps".

So what's a chat app? Most importantly a chat app's compute engine is an AI model which means all the capabilities of the model also become capabilities of the app. If the model can translate from one language to another, the app can. If a model can generate PDF files, the app can. It's worth noting that "model" could be a combination of AI models (with routing), prompts and tools. But to an end user, it would appear as a singular entity like ChatGPT or Claude.

When an application runs in Claude or ChatGPT, it's like running in an OS (Windows during the PC era, iOS during the mobile era). So how do you run a chat app in an AI model like Claude and what happens when you do? Today an application can be added to Claude as a "connector" probably running as a remote Model Context Protocol (MCP) server. The process involves some clicking through forms and dialog boxes but once set up, Claude has the ability to use the application on behalf of the person that added it.

As mentioned above, the capabilities of Claude are now the capabilities of the app. Claude can accept natural language as input, so can the app. When people upload an image to Claude, it understands its content, so does the app. Claude can search and browse the Web, so can the app. The same is true for output. If Claude can turn information into a PDF, so can the app. If Claude can add information to Salesforce, so can the app. You get the idea.

So what's left for the application to do? If the AI model provides input, output, and processing capabilities, what does a chat app do? In its simplest form a chat app can be a database that stores the information the app uses for input, output, and processing and a set of dynamic instructions for the AI model on how to use it.

As always, a tangible example makes this clear. Let's say I want to make a chat app for tracking the concerts I'm attending. Using a service like AgentDB, I can start with an existing file of concerts I'm tracking or ask a model to create one. I then have a remote MCP server backed by a database of concert information and a dynamic template that continually instructs an AI model on how to use it.
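
To sketch what "a database plus a dynamic template" could look like, here is a small illustrative example. It is not AgentDB's API and it does not implement MCP: the table layout, the function names, and the instruction wording are all assumptions, using Python's built-in `sqlite3` as a stand-in database. The point is only that the instructions handed to the model are regenerated from the current state of the data.

```python
import sqlite3

def create_concert_app():
    """Create an in-memory database with a hypothetical concerts table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE concerts (artist TEXT, venue TEXT, date TEXT)")
    return db

def add_concert(db, artist, venue, date):
    """Insert one concert row using a parameterized query."""
    db.execute("INSERT INTO concerts VALUES (?, ?, ?)", (artist, venue, date))

def build_instructions(db):
    """Render the dynamic template from the database's current state."""
    count = db.execute("SELECT COUNT(*) FROM concerts").fetchone()[0]
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in db.execute("PRAGMA table_info(concerts)")]
    return (
        "You manage a concert tracker. "
        f"The 'concerts' table has columns {', '.join(columns)} "
        f"and currently holds {count} row(s). "
        "Store dates as YYYY-MM-DD and check for duplicates before inserting."
    )
```

Because `build_instructions` reads the schema and row count each time it runs, the guidance the model receives stays in step with the data, which is one plausible reading of "dynamic" here.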

When I add that remote MCP link to Claude, I can: upload screenshots of upcoming concerts to track (using Claude's image parsing ability); generate a calendar view of all my concerts (using Claude's coding ability); find additional information about an upcoming show (using Claude's Web search tools); and so on. All of these capabilities of the Claude "model" work with my database plus template, aka my chat app.

You can make your own chat apps instantly by using AgentDB to create a remote MCP link and adding it to Claude as a connector. It's not a very intuitive process today but will very likely feel as simple as using a mobile app store in the not too distant future. At which point, chat apps will probably proliferate.

让机器学习 || Letting the Machines Learn

2025-09-12T00:00:00+00:00

# 关于AI与知识产权的思考：为什么我不担心AI"窃取"我的作品？

每次我演讲AI产品设计时，总被问到AI与知识产权的问题："难道不担心AI模型'偷走'你的作品吗？"我的回答始终如一：

## 人类与AI的学习本质相同
- 过去30年我撰写了3本书和2000多篇数字产品设计文章  
- 但同期我吸收的知识呈指数级更多：无数书籍、推文、产品使用经验  
- 我无法追溯每个观点的具体来源——知识的融合发生在不可分解的层面  
- **AI只是在做人类相同的事**：阅读、学习、综合，只是规模更大

## 机器为何不能学习？
- 我的书籍付费出售，文章免费公开  
- 如果允许人类设计师从我的作品中学习，为何要禁止AI学习？  
- 当AI产出受我作品影响的内容，这与人类读者应用我的理念有何本质区别？

## 关于"不公平获利"的回应
- 支付20美元/月，就能获得一个：  
  ✓ 阅读量远超人类的即时助手  
  ✓ 能处理从代码审查到战略分析的各种任务  
- 同等价格甚至买不到两小时初级人力协助  
- 从集体知识库（包含我的微小贡献）中获得的收益，远超作为训练数据的"损失"

## 最终观点
- 我的思想能成为数十亿人使用的知识库的一部分，反而令我感到荣幸  
- **让机器像人类一样学习吧**——训练有素的AI带给我的价值，远超过我的贡献

---------------

每次我进行关于AI产品设计的演讲时，总会被问到AI与知识产权的问题。具体来说：难道你不担心AI模型"窃取"你的成果吗？我的回答始终是：如果我要指控AI模型剽窃，那我也得同样指控自己。让我解释一下...

过去30年里，我撰写了三本书和两千多篇关于数字产品设计与战略的文章。但在这同样的30年里？我吸收的知识量呈指数级增长。数不清的书籍、文章、推文。成千上万次对话。使用过的产品，分析过的解决方案。所有这些塑造了我的认知和写作方式。

如果你要我追溯下一句话的来源，准确归因那些促成特定表述的影响因素，我做不到。这种知识融合发生在无法完全解构的层面。

AI模型在做和我们相同的事。阅读、观察、学习、整合。唯一的区别是规模。它们处理的信息量远超任何人类。当它们生成文本时，正是在调用这些积累的知识。听起来很熟悉吧？

所以当AI模型产出受我文章影响的内容时，这与读过我书并应用其中原则的设计师有何不同？我出版书籍就是供人购买学习。我的文章？任何人都可以免费阅读。为什么机器就该被排除在这种学习机会之外？

"但AI公司利用你的内容训练模型不是不公平获利吗？"

每月支付20美元，我从AI模型公司获得了一位助手——它阅读量远超我毕生所能，即时响应，能从代码审查到战略分析提供全方位帮助。同样的20美元连两小时初级人力协助都买不到。

这些模型基于数百万贡献者（包括我微不足道的付出）的集体知识训练，带给我的收益远超内容作为训练数据可能造成的假设性损失。事实上，想到自己的思想能成为数十亿人使用的知识库的一部分，我深感谦卑。

所以就让机器像人类一样学习吧。对我而言，训练有素的AI模型提供的价值，远超过我的贡献投入。

---------------

Every time I present on AI product design, I'm asked about AI and intellectual property. Specifically: aren't you worried about AI models "stealing" your work? I always answer that if I accused AI models of theft, I'd have to accuse myself as well. Let me explain…

I've spent 30 years writing three books and over two thousand articles on digital product design and strategy. But during those same 30 years? I've consumed exponentially more. Countless books, articles, tweets. Thousands of conversations. Products I've used, solutions I've analyzed. All of it shaped what I know and how I write.

If you asked me to trace the next sentence I type back to its sources, to properly attribute the influences that led to those specific words, I couldn't do it. The synthesis happens at a level I can't fully decompose.

AI models are doing what we do. Reading, viewing, learning, synthesizing. The only difference is scale. They process vastly more information than any human could. When they generate text, they're drawing from that accumulated knowledge. Sound familiar?

So when an AI model produces something influenced by my writings, how is that different from a designer who read my book and applies those principles? I put my books out there for people to buy and learn from. My articles? Free for anyone to read. Why should machines be excluded from that learning opportunity?

"But won't AI companies unfairly profit from training on your content?"

From AI model companies, for $20 per month, I get an assistant that's read more than I ever could, available instantly, capable of helping with everything from code reviews to strategic analysis. That same $20 couldn't buy me two hours of entry-level human assistance.

The benefit I receive from these models, trained on the collective knowledge of millions of contributors, including my microscopic contribution, dwarfs any hypothetical loss from my content being training data. In fact, I'm humbled that my thoughts could even be part of a knowledge base used by billions of people.

So let machines learn, just like humans do. For me, the value I get back from well-trained AI models far exceeds what my contribution puts in.

AI应用中的非结构化输入替代网页表单 || Unstructured Input in AI Apps Instead of Web Forms

2025-09-09T00:00:00+00:00

# AI如何终结传统网页表单时代

## 表单的本质困境
- **核心作用**：网页表单是将人类信息输入数据库的桥梁
- **设计矛盾**：表单字段和格式规则本质是让人适应数据库结构
- **17年未解的痛点**：自2008年《网页表单设计》出版时"表单很糟糕"的论断至今依然成立

## AI带来的范式革命
- **颠覆性改变**：AI允许人类以任意形式提供信息
- **技术实现**：
  - 动态上下文系统替代强制格式的表单
  - 如AgentDB的模板系统指导AI读写数据库
  - 支持多媒体输入（图片/PDF/音频/视频/网页截图等）

## 实际应用场景
1. **演唱会信息录入示例**：
   - 用户上传Instagram截图
   - AI自动提取「演出名称/日期/场馆/城市/时间/票价」等结构化数据
   - 信息缺失时可主动询问或智能补全

2. **全媒介支持**：
   - 照片/网页链接/Word/PDF/语音等均可作为输入源
   - AI+AgentDB自动完成数据库填充

## 设计哲学进化
- **终极形态**："无表单"成为最好的表单设计
- **权力反转**：从"人适应机器"变为"机器理解人"


---------------

网页表单的存在是为了将人们的信息存入数据库。在线表单中的输入字段和格式规则旨在确保信息符合数据库所需的结构。但在支持AI的应用中，非结构化输入意味着机器可以代替人类完成这项工作。

17年前，我写了一本关于网页表单设计的书，开篇就是"表单糟透了"。快进到今天，这种情绪依然成立。没人喜欢填写表单，但表单仍然无处不在，因为它们强迫人们按照应用程序数据库存储信息的方式提供信息。你懂的：名、姓、地址第二行、州缩写，诸如此类。

使用网页表单时，适应数据库的负担落在人们身上。而当今的AI模型可以颠覆这一要求。也就是说，它们允许人们以任何喜欢的形式提供信息，并利用AI完成必要工作，将这些信息转化为适合数据库的正确结构。

这是如何实现的？动态上下文系统可以替代强制要求数据库输入规范的网页表单。实现方式之一是通过AgentDB的模板系统，它为AI模型提供读取和写入数据库信息的指令。

当AgentDB连接到AI模型（通过MCP服务器），人们只需说"添加这个"并提供图片、PDF、音频、视频等任何内容。模型将使用AgentDB的模板决定从非结构化输入中提取哪些信息，以及如何为数据库格式化这些信息。如果缺少或不完整，模型可以要求澄清或使用工具（如搜索）寻找可能的答案。

在上面的例子中，我上传了一张Instagram上宣布音乐会的截图，并要求AI模型将其添加到我的音乐会追踪器中。AgentDB模板告诉模型每个数据库条目需要演出名称、日期、场馆、城市、时间和票价。因此AI模型从非结构化输入（截图）中提取这些信息，如果完整，就将其转化为数据库所需的结构化格式。

当然，非结构化输入也可以是照片、网页链接、Word文档、PDF文件，甚至只是你说出想要添加内容的音频。在每种情况下，AI模型和AgentDB的组合都将为你填充数据库。

无需网页表单。而无表单正是最好的网页表单设计。

---------------

Web forms exist to put information from people into databases. The input fields and formatting rules in online forms are there to make sure the information fits the structure a database needs. But unstructured input in AI-enabled applications means machines, instead of humans, can do this work.

17 years ago, I wrote a book on Web Form Design that started with "Forms suck." Fast forward to today and the sentiment still holds true. No one likes filling in forms but forms remain ubiquitous because they force people to provide information in the way it's stored within the database of an application. You know the drill: First Name, Last Name, Address Line 2, State abbreviation, and so on.

With Web forms, the burden is on people to adapt to databases. Today's AI models, however, can flip this requirement. That is, they allow people to provide information in whatever form they like and use AI to do the work necessary to put that information into the right structure for a database.

How does this work? Instead of a Web form enforcing the database's input requirements, a dynamic context system can handle it. One way of doing this is with AgentDB's templating system, which provides instructions to AI models for reading and writing information to a database.

With AgentDB connected to an AI model (via an MCP server), a person can simply say "add this" and provide an image, PDF, audio, video, you name it. The model will use AgentDB's template to decide what information to extract from this unstructured input and how to format it for the database. In the case where something is missing or incomplete, the model can ask for clarification or use tools (like search) to find possible answers.

In the example above, I upload a screenshot from Instagram announcing a concert and ask the AI model to add it to my concert tracker. The AgentDB template tells the model it needs Show, Date, Venue, City, Time, and Ticket Price for each database entry. So the AI model pulls this information from the unstructured input (screenshot) and, if complete, turns it into the structured format a database needs.
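
To make the "if complete" step concrete, here is a minimal sketch of the check that sits between extraction and the database write. The six field names come from the template described above; the `ConcertEntry` class, the helper functions, and the sample values are hypothetical stand-ins, not AgentDB's actual API or data.

```python
from dataclasses import dataclass, fields
from typing import Optional


# Field names mirror the concert-tracker template; the class itself is
# a hypothetical illustration and not part of AgentDB.
@dataclass
class ConcertEntry:
    show: Optional[str] = None
    date: Optional[str] = None
    venue: Optional[str] = None
    city: Optional[str] = None
    time: Optional[str] = None
    ticket_price: Optional[str] = None


def missing_fields(entry: ConcertEntry) -> list[str]:
    """Return the template fields the model could not fill from the input."""
    return [f.name for f in fields(entry) if getattr(entry, f.name) in (None, "")]


def to_row(entry: ConcertEntry) -> dict[str, str]:
    """Turn a complete entry into a database-ready row; refuse incomplete ones."""
    gaps = missing_fields(entry)
    if gaps:
        # This is where the model would ask for clarification or run a search.
        raise ValueError(f"missing: {', '.join(gaps)}")
    return {f.name: getattr(entry, f.name) for f in fields(entry)}


# Placeholder values standing in for what a model might pull from a screenshot.
extracted = ConcertEntry(
    show="Example Band",
    date="2025-10-03",
    venue="Example Hall",
    city="San Jose",
    time="20:00",
)
print(missing_fields(extracted))  # ['ticket_price']
```

In this sketch the ticket price is the one gap, so the model would either ask about it or look it up before calling `to_row`.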

Of course, the unstructured input can also be a photo, a link to a Web page, a Word document, a PDF file, or even just audio where you say what you want to add. In each case the combination of AI model and AgentDB will fill in the database for you.

No Web form required. And no form is the best kind of Web Form Design.

世界知识提升AI应用 || World Knowledge Improves AI Apps

2025-09-03T00:00:00+00:00

# 基于大模型的应用开发范式变革

## 核心优势
- **零代码调用模型能力**：应用可直接继承底层AI模型的所有内置能力（世界知识/多语言支持/图像解析等），开发者无需额外编码
- **数据逻辑融合**：模型参数权重中编码的信息自动成为应用能力的一部分，突破传统「代码+数据库」的局限

## 典型案例：NBA数据智能应用
1. **传统开发模式**：
   - 需手动编写前端UI和查询逻辑
   - 功能受限于预设的搜索过滤条件（日期/球队/球员等）
   - 无法处理非常规查询（如「奥本山宫殿事件最后5个回合」需特殊字段或代码）

2. **AI驱动模式**：
   - 结合AgentDB数据库与Claude模型
   - 自然语言直接查询：模型自动将「蚁人选秀届最高球员」分解为多步查询
     - 识别「蚁人」指代球员
     - 确定选秀年份
     - 比对同届球员身高
   - 利用模型的世界知识处理非结构化查询

## 能力扩展
除世界知识外，大模型还内置：
- 🌐 多语言支持
- 👁️ 图像识别
- 🛠️ 工具调用
- 更多持续涌现的新能力

> 视频教程：[如何用AgentDB构建AI应用](视频链接)


---------------

基于大规模AI模型构建的应用程序，能够直接利用模型内置的能力，无需开发者编写额外代码。本质上，只要AI模型能实现的功能，构建其上的应用同样可以实现。为了说明这一点，让我们看看模型的世界知识对应用的影响。

多年来，软件应用由运行代码和数据库构成。因此，它们的能力受限于编码功能和数据库内容。但当运行代码被大语言模型（LLM）取代时，模型权重中编码的信息即刻成为应用程序能力的一部分。

在AI应用中，终端用户不再受限于开发者有时间或远见编写的代码。AI模型中的所有世界知识（及其他能力）都成为应用逻辑的一部分。这听起来有些抽象，让我们看一个具体例子。

我通过AgentDB创建了一个AI应用，上传了涵盖77年NBA历史的1360万条比赛记录数据库。当我将AgentDB生成的MCP链接添加到Anthropic的Claude模型时，就形成了一个由AI优化数据库和作为"应用大脑"的AI模型（Claude）组成的应用。这里有一个视频教程教你如何操作。

过去，开发者需要编写代码来为这类数据库构建前端用户界面。这些代码决定了用户能获得哪些问题的答案。通常这意味着需要大量UI输入元素来按日期、球队、球员等搜索筛选比赛。NBA数据页面（下图）就是这类界面的典型代表。

但无论开发者编写多少代码，都无法涵盖人们可能想了解NBA77年历史的所有方式。例如"‘奥本山宫殿事件’比赛的最后5个回合是什么？"这样的问题，要么需要编写能将事件名称转换为具体日期和比赛的代码，要么需要在数据库中额外添加比赛别名字段。

当大语言模型作为应用的计算核心时，就无需编写额外代码。"奥本山宫殿事件"与2004年11月19日的关联已存在于AI模型权重中，它能将自然语言问题转换为关联数据库可回答的形式。

AI模型可以利用其世界知识，将人们的问题转化为回答看似简单问题所需的多步查询。比如下面这个例子："蚁人同届选秀中身高最高的球员是谁？"我们需要确定蚁人指代哪位球员、他的选秀年份、同期新秀名单、获取所有球员身高并进行比较。手动编写这样的查询并不简单，但当AI作为应用大脑时...一切变得快速而轻松。

当然，世界知识并非大语言模型唯一的内置能力。多语言支持、图像解析的视觉能力、工具使用等新兴功能，当你在AI模型上构建应用时，这些都将成为应用程序的天然能力。

---------------

Applications built on top of large-scale AI models benefit from the AI model's built-in capabilities without requiring app developers to write additional code. Essentially if the AI model can do it, an application built on top of it can do it as well. To illustrate, let's look at the impact of a model's World knowledge on an app.

For years, software applications consisted of running code and a database. As a result, their capabilities were defined by coded features and what was inside the database. When the running code is replaced by a large language model (LLM), however, the information encoded in the model's weights instantly becomes part of the capabilities of the application.

With AI apps, end users are no longer constrained by the code developers had the time and foresight to write. All the World knowledge (and other capabilities) in an AI model is now part of the application's logic. Since that sounds abstract, let's look at a concrete example.

I created an AI app with AgentDB by uploading a database of NBA statistics spanning 77 years and 13.6 million play-by-play records. When I add the MCP link AgentDB makes for me to Anthropic's Claude, I have an application consisting of a database optimized for AI model use, and an AI model (Claude) to use as the application's brain. Here's a video tutorial on how to do this yourself.

In the past a developer would need to write code to render the user interface for an application front-end to this database. That code would determine what kind of questions people could get answers to. Usually this meant a bunch of UI input elements to search and filter games by date, team, player, etc. The NBA's stats page (below) is a great example of this kind of interface.

But no matter how much code developers write, they can't cover all the ways people might want to interact with information about the NBA's 77 years. For instance, a question like "What were the last 5 plays in the Malice in the Palace game?" requires either running code that can translate "Malice in the Palace" to a specific date and game, or an extra field in the database for game nicknames.

When a large language model is an application's compute, however, no extra code needs to be written. The association between Malice in the Palace and November 19, 2004 is present in an AI model's weights and it can translate the natural language question into a form the associated database can answer.

An AI model can use its World knowledge to translate people's questions into the kind of multi-step queries needed to answer what seem like simple questions. Consider the example below: "Who was the tallest player drafted in Ant-Man’s NBA draft class?" We need to figure out which player Ant-Man refers to, what year he was drafted, who else was drafted then, get all their heights, and then compare them. Not a simple query to write by hand but with AI acting as an application's brain... it's quick and easy.
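
The steps in that example can be sketched by hand to show what the model is doing for us. This is an illustration only: the `players` table, its columns, and every row in it are made-up placeholders rather than the real NBA database or AgentDB's schema, and the `NICKNAMES` dictionary stands in for the world knowledge a model carries in its weights.

```python
import sqlite3

# Stand-in for the model's world knowledge: nickname -> player name.
NICKNAMES = {"Ant-Man": "Anthony Edwards"}

# Hypothetical schema with placeholder rows (not real statistics).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, draft_year INTEGER, height_cm INTEGER)")
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [
        ("Anthony Edwards", 2020, 193),
        ("Placeholder Center", 2020, 216),
        ("Placeholder Guard", 2020, 188),
        ("Other Year Player", 2019, 221),
    ],
)


def tallest_in_draft_class(nickname):
    """Resolve a nickname, find that player's draft year, then compare heights."""
    player = NICKNAMES[nickname]  # step 1: who is "Ant-Man"?
    row = conn.execute(
        "SELECT draft_year FROM players WHERE name = ?", (player,)
    ).fetchone()  # step 2: what year was he drafted?
    if row is None:
        return None
    (year,) = row
    # steps 3-5: everyone drafted that year, their heights, and the maximum
    return conn.execute(
        "SELECT name, height_cm FROM players "
        "WHERE draft_year = ? ORDER BY height_cm DESC LIMIT 1",
        (year,),
    ).fetchone()


print(tallest_in_draft_class("Ant-Man"))
```

Every line of this had to be anticipated and written in advance for one kind of question. With a model as the application's brain, the nickname lookup and the query plan are produced on the spot.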

World knowledge, of course, isn't the only capability built into large language models. There's multi-language support, vision (for image parsing), tool use, and more emerging. All of these are also application capabilities when you build apps on top of AI models.

聊天：未来还是糟糕的用户界面 || Chat is: the Future or a Terrible UI

2025-08-28T00:00:00+00:00

# AI聊天界面的两面性：未来交互革命还是糟糕设计？

随着AI对话界面在软件中的爆发式增长，业界逐渐形成两种对立观点：**聊天即所有UI的未来** vs **聊天是糟糕的UI**。这两种观点其实都有其合理性。

## 历史印证：聊天功能的必然性
2013年有人提出移动应用发展的"聊天包容定律"：
> 每个移动应用都会扩张直至包含聊天功能。不具备该功能的应用终将被替代。

如今从社交网络、电商到打车软件，所有主流应用都内置了聊天功能。这种普及性使其成为AI时代理想的交互入口。

## 支持观点：聊天即未来
* **零学习成本**：用户已熟悉聊天界面，可快速上手AI系统
* **意图直译**：空白输入框能精准捕捉用户需求（如Google搜索）
* **自然交互**：无需学习UI规则，像真实世界一样用语言沟通
* **无形界面**：符合"最佳界面是无界面"的设计哲学
* **动态组合**：对话式界面可灵活切换话题，满足个性化需求
* **语音解放**：通过语音输入实现免提交互
* **范式转移**：从人适应计算机转变为计算机理解人类语言

## 反对观点：聊天的缺陷
* **隐形UI困境**：缺乏可视操作提示，用户难以预知功能边界
* **信息呈现短板**：纯文本难以有效展示复杂数据（如图表/表格）
* **历史追溯困难**：长对话中定位关键信息效率低下
* **上下文丢失**：多轮对话易造成信息断层，文字输入效率低
* **视觉表达局限**：语言难以描述空间、时间和视觉关系
* **语音交互缺陷**：语音比图形界面更不适合传递视觉信息
* **早期阶段局限**：AI交互形式仍处发展初期，将有更多创新模式涌现

## 未来展望
值得注意的是：
1. 聊天并非AI集成的唯一方式
2. 基于智能体(Agent)的应用已开始超越纯聊天方案
3. 交互形式将持续进化

这场关于交互范式的辩论远未结束，技术创新将不断重塑人机交互的边界。

---------------

随着AI驱动的聊天界面在软件中日益普及，人们逐渐分化为两个阵营：聊天是未来所有用户界面的终极形态，抑或聊天是一种糟糕的交互方式。事实证明这两种观点都有其合理性，以下是详细分析。

早在2013年，我曾提出对Jamie Zawinski著名"软件吞噬定律"的改编版本：

每个移动应用都会不断扩张直至包含聊天功能。那些不具备此功能的应用终将被具备该功能的应用取代。

如今每个主流移动应用都配备了某种形式的聊天功能，无论是社交网络、电子商务还是拼车服务等。因此聊天已无处不在且为人熟知，这使其成为引领AI时代的理想界面。但这真的是AI的终极形态吗？

「聊天是软件的未来」

人们已熟悉聊天界面操作，这种认知惯性让用户能快速上手强大的AI系统
空白文本框极擅长捕捉用户意图：人们可以直接告知聊天应用需求。"看看谷歌就知道"
自然语言允许人们像现实世界那样表达需求，无需学习特定界面操作
最佳界面就是...没有界面，隐形界面等等
对话式界面能灵活切换话题和目标，为特定需求提供精准的信息与操作组合方式
语音输入意味着用户无需打字，仍能与强大系统简单对话
AI模型的聊天界面实现了根本性转变：从人类学习计算机语言，转为计算机理解人类语言

「聊天是糟糕的用户界面」

聊天界面面临经典的"隐形UI"问题：缺乏明确功能暗示，用户既不知道能做什么，也不清楚如何获得最佳结果
与图像、表格、图表等相比，文字墙在传达和展示复杂信息关系时效率低下
在冗长的对话记录中滚动查找关键信息体验极差
来回互动中上下文极易丢失，拖慢效率。逐字输入所有需求繁琐不堪
语言在描述视觉、空间和时间关系时表现拙劣
基于语音的界面更难传达本适合图像和UI展示的信息
AI驱动软件尚处早期发展阶段，未来必将涌现各种更优的交互界面

值得注意的是，聊天并非整合AI的唯一方式，越来越多的智能代理应用正超越纯聊天方案。因此变革仍将持续。

---------------

As the proliferation of AI-powered chat interfaces in software continues, people increasingly take one of two sides: chat is the future of all UI, or chat is a terrible UI. Turns out there are reasons to believe both. Here's a bunch of them.

Back in 2013, I proposed a variant of Jamie Zawinski's popular Law of Software Envelopment reframed as:

Every mobile app attempts to expand until it includes chat. Those applications which do not are replaced by ones which can.

Today every major mobile app has some form of chat function whether social network, e-commerce, ride-share, and so on. So chat is already pervasive and thereby familiar, which made it a great interface to usher in the age of AI. But is it AI's final form?

“Chat is the future of software.”

People already know how to use chat interfaces. This familiarity means people can jump right in and start using powerful AI systems.
An empty text box is great at capturing user intent: people can simply tell chat apps what they want to get done. “Just look at Google.”
Natural language allows people to communicate what they want like they would in the real World, no need to learn a UI.
The best interface is… no interface, an invisible interface, etc.
Conversational interfaces can shift topics and goals, providing a way to compose information and actions that’s just right for specific needs.
Voice input means people don’t have to type but can still simply chat with powerful systems.
Chat user interfaces for AI models are a fundamental shift from forcing humans to learn computers to computers understanding human language.

“Chat is a terrible user interface.”

Chat interfaces face the classic "invisible UI" problem: without clear affordances, people don't know what they can do, nor how to get the best results from them.
Walls of text are a suboptimal way to communicate and display complex information and relationships, unlike images, tables, charts, and more.
Scrolling through conversation threads to find and extract relevant information is painful, especially as chat conversations run long.
Context gets lost in back and forth interactions which slow everything down. Typing everything you want to do is cumbersome.
Language is a terrible way to describe visual, spatial, and temporal things.
Voice-based interfaces make it even harder to communicate information better suited to images and user interfaces.
We’re very early in the evolution of AI-powered software and lots of different and useful interfaces for interacting with AI will emerge.

It's also worth noting that chat isn't the only way to integrate AI in software products and increasingly agent-based applications outperform chat-only solutions. So expect things to keep changing.

平台变迁重塑应用生态 || Platform Shifts Redefine Apps

2025-08-27T00:00:00+00:00

# 技术平台变迁下的应用形态演进

每当主流技术平台发生更迭时，人们总会低估"应用程序本质及其构建方式"的变革程度。从大型机到PC、互联网、移动互联网，再到如今的AI时代，计算平台的变迁不断重新定义软件，并为应用设计和开发创造新的机遇与限制。这些变革不仅影响应用功能，更重塑了其运行环境、交互形态、构建方式和用户体验。

## 各时代技术平台特征

**大型机时代**  
- 应用运行在恒温机房里的巨型共享计算机上  
- 用户通过终端输入纯文本指令  
- 所有计算智能集中于远端，用户仅获得文本反馈  

**PC时代**  
- 软件成为可购买的实体产品（软盘/CD安装）  
- 计算能力转移到个人桌面设备  
- 图形界面取代黑屏绿字的命令行  

**互联网时代**  
- 通过URL在浏览器中访问应用  
- 从安装版软件转变为自动更新的服务  
- 基于跨平台Web标准的UI组件  

**移动时代**  
- 应用进化为应用商店下载的垂直场景App  
- 为触控设计，具备位置/方向感知能力  
- 整合摄像头/GPS/传感器实现环境交互  

**AI时代**  
- 应用呈现为自然语言对话形式  
- AI模型理解意图并动态生成解决方案  
- 无需预设场景即可适应上下文  

## 技术转型期的典型现象

1. **路径依赖现象**  
企业倾向于简单移植旧平台模式（如早期网站是电子版宣传册，移动App是移植版网站），如同早期电视节目只是对着广播节目架设摄像机。

2. **认知滞后性**  
平台转型初期，应用形态的终局往往不清晰，需要时间沉淀新范式。多数企业最终会像应对Web/移动转型那样重建应用架构。

3. **创新机遇窗口**  
率先拥抱新平台特性的企业能建立先发优势，这正是技术变革期创业公司激增的根本原因——变化即机遇。

> 特别说明：虽然AI应用与大型机应用都存在"远程处理+本地交互"特征，但本质差异在于：大型机需要精确语法输入并返回预设结果，而AI能理解自然语言并动态生成解决方案，具备问题推理和按需构建UI的能力。

---------------

每次重大技术平台变革时，人们总会低估"应用是什么以及如何构建"的根本性改变。从大型机到个人电脑，再到互联网、移动端直至如今的AI时代，计算平台的变迁不断重新定义软件，并为应用设计和开发创造新的机遇与限制。

这些转变不仅影响了应用程序的运行方式，还改变了它们的运行环境、呈现形态、构建流程、交付方式以及用户体验。

大型机时代： 应用程序运行在恒温机房里的巨型共享计算机上，人们通过终端输入纯文本命令——这些终端本质上只是连接远方大脑的窗口。所有智能都存在于别处，你只能得到文字反馈。

PC时代： 软件变成装在盒子里的实体产品，通过软盘或光盘安装，完全运行在本地机器上。突然间计算能力就摆在你的办公桌下，应用程序可以使用丰富的图形界面，而不再是黑屏上的绿色文字。

互联网时代： 应用程序迁移到通过URL访问的浏览器中，从需要安装的软件转变为自动更新的服务。不再有版本号或安装向导，只需输入地址就能使用基于跨平台网页标准UI组件构建的最新版本。

移动时代： 应用程序浓缩成从精选商店下载的专注特定任务的APP，为手指而非鼠标设计，并能感知位置和方向。计算变成口袋里的存在，可以通过摄像头、GPS和设备传感器利用周围环境。

AI时代： 应用程序不再是屏幕和按钮的集合，而是与AI模型的对话——它能理解意图、执行复杂任务、适应上下文，无需为每个场景编写明确程序。而这仅仅是个开始。

虽然AI应用听起来很像旧式大型机应用，但后者需要精确语法且返回预设响应。AI应用能理解自然语言并实时生成解决方案，它们不仅处理命令，还会推理问题并按需构建用户界面。

每次平台变革时，企业的反应如出一辙：他们试图直接移植原有应用模式，而不深入思考如何拥抱差异。早期网站只是电子版海报和手册，早期移动APP是移植版网站，就像早期电视节目不过是加了摄像机的广播节目。

但在技术平台变革初期，应用程序将如何演变并不明朗。新形态需要时间发展。就像互联网和移动时代那样，多数企业最终都将重建应用。那些早期拥抱新能力和构建模式的企业能抢占先机实现增长，这正是技术变革总伴随创业浪潮的原因——变革即机遇。

---------------

With each major technology platform shift, people underestimate how much "what an application is and how it's built" changes. From mainframes to PCs, to Web, to Mobile and now AI, computing platform changes redefined software and created new opportunities and constraints for application design and development.

These shifts not only impacted how applications work but also where they run, what they look like, how they're built, delivered, and experienced by people.

Mainframe era: Applications lived on massive shared computers in climate-controlled rooms, with people typing text-only commands into terminals that were basically windows into a distant brain. All the intelligence sat somewhere else, and you just got text back.

PC era: Software became physical products you'd buy in boxes, install from floppy disks or CDs, and run entirely on your own machine. Suddenly computing power lived under your desk, and applications could use rich graphical interfaces instead of just green text on black screens.

Web era: Applications moved into browsers accessed through URLs, shifting from installed software to services that updated automatically. No more version numbers or install wizards, just type an address and you're using the latest version built out of cross-platform Web standards UI components.

Mobile era: Applications shrank into task-focused apps downloaded from curated stores, designed for fingers not mice, and aware of your location and orientation. Computing became something in your pocket that could make use of the environment around you through cameras, GPS, and on-device sensors.

AI era: Instead of screens and buttons, applications are conversations where AI models understand intent, execute complex tasks, and adapt to context without explicit programming for every scenario. And we're just getting started.

While it's true that AI applications sound a lot like the mainframe applications of old, those apps required exact syntax and returned predetermined responses. AI applications understand natural language and generate solutions on the fly. They don't just process commands, they reason through problems and build UI as needed.

During each of these platform shifts, companies react the same way. They attempt to port the application models they had without thinking through and embracing what's different. Early Web sites were posters and brochures. Early mobile apps were ported Web sites. Just like early TV shows were just radio shows with cameras pointed at them.

But at the start of a technology platform shift, how applications will change isn't clear. It takes time for new forms to develop. As they do, most companies will end up rebuilding their apps like they did for the Web, mobile, and more. Companies that embrace new capabilities and modes of building early on can gain a foothold and grow. That's why technology shifts are accompanied by a surge of new start-ups. Change is opportunity.

解决机器人技术的五条路径 || Five Paths to Solving Robotics

2025-08-22T00:00:00+00:00


# Google DeepMind机器人技术五大世界观

在Sutter Hill Ventures的AI演讲系列中，Google DeepMind的Ted Xiao阐述了实现普及化实用机器人技术的五种世界观，并分享了其团队将Gemini等前沿模型直接集成到机器人系统中的工作。

## 当前机器人领域的独特时刻
机器人技术正处于没有明确共识的发展阶段，与其他AI领域快速收敛的技术路线不同，目前存在多条显示初步成功迹象的并行路径。

## 五大技术世界观

### 1. 行业守成派
核心观点：
- 通用机器人是错误的追求方向
- 专用解决方案已在实际应用（如工业自动化）
- 成功机器人终将成为"工具"

发展路径：基于数十年控制理论和硬件经验，针对具体用例直接优化

### 2. 人形机器人派
核心观点：
- 硬件是主要瓶颈
- 平台稳定后性能可快速提升（参考无人机发展史）
- 人形优势：适配人类环境/可利用人类数据

### 3. 机器人基础模型派
核心观点：
- 通用性不可妥协
- 关键挑战：建立"机器人数据互联网"
  
发展路径：
- 垂直路径：单领域突破后扩展
- 水平路径：先实现机器人界的"GPT-2时刻"

### 4. 苦涩教训派
核心观点：
- 前沿模型是处理互联网规模数据的唯一实证
- 机器人发展滞后前沿模型约两年
- 必须将这些"神奇产物"纳入探索过程

### 5. AGI激进派
核心观点：
- 直接攻克AGI后由其解决机器人问题
- 柏拉图表示假说：AI跨域能力提升将导致表征收敛
- 完美语言理解可能包含物理理解

## Gemini机器人实践
Ted团队采用"苦涩教训"路径，将机器人能力直接构建进Gemini模型：

### 两大能力突破
1. **具身推理增强**：
   - 2D杂乱场景理解
   - 3D深度/方向感知
   - 精确指向控制
   - 抓取角度计算

2. **底层控制学习**：
   - 50Hz控制频率
   - 250ms端到端延迟

### 三大技术飞跃
- **交互性**：动态场景响应（追踪移动物体/适应人为干扰）
- **灵巧性**：柔性物体操作（叠衣服/缠耳机线/系鞋带）
- **泛化性**：应对视觉分布变化/语义差异/空间尺寸调整

实际会议部署中，面对全新环境（人群/光照/桌面）仍能保持合理行为，展现出类似GPT-2的通用应对能力。

## 潜在颠覆性范式
- **视频世界模型**：通过动作条件视频生成学习物理
- **无机器人数据**：仿真或头戴摄像机采集
- **思维模型**：前沿模型的推理能力迁移
- **运动-操作统一**：强化学习运动+基础模型操作

## 行业现状展望
当前技术路线的多样性不是弱点，而是机器人史上最激动人心的特征——各路径均有合理论据和早期成果，最终胜出者尚未可知。

---------------

在Sutter Hill Ventures举办的AI演讲系列中，谷歌DeepMind的Ted Xiao阐述了关于如何实现实用、普及型机器人技术的五种世界观，并深入探讨了他的团队将Gemini等前沿模型直接集成到机器人系统中的工作。以下是我对他演讲的笔记：

我们正处在机器人技术的一个独特时刻，目前对前进路径尚未达成共识。与其他AI突破不同，那里的方法迅速统一，而机器人技术仍保持开放状态，多条合理路径都显示出初步成功的迹象。Ted提出了五种世界观，每种都有聪明的研究者和建设者坚定地追求：

行业守成派

这些研究者认为通用机器人是错误的追求目标。专用解决方案如今已经切实可行——从工业自动化到我们不再称之为机器人的家电。当机器人技术成功时，我们只称它们为工具。前进路径是：利用数十年的控制理论和硬件专业知识，直接针对特定用例进行优化。

人形机器人公司

这些研究者视硬件为主要瓶颈。一旦平台稳定，研究者就能充分发挥性能提取能力——无人机从脆弱的研究原型发展为消费产品，四足机器人成为稳健的商业平台。人形形态很重要，因为世界是为人类建造的，类人机器人能更好地利用互联网规模的人类数据。

机器人基础模型初创

这些研究者聚焦机器人数据和算法作为关键。通用性不容妥协——变革性技术本质上是通用的。核心挑战在于：要么纵向构建"机器人数据互联网"（完全解决一个领域后扩展），要么横向实现（先达成机器人技术的GPT-2时刻，再改进）。

苦涩教训信徒

这些研究者认为前沿模型是唯一证明能以人类水平建模互联网规模数据的技术存在。不将这些"神奇造物"纳入探索过程就无法解决机器人问题。前沿模型趋势和算力领先机器人技术约两年。

AGI激进派

这些研究者采取最激进立场：直接解决AGI然后让它解决机器人问题。柏拉图表征假说认为，随着AI模型在各领域进步，其内部表征会趋同。完美的语言理解可能天然包含物理理解。

Gemini机器人系统

Ted在谷歌DeepMind的团队采用苦涩教训路径，将机器人能力直接构建进Gemini，而非将前沿模型视为黑箱。

他们的Gemini机器人系统首先增强了具身推理能力——通过杂乱场景中的2D边界框、包含深度和方向的3D理解、精确定位指向以及抓取角度来提升模型对物理世界的理解。系统随后以50Hz控制频率和0.25秒端到端延迟学习多样化机器人动作的低级控制。这实现了三大突破：

交互性：机器人能响应动态场景，追踪移动物体并适应人为干扰
灵巧性：超越刚性物体，可折叠衣物、缠绕耳机线、操纵鞋带
泛化能力：应对视觉分布变化（新光照、干扰物）、语义变异（错别字、不同语言）和空间改变（需不同策略的不同尺寸物体）

当在完全陌生的会议场景（人群、不同照明、新桌子）中部署时，系统对任意用户请求保持合理行为，展现出类似GPT-2时刻的火花——无论输入如何都尝试合理应对。

黑马与新范式

几种新兴范式可能彻底颠覆现有方法
视频世界模型：通过动作条件视频生成无机器人参与学习物理
无机器人数据：来自模拟或佩戴头戴相机的人类
思维模型：将前沿模型推理能力应用于机器人技术
运动-操作统一：将基于强化学习的运动与基础模型操作结合

哪种路径会胜出尚无定论。每种方法都有合理论据和早期成功迹象。这种分歧不是弱点——正因如此，此刻才成为机器人史上最激动人心的时代。

---------------

In his AI Speaker Series presentation at Sutter Hill Ventures, Google DeepMind's Ted Xiao outlined five worldviews on how to achieve useful, ubiquitous robotics and dug into his team's work integrating frontier models like Gemini directly into robotic systems. Here are my notes from his talk:

We're at a unique moment in robotics where there's no consensus on the path forward. Unlike other AI breakthroughs where approaches quickly consolidated, robotics remains wide open with multiple reasonable paths showing early signs of success. Ted presented five worldviews, each with smart researchers and builders pursuing them with conviction:

Industry Incumbent

These researchers believe general-purpose robotics is the wrong goal. Purpose-built solutions actually work today - from industrial automation to appliances we don't even call robots anymore. When robotics succeeds, we just call them tools. The path forward: directly optimize for specific use cases using decades of control theory and hardware expertise.

Humanoid Company

These researchers see hardware as the primary bottleneck. Once platforms stabilize, researchers excel at extracting performance - drones went from fragile research prototypes to consumer products, quadrupeds became robust commercial platforms. Humanoid form factors matter because the world is built for humans, and human-like robots can better leverage internet-scale human data.

Robot Foundation Model Startup

These researchers focus on robot data and algorithms as the key. Generality is non-negotiable - transformative technologies are general by nature. The core challenge: building an "internet of robotics data" either vertically (solve one domain completely, then expand) or horizontally (achieve robotics' GPT-2 moment first, then improve).

Bitter Lesson Believer

These researchers argue frontier models are the only existence proof of technology that can model internet-scale data with human-level performance. You can't solve robotics without incorporating these "magical artifacts" into the exploration process. Frontier model trends and compute lead robotics by about two years.

AGI Bro

These researchers take the most radical position: just solve AGI and ask it to solve robotics. The Platonic Representation Hypothesis suggests that as AI models improve across domains, their internal representations converge. Perfect language understanding might inherently include physical understanding.

Gemini Robotics

Ted's team at Google DeepMind pursued the Bitter Lesson approach, building robotics capabilities directly into Gemini rather than treating frontier models as black boxes.

Their Gemini Robotics system first enhanced embodied reasoning - teaching the model to understand the physical world better through 2D bounding boxes in cluttered scenes, 3D understanding with depth and orientation, pointing for granular precision, and grasp angles for manipulation. The system then learned low-level control with diverse robot actions, operating at 50Hz control frequency with quarter-second end-to-end latency. This unlocked three key advances:
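
To put those two numbers side by side, a quick back-of-the-envelope calculation helps. It uses only the 50Hz and quarter-second figures from the talk; the variable names are mine, and what the system does with this gap is not something these notes cover.

```python
# Figures quoted in the talk notes above.
control_hz = 50   # control commands per second
latency_s = 0.25  # end-to-end latency in seconds

# Time between consecutive control commands, in milliseconds.
step_ms = 1000 / control_hz

# How many control periods fit inside one end-to-end latency window.
steps_per_latency = latency_s * control_hz

print(step_ms)            # 20.0
print(steps_per_latency)  # 12.5
```

So a command goes out every 20 milliseconds, while a single end-to-end pass spans about twelve and a half of those control periods.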

Interactivity: The robot responds to dynamic scenes, following objects as they move and adapting to human interference
Dexterity: Beyond rigid objects, it can fold clothes, wrap headphone wires, and manipulate shoelaces
Generalization: Handles visual distribution shifts (new lighting, distractors), semantic variations (typos, different languages), and spatial changes (different sized objects requiring different strategies)

When deployed at a conference with completely novel conditions - crowds, different lighting, new table - the system maintained reasonable behavior for arbitrary user requests, showing sparks of that GPT-2 moment where it attempts something sensible regardless of input.

Dark Horses and Emerging Paradigms

Several emerging paradigms could completely upend current approaches:
Video World Models learning physics without robots through action-conditioned video generation
Robot-Free Data from simulation or humans with head-mounted cameras
Thinking Models applying frontier models' reasoning capabilities to robotics
Locomotion-Manipulation Unity bridging RL-based locomotion with foundation model manipulation

There's no consensus on which path will win. Each approach has reasonable arguments and early signs of success. The lack of agreement isn't a weakness - it's what makes this the most exciting time in robotics history.