When you write Go, the moment you see a function return an err, a wire in your head stays taut: you must either write if err != nil, or deliberately discard the error with _. Whichever way you choose, even the most careless programmer knows exactly what they are doing, which actually helps errors get handled promptly. The feeling of writing if err != nil all the time in Go exists precisely because every error is caught and none slips through.
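A minimal sketch of this pattern, with hypothetical file names and a made-up readConfig helper used purely for illustration:

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// readConfig is a made-up helper: any call that can fail hands the caller
// an error that must either be handled or be discarded explicitly.
func readConfig(path string) (string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", fmt.Errorf("read config: %w", err)
	}
	return string(data), nil
}

func main() {
	cfg, err := readConfig("config.toml")
	if err != nil {
		// Option 1: handle the error right where it surfaces.
		if errors.Is(err, os.ErrNotExist) {
			fmt.Println("config missing, falling back to defaults")
		} else {
			fmt.Println("failed to load config:", err)
			return
		}
	}
	fmt.Println("config bytes:", len(cfg))

	// Option 2: deliberately ignore the error with _, which is still a
	// visible, conscious decision in the source code.
	_, _ = readConfig("optional.toml")
}
```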
Hello everyone, in this article I would like to talk about the difference between process and thread.
This is a commonly discussed topic, and perhaps every programmer has been asked about it in interviews or exams. However, when I first learned these concepts and searched for information online, I found most explanations either not detailed enough or missing the key points. So I would like to share my understanding of this topic and, hopefully, express it clearly.
1. Threads
1.1 What is a thread?
Let's start with threads. What are threads? As the name suggests, a thread is like a line, referring to a sequence of program instructions executed in order. For example, if we have a+b, it has a linear logic flow: first input a, then input b, add a and b, and finally output the result. This sequential execution is also apparent in assembly language, representing what we call a "line" of execution.
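As a sketch, the a+b example might look like this in Go; the comments spell out the linear steps that make up this "line" of execution:

```go
package main

import "fmt"

func main() {
	var a, b int
	fmt.Scan(&a)     // step 1: input a
	fmt.Scan(&b)     // step 2: input b
	sum := a + b     // step 3: add a and b
	fmt.Println(sum) // step 4: output the result
}
```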
1.2 Thread switching in an operating system
A thread refers to a linearly executing program, which is a logical definition of a thread.
From the perspective of the CPU or the operating system, the operating system is a program running on the CPU. Assuming we have a simple CPU that works by fetching instructions, decoding them, executing them, reading and writing to memory, and writing back to registers in a sequence, it executes one instruction at a time in the order specified by the programmer.
When the operating system starts, it has a main thread, known as the kernel thread, which goes through initialization and enters the scheduler loop. It is important to note that the path from starting and initializing the kernel thread to entering the scheduler is linear, just like an ordinary program we write, one line after another; this is what we call the kernel thread. Once inside the scheduler loop, a second kind of thread appears: the threads of user processes. Saying "process" here may cause some ambiguity; what I mean is the threads of user-written application programs.
The user-written application program, such as a+b, is compiled into a series of instructions that represent a continuous logical flow independent of the kernel's logic. So user threads and kernel threads are two different logical entities, or two different threads.
During the scheduler loop, the kernel thread switches to a user thread. During this thread switch, the system saves the register state of the kernel thread and switches to the logic of the user thread. This thread switch enables the scheduler to run user-written threads, switching between the linear logic of the kernel and the linear logic of the user program. This is a functionality of the operating system kernel.
From the CPU's point of view, execution is still purely sequential: it runs the kernel thread first, and then the kernel swaps its own thread out for a user thread, which is what achieves thread switching. This is how threads are defined from the perspective of the CPU and the operating system.
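As a toy sketch of the last two paragraphs (this is not how a real kernel is written; the RegisterSet and Thread types and the field values are invented purely for illustration), the "save one register set, load another" idea looks roughly like this:

```go
package main

import "fmt"

// RegisterSet stands in for the CPU registers a kernel saves and restores
// during a thread switch.
type RegisterSet struct {
	PC int // program counter: which "line" of the thread runs next
	SP int // stack pointer
}

// Thread pairs a name with its saved register state.
type Thread struct {
	Name string
	Regs RegisterSet
}

// swtch simulates a context switch: save the outgoing thread's registers,
// then return the incoming thread's saved registers so it resumes where
// it left off.
func swtch(cur, next *Thread, curRegs RegisterSet) RegisterSet {
	cur.Regs = curRegs
	fmt.Printf("switch %s -> %s\n", cur.Name, next.Name)
	return next.Regs
}

func main() {
	kernel := &Thread{Name: "kernel", Regs: RegisterSet{PC: 100, SP: 0x8000}}
	user := &Thread{Name: "user a+b", Regs: RegisterSet{PC: 0, SP: 0x7000}}

	// The scheduler loop decides to run the user thread for a while,
	// then switches back to the kernel thread.
	regs := kernel.Regs
	regs = swtch(kernel, user, regs) // kernel -> user
	regs.PC += 4                     // the user thread executes an instruction
	regs = swtch(user, kernel, regs) // user -> kernel
	fmt.Printf("kernel resumes at PC=%d; user saved at PC=%d\n", regs.PC, user.Regs.PC)
}
```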
1.3 Multi-threading and parallelism
From a top-level perspective, from the viewpoint of writing multi-threaded programs, the purpose of having multiple threads is generally for parallel computing. The operating system can assign these threads to multiple CPUs to run simultaneously. For example, if I have two single-threaded CPUs, the operating system can assign each thread to one CPU for execution.
Although they use different sets of registers, they can share the same memory space. For example, if a variable written by the user is stored in a specific memory location, and now two threads need to read from this memory space, they are using the same address space. They can run on two different CPUs (assuming these two CPUs are single-threaded primitive CPUs; we know that modern CPUs have many technologies, and one CPU can actually run multiple threads simultaneously).
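A small sketch of this in Go, assuming the machine has at least two CPUs available: the two goroutines (which the Go runtime may place on separate OS threads and CPUs) read and update one shared variable, i.e. the same memory location in the same address space:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var counter int64 // one variable, one memory location, shared by both threads
	var wg sync.WaitGroup

	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				// Atomic add, so the two threads can safely update the
				// same address even when running on two different CPUs.
				atomic.AddInt64(&counter, 1)
			}
		}()
	}

	wg.Wait()
	fmt.Println(counter) // 2000: both threads wrote to the same memory
}
```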
2. Processes
2.1 What is a process?
That was about threads. Now what about processes? A process is an isolated environment that the operating system provides to each application program, similar to a "virtual machine." Why call it a virtual machine? Because it creates the illusion that each application program has a computer all to itself, and a very plain computer at that: just a sequentially executing CPU and a dedicated block of memory.
The essence of a process is to provide an isolated environment for each application program, as if it had its own dedicated computer. A process has at least one thread: it contains at least one segment of continuous logic, which may be implemented with several threads, but there is always at least one.
A piece of code, following the stored-program idea, sits at a specific location in memory. Let's not think about the physical side of memory here; let's focus on the virtual memory that a process provides to an application program.
This memory is quite simple. The code occupies a stretch of contiguous addresses starting at a fixed location; the heap starts at another fixed address and grows upward; and the stack starts at a higher address and grows downward toward the heap. This is the familiar notion of the heap and the stack. A process owns this memory space. As programmers, whether we write in a high-level language or in assembly, we can regard ourselves as writing and running code on this virtual machine. So when we talk about the stack and the heap in memory, we are referring to the stack and heap in this virtual space.
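A rough way to peek at this layout from inside a process. This is only a sketch: the exact addresses, and even whether a "local" variable really stays on the stack, depend on the platform and on the Go compiler's escape analysis:

```go
package main

import "fmt"

var global int // data segment: lives at a fixed address near the code

func main() {
	// Taking a local's address can make the compiler move it to the heap,
	// so read these addresses as an illustration of the idea, not a precise map.
	local := 42
	onHeap := new(int) // explicitly allocated storage that ends up on the heap

	fmt.Printf("code  (func main): %p\n", main)
	fmt.Printf("data  (global):    %p\n", &global)
	fmt.Printf("heap  (new int):   %p\n", onHeap)
	fmt.Printf("local variable:    %p\n", &local)
}
```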
2.2 How does an operating system implement processes?
This is what a process is. How does it simulate the exclusive use of CPU and memory? This is one of the fundamental functions of the operating system.
First, we have the scheduling mechanism for CPU allocation. The scheduler aims to distribute CPU time as evenly as possible, so that each application program runs as if it had the CPU to itself. When a process's time slice expires, the hardware triggers a timer interrupt: the kernel saves the process's register state, switches the CPU's registers over to the scheduler, loads the saved register state of the next process, and finally hands the registers to that process. The whole procedure is invisible to the programmer. An application program is "knocked out" and later awakened, sees that its registers are unchanged, and is kept under the "illusion" that "nothing happened in between." The process therefore perceives a CPU that it uses exclusively and continuously.
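A toy round-robin sketch of this idea; the Process type, the slice length, and the loop counts are all invented, and a real kernel drives this with hardware timer interrupts and per-process register save areas rather than a plain loop:

```go
package main

import "fmt"

// Process holds one application's saved "register state", reduced here to a
// single program counter.
type Process struct {
	Name string
	PC   int
}

func main() {
	procs := []*Process{{Name: "A"}, {Name: "B"}, {Name: "C"}}
	const slice = 3 // "instructions" per time slice, standing in for timer ticks

	// Scheduler loop: each simulated timer interrupt ends a slice, the
	// current process's state stays saved in its struct, and the next
	// process is resumed exactly where it left off.
	for round := 0; round < 2; round++ {
		for _, p := range procs {
			for i := 0; i < slice; i++ {
				p.PC++ // the process runs, unaware it will be paused
			}
			fmt.Printf("timer interrupt: pause %s at PC=%d, switch to next\n", p.Name, p.PC)
		}
	}
}
```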
Secondly, we have the virtual memory mechanism for memory allocation. Virtual memory works mainly through page-table mapping: physical memory is divided into fixed-size chunks (pages), and the kernel picks some of them and stitches them together into what the process sees as one contiguous address space, which becomes the virtual memory the process uses. The mapping between physical and virtual addresses is maintained by the kernel.
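A minimal sketch of that mapping; the page size, the table contents, and the flat map are all made up for illustration (real page tables are multi-level structures kept in kernel-managed memory):

```go
package main

import "fmt"

const pageSize = 4096

// pageTable maps a virtual page number to a physical page number. Two pages
// that are contiguous in the process's virtual memory may sit far apart in
// physical memory.
var pageTable = map[uint64]uint64{
	0: 7, // virtual page 0 -> physical page 7
	1: 3, // virtual page 1 -> physical page 3
	2: 9, // virtual page 2 -> physical page 9
}

// translate turns a virtual address into a physical address by looking up
// the virtual page number and keeping the in-page offset unchanged.
func translate(vaddr uint64) (uint64, bool) {
	vpn, offset := vaddr/pageSize, vaddr%pageSize
	ppn, ok := pageTable[vpn]
	if !ok {
		return 0, false // on real hardware this would be a page fault
	}
	return ppn*pageSize + offset, true
}

func main() {
	for _, va := range []uint64{0x0010, 0x1010, 0x2010} {
		pa, ok := translate(va)
		fmt.Printf("virtual 0x%04x -> physical 0x%05x (mapped=%v)\n", va, pa, ok)
	}
}
```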
Therefore, a process is like a computer with its own dedicated processor and memory, allowing an application program or a programmer to consider themselves as writing and running code on a standalone computer. This is what a process is, essentially a simulation or abstraction of a computer.
3. Comparison between Threads and Processes
3.1 Relationship between multi-threading, multiple processes, and parallelism
Multi-processing refers to the simultaneous execution of multiple applications. Whether it is multi-threading or multi-processing, they can run in parallel on a modern computer. However, it doesn't mean that they always run in parallel.
For example, if I only have a single-threaded CPU, then no matter how many threads I write, they cannot execute simultaneously on that CPU; they have to take turns using it. In that case they behave essentially like coroutines, which are discussed in the next section.
Alternatively, even if I have multiple physical CPUs, if my operating system's scheduler decides to give all CPU time to one process at a time, then multiple processes still cannot run in parallel.
Although it may sound strange, theoretically it is possible. Parallelism and multi-threading/multi-processing are not inherently interconnected.
In general, the purpose of multithreading is to achieve parallel computing, while the purpose of multiprocessing is not necessarily parallelism. The essence of multiprocessing is to provide isolated environments for multiple applications. Therefore, when multiple threads are used within a process, they utilize the same set of contexts, while multiple processes use different contexts and memory spaces. They are fundamentally different: with different meanings, goals, and implementation methods.
3.2 Threads are not scaled-down processes
We often hear statements like "a process contains multiple threads." This phrasing can create the misconception that a thread is a smaller version of a process, a tiny process. That impression is wrong, because threads and processes are two different concepts; a thread is not derived from a process. As discussed above, they are concepts on two different dimensions.
4. Coroutines
Finally, there is something called coroutines. Conceptually, coroutines are indeed linear logic, as described at the beginning of this article.
However, unlike multithreading, coroutines do not run in parallel. Why give up parallelism? Because parallel execution can introduce data races, and the coroutine model deliberately sidesteps them by letting only one coroutine run at any instant.
So, multiple coroutines refer to running multiple logical threads on a single thread, just like thread switching on a single-threaded CPU as mentioned earlier. It is physically sequential and does not involve parallel execution, but logically it has multiple threads.
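A rough illustration in Go. One hedge is needed: goroutines are normally allowed to run in parallel, so pinning the program to a single OS thread with runtime.GOMAXPROCS(1) is only an approximation of the single-threaded coroutine model described above:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	runtime.GOMAXPROCS(1) // one OS thread: the goroutines take turns, never overlapping

	var wg sync.WaitGroup
	for _, name := range []string{"co-A", "co-B"} {
		name := name
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 3; i++ {
				fmt.Println(name, "step", i)
				runtime.Gosched() // yield the single thread to the other coroutine
			}
		}()
	}
	wg.Wait()
}
```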
This is my understanding of the differences between multi-processing, multithreading, and coroutines. Thank you for reading.
Recently I have continued studying operating systems. The OS has changed how I think about the idea of a Service. I used to take the operating system, by appearance, as just another service: I submit tasks to it and it executes them, the same as any other service. But there is no magic in computer science; a Service is ultimately built from the same kind of code as Hello World. How do you realize a Service's behavior through state transitions? I hear the NJU OS course is a course that talks about state; I will definitely study it when I get the chance.
How does the operating system itself run in parallel across multiple CPUs? How does it manage its own state through shared memory, and manage parallel processes? How does it block logically without physically occupying resources (traps, sleeplocks)? How does it handle concurrent IO without blocking (decoupling)? Each sub-problem of the operating system seems to correspond to a whole class of problems in computer science; it really is a core field.
On May 15 I was shocked to learn that Mr. Chen Hao (陈皓) had passed away suddenly from a heart attack. He was a wonderful senior: I learned many software-engineering practices from his blog posts, and it was his writing that got me started with vim. I also once heard his view on passive income in a Twitter Space: start from your profession, and turn your expertise into your passive income. Although some of his opinions on Twitter seemed a bit conservative, I still regard him as a good mentor, and I had even thought about applying for a job at MegaEase.
I see that Yihong (伊洪) has already backed up his tweets, and some people plan to combine his data with AI to build a "digital immortality" for him. I hope he would like this kind of "immortality."