Rust 2026 Field Notes - Layered Error Handling Design for Libraries and Applications

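Error handling is not a "thiserror or anyhow" tooling choice; it is a question of architectural layering. Library error types, application-level error aggregation, and the cooperation between errors and logging are three layers with distinct responsibilities, and blurring any of them makes code decay quickly as it evolves. This article takes a practical view: what each layer should do, which anti-patterns are common, and how tracing spans make errors traceable.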

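Design principles for library-level error types

A library's error type is part of its public API. It matters as much as the public function signatures, and it is even harder to change.

Principle 1: non-exhaustive enums, leaving room for the future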

use thiserror::Error;

#[derive(Error, Debug)]
#[non_exhaustive]
pub enum DatabaseError {
    #[error("connection failed: {0}")]
    Connection(String),

    #[error("query execution failed: {0}")]
    Query(String),

    #[error("timeout after {0:?}")]
    Timeout(std::time::Duration),

    // New variants can be added later without breaking downstream `match` expressions
}

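#[non_exhaustive] forces callers in other crates to write a wildcard arm (inside the defining crate, exhaustive matches are still allowed):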

match db_err {
    DatabaseError::Connection(msg) => retry(msg),
    DatabaseError::Timeout(dur) => wait_and_retry(dur),
    _ => fallback(), // required: covers `Query` and any variant added later
}

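Without #[non_exhaustive], adding a new variant is a breaking (semver-incompatible) change. With it, the library can add new error variants in a minor release.

Rule of thumb: almost every public error enum should be #[non_exhaustive]. The only exception is when you are 100% sure no variant will ever be added, which almost never happens.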
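To make the contract concrete, here is a minimal std-only sketch of the same DatabaseError shape with Display and Error written by hand instead of derived. The should_retry helper is a hypothetical caller-side policy, not part of any library; it only illustrates the wildcard arm a downstream crate would be required to write.

```rust
use std::error::Error;
use std::fmt;
use std::time::Duration;

/// Same shape as the `DatabaseError` above, with the traits implemented manually.
#[derive(Debug)]
#[non_exhaustive]
pub enum DatabaseError {
    Connection(String),
    Query(String),
    Timeout(Duration),
}

impl fmt::Display for DatabaseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DatabaseError::Connection(msg) => write!(f, "connection failed: {msg}"),
            DatabaseError::Query(msg) => write!(f, "query execution failed: {msg}"),
            DatabaseError::Timeout(dur) => write!(f, "timeout after {dur:?}"),
        }
    }
}

impl Error for DatabaseError {}

/// Hypothetical caller-side policy: retry only transient failures.
pub fn should_retry(err: &DatabaseError) -> bool {
    match err {
        DatabaseError::Connection(_) | DatabaseError::Timeout(_) => true,
        // In a downstream crate this arm is mandatory; here it also covers `Query`.
        _ => false,
    }
}
```

Because unknown variants fall into the conservative branch, a variant added in a later minor release degrades to "do not retry" instead of breaking the build.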

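Principle 2: abstract over lower-level errors instead of passing them through

This is the most common anti-pattern: exposing lower-level error types directly in your API: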

// Anti-pattern: passing lower-level error types through
#[derive(Error, Debug)]
pub enum CacheError {
    #[error("redis error: {0}")]
    Redis(#[from] redis::RedisError),  // callers are forced to depend on the redis crate

    #[error("serialization error: {0}")]
    Serde(#[from] serde_json::Error),   // callers are forced to depend on serde_json
}

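Problems:

Dependency leakage: to match on your error, callers must depend on redis and serde_json
Implementation coupling: swapping Redis for Memcached changes the error type, even though that is an implementation detail that should not affect the API
Version lock-in: when an underlying library changes its error type in an upgrade, you are forced into a major version bump of your own

The right approach: abstract into self-describing errors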

use thiserror::Error;

#[derive(Error, Debug)]
#[non_exhaustive]
pub enum CacheError {
    #[error("backend connection failed: {details}")]
    ConnectionFailed {
        details: String,
        #[source]
        source: Box<dyn std::error::Error + Send + Sync>,
    },

    #[error("serialization failed for key `{key}`: {reason}")]
    SerializationFailed {
        key: String,
        reason: String,
    },

    #[error("key `{0}` not found")]
    NotFound(String),
}

impl CacheError {
    // Crate-private on purpose: a `pub` constructor taking `redis::RedisError`
    // would put the redis crate back into the public API
    pub(crate) fn from_redis(err: redis::RedisError) -> Self {
        CacheError::ConnectionFailed {
            details: err.to_string(),
            source: err.into(),
        }
    }
}

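Now:

Callers do not need to know whether the backend is Redis or Memcached
You are free to swap the underlying implementation
The error still carries its source chain for debugging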
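The following std-only sketch illustrates the swap. RedisLikeError and MemcachedLikeError are hypothetical stand-ins for two backend client errors, and from_backend is an illustrative constructor; the point is only that both backends end up as the same caller-facing variant while the original error stays reachable through source().

```rust
use std::error::Error;
use std::fmt;

type BoxError = Box<dyn Error + Send + Sync + 'static>;

/// Hypothetical stand-ins for two backend client error types.
#[derive(Debug)]
pub struct RedisLikeError(pub String);

#[derive(Debug)]
pub struct MemcachedLikeError(pub String);

impl fmt::Display for RedisLikeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "redis-like: {}", self.0)
    }
}

impl fmt::Display for MemcachedLikeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "memcached-like: {}", self.0)
    }
}

impl Error for RedisLikeError {}
impl Error for MemcachedLikeError {}

#[derive(Debug)]
#[non_exhaustive]
pub enum CacheError {
    ConnectionFailed { details: String, source: BoxError },
    NotFound(String),
}

impl CacheError {
    /// Accepts any backend error, so no backend type appears in the signature.
    pub fn from_backend(err: impl Error + Send + Sync + 'static) -> Self {
        CacheError::ConnectionFailed {
            details: err.to_string(),
            source: Box::new(err),
        }
    }
}

impl fmt::Display for CacheError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CacheError::ConnectionFailed { details, .. } => {
                write!(f, "backend connection failed: {details}")
            }
            CacheError::NotFound(key) => write!(f, "key `{key}` not found"),
        }
    }
}

impl Error for CacheError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            CacheError::ConnectionFailed { source, .. } => {
                // Drop the Send + Sync bounds to match the trait's return type.
                let inner: &(dyn Error + 'static) = &**source;
                Some(inner)
            }
            CacheError::NotFound(_) => None,
        }
    }
}
```

A caller that matches on CacheError::ConnectionFailed keeps compiling unchanged whichever backend produced the failure.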

Principle 3: split variants by what the caller cares about, not by what happened underneath

// Anti-pattern: variants split by low-level event
pub enum HttpError {
    DnsResolutionFailed,
    TcpConnectionRefused,
    TlsHandshakeFailed,
    HttpResponse500,
    HttpResponse429,
}

// Better (an alternative definition, not a second type):
// variants split by the handling strategy the caller needs
#[derive(Error, Debug)]
pub enum HttpError {
    #[error("connection failed: {0}")]
    ConnectionFailed(String),      // retry

    #[error("rate limited, retry after {0:?}")]
    RateLimited(std::time::Duration),  // wait, then retry

    #[error("server error: {0}")]
    ServerError(u16),              // 5xx, possibly retry

    #[error("client error: {0}")]
    ClientError(u16),              // 4xx, do not retry
}
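As an illustration of why strategy-oriented variants pay off, here is a small std-only sketch. classify_status and retry_delay are hypothetical helpers, and the backoff numbers are arbitrary assumptions; the sketch only shows that the caller's retry logic becomes a single flat match.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
#[non_exhaustive]
pub enum HttpError {
    ConnectionFailed(String),
    RateLimited(Duration),
    ServerError(u16),
    ClientError(u16),
}

/// Map a response status to a caller-facing variant.
/// `retry_after` is assumed to come from a parsed Retry-After header.
pub fn classify_status(status: u16, retry_after: Option<Duration>) -> Option<HttpError> {
    match status {
        429 => Some(HttpError::RateLimited(
            retry_after.unwrap_or(Duration::from_secs(1)),
        )),
        400..=499 => Some(HttpError::ClientError(status)),
        500..=599 => Some(HttpError::ServerError(status)),
        _ => None, // not an error status
    }
}

/// How long to wait before retrying, or `None` if retrying is pointless.
pub fn retry_delay(err: &HttpError, attempt: u32) -> Option<Duration> {
    match err {
        HttpError::ConnectionFailed(_) | HttpError::ServerError(_) => {
            // Exponential backoff: 100ms, 200ms, 400ms, ... with the exponent capped at 6.
            Some(Duration::from_millis(100 * 2u64.pow(attempt.min(6))))
        }
        HttpError::RateLimited(wait) => Some(*wait),
        HttpError::ClientError(_) => None,
    }
}
```

With the event-oriented enum, the same policy would have to enumerate DNS, TCP and TLS failures separately even though they all lead to the same action.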

Principle 4: keep the error type hierarchy to at most 3 layers

Low-level library errors (io::Error, serde_json::Error)
    ↓ From conversion
Mid-level library errors (DatabaseError, CacheError)
    ↓ From conversion
Application error (AppError)

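A conversion chain deeper than 3 layers leads to:

An explosion of From implementations (N×M combinations)
Error messages wrapped layer upon layer until the original information is buried
Multiple levels of source chain to unwrap while debugging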
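A minimal std-only sketch of the three layers, assuming a hypothetical StorageError in the middle and an AppError on top. read_blob, load_from_storage and load_user_data are illustrative names; chain_depth simply counts how many errors a debugger would have to unwrap.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Layer 2: a hypothetical library error wrapping the low-level io::Error.
#[derive(Debug)]
pub enum StorageError {
    Unavailable(io::Error),
}

impl fmt::Display for StorageError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "storage unavailable")
    }
}

impl Error for StorageError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            StorageError::Unavailable(e) => Some(e),
        }
    }
}

impl From<io::Error> for StorageError {
    fn from(err: io::Error) -> Self {
        StorageError::Unavailable(err)
    }
}

/// Layer 3: the application error.
#[derive(Debug)]
pub enum AppError {
    Storage(StorageError),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to load user data")
    }
}

impl Error for AppError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            AppError::Storage(e) => Some(e),
        }
    }
}

impl From<StorageError> for AppError {
    fn from(err: StorageError) -> Self {
        AppError::Storage(err)
    }
}

/// Layer 1: always fails here, to keep the example deterministic.
fn read_blob() -> Result<Vec<u8>, io::Error> {
    Err(io::Error::new(io::ErrorKind::NotFound, "blob missing"))
}

fn load_from_storage() -> Result<Vec<u8>, StorageError> {
    Ok(read_blob()?) // io::Error -> StorageError
}

pub fn load_user_data() -> Result<Vec<u8>, AppError> {
    Ok(load_from_storage()?) // StorageError -> AppError
}

/// Number of errors in the chain, counting `err` itself.
pub fn chain_depth(err: &(dyn Error + 'static)) -> usize {
    let mut depth = 1;
    let mut current = err.source();
    while let Some(cause) = current {
        depth += 1;
        current = cause.source();
    }
    depth
}
```

Two From implementations are enough here; each extra layer would add another conversion at every boundary it touches.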

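Application-level error aggregation and presentation

The application is where errors end up. No downstream code needs to match on your errors; they only need to be presented to a human.

anyhow / eyre: a unified application-level error type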

use anyhow::{Context, Result};

fn run_app() -> Result<()> {
    let config = load_config()
        .context("failed to load configuration")?;

    let db = connect_db(&config.db_url)
        .context("failed to connect to database")?;

    let cache = connect_cache(&config.cache_url)
        .context("failed to connect to cache")?;

    serve(db, cache)
        .context("server failed")?;

    Ok(())
}

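context() is the core of application-level error handling: it attaches the semantic information of "what was being attempted at the time" to the error.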
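To show the idea behind context() without any dependency, here is a simplified std-only sketch. ContextError, WithContext and with_ctx are hypothetical names invented for this example; this is not how anyhow is implemented, only the same wrapping pattern in miniature: a new outer message, with the original error kept as the source.

```rust
use std::error::Error;
use std::fmt;

type BoxError = Box<dyn Error + Send + Sync + 'static>;

/// An error that pairs "what we were doing" with the underlying cause.
#[derive(Debug)]
pub struct ContextError {
    context: String,
    source: BoxError,
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Only the context is displayed; the cause stays reachable via source().
        f.write_str(&self.context)
    }
}

impl Error for ContextError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        let inner: &(dyn Error + 'static) = &*self.source;
        Some(inner)
    }
}

/// Hypothetical extension trait mirroring the shape of a `context` call.
pub trait WithContext<T> {
    fn with_ctx(self, context: &str) -> Result<T, ContextError>;
}

impl<T, E> WithContext<T> for Result<T, E>
where
    E: Into<BoxError>,
{
    fn with_ctx(self, context: &str) -> Result<T, ContextError> {
        self.map_err(|e| ContextError {
            context: context.to_string(),
            source: e.into(),
        })
    }
}

/// Example use: the parse failure is wrapped, not replaced.
pub fn parse_port(raw: &str) -> Result<u16, ContextError> {
    raw.trim()
        .parse::<u16>()
        .with_ctx("failed to parse port from configuration")
}
```

The outer message answers "what was the program trying to do", while the source still answers "what actually went wrong".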

Layering the different presentations

The same error needs different presentations for a CLI, an HTTP API, and logs:

use anyhow::Error;

// `HttpResponse` stands in for your web framework's response type
fn display_for_cli(err: &Error) -> ! {
    // `{}` prints only the outermost message and hides internal details
    eprintln!("Error: {}", err);
    std::process::exit(1)
}

fn display_for_http_api(err: &Error) -> HttpResponse {
    // Structured JSON that includes an error code
    let code = classify_error(err);
    HttpResponse::json(serde_json::json!({
        "error": {
            // ErrorClass derives Debug, not Serialize, so format it explicitly
            "code": format!("{:?}", code),
            "message": err.to_string(),
            // Do not expose the backtrace in production
        }
    })).with_status(code.http_status())
}

fn display_for_log(err: &Error) {
    // Full error chain + backtrace
    tracing::error!(
        error = %err,
        error_chain = ?err.chain().map(|e| e.to_string()).collect::<Vec<_>>(),
        backtrace = ?err.backtrace(),
        "operation failed"
    );
}

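Key insight: how an error is presented is the presentation layer's job, not the error type's. Do not push HTTP semantics into a library's error type just because "the HTTP API needs an error code".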
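The separation can be sketched with nothing but std::error::Error. Layered is a hypothetical error type used only to build a three-level chain, and render_cli / render_log are illustrative names; the error value is identical in both calls, only the renderer differs.

```rust
use std::error::Error;
use std::fmt;

/// A tiny chained error used only to demonstrate rendering.
#[derive(Debug)]
pub struct Layered {
    message: &'static str,
    source: Option<Box<Layered>>,
}

impl fmt::Display for Layered {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.message)
    }
}

impl Error for Layered {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        self.source.as_deref().map(|e| e as &(dyn Error + 'static))
    }
}

/// Builds: configuration -> file read -> permission problem.
pub fn sample_error() -> Layered {
    let denied = Layered { message: "permission denied", source: None };
    let read = Layered {
        message: "failed to read config file",
        source: Some(Box::new(denied)),
    };
    Layered {
        message: "failed to load configuration",
        source: Some(Box::new(read)),
    }
}

/// CLI view: outermost message only.
pub fn render_cli(err: &dyn Error) -> String {
    format!("Error: {err}")
}

/// Log view: every message in the chain, outermost first.
pub fn render_log(err: &(dyn Error + 'static)) -> String {
    let mut parts = vec![err.to_string()];
    let mut current = err.source();
    while let Some(cause) = current {
        parts.push(cause.to_string());
        current = cause.source();
    }
    parts.join(": ")
}
```

Neither renderer required any change to the error type, which is the property the paragraph above argues for.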

Error classification: adding structure to application-level errors

use std::io;

#[derive(Debug, Clone, Copy)]
enum ErrorClass {
    Config,     // configuration error, cannot start
    Network,    // network error, retryable
    Database,   // database error, possibly retryable
    Logic,      // business-logic error, not retryable
    Internal,   // internal error, should raise an alert
}

impl ErrorClass {
    fn http_status(self) -> u16 {
        match self {
            ErrorClass::Config => 500,
            ErrorClass::Network => 503,
            ErrorClass::Database => 503,
            ErrorClass::Logic => 400,
            ErrorClass::Internal => 500,
        }
    }

    fn is_retryable(self) -> bool {
        matches!(self, ErrorClass::Network | ErrorClass::Database)
    }
}

fn classify_error(err: &anyhow::Error) -> ErrorClass {
    // Coarse heuristic: every io::Error is treated as a network problem
    if err.is::<io::Error>() {
        ErrorClass::Network
    } else if err.is::<DatabaseError>() {
        ErrorClass::Database
    } else {
        ErrorClass::Internal
    }
}
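One possible refinement, sketched with std only: walk the source() chain and downcast at each level, so that an io::Error wrapped inside another typed error is still found, and only connection-like kinds count as network failures. Wrapped, wrap and the reduced two-variant ErrorClass are assumptions made for this example. With anyhow, a similar walk should be possible over err.chain().

```rust
use std::error::Error;
use std::fmt;
use std::io;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorClass {
    Network,
    Internal,
}

/// Hypothetical wrapper that adds a message on top of any error.
#[derive(Debug)]
pub struct Wrapped {
    context: &'static str,
    source: Box<dyn Error + Send + Sync + 'static>,
}

pub fn wrap(context: &'static str, source: impl Error + Send + Sync + 'static) -> Wrapped {
    Wrapped { context, source: Box::new(source) }
}

impl fmt::Display for Wrapped {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.context)
    }
}

impl Error for Wrapped {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        let inner: &(dyn Error + 'static) = &*self.source;
        Some(inner)
    }
}

/// Classify by the first io::Error found anywhere in the chain.
pub fn classify(err: &(dyn Error + 'static)) -> ErrorClass {
    let mut current = Some(err);
    while let Some(e) = current {
        if let Some(io_err) = e.downcast_ref::<io::Error>() {
            return match io_err.kind() {
                io::ErrorKind::ConnectionRefused
                | io::ErrorKind::ConnectionReset
                | io::ErrorKind::TimedOut => ErrorClass::Network,
                // e.g. NotFound or PermissionDenied are not network problems
                _ => ErrorClass::Internal,
            };
        }
        current = e.source();
    }
    ErrorClass::Internal
}
```

Which kinds count as retryable network failures is a policy decision; the list above is only a plausible starting point.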

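How errors and tracing work together

Errors and logs are not two independent systems; they should cooperate closely. tracing's span mechanism makes that possible.

Embedding error information in a span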

use tracing::{instrument, error, info};
use anyhow::{Context, Result};

#[instrument(skip(db), fields(db_id = %db.id()))]
async fn process_order(db: &Database, order: Order) -> Result<()> {
    info!("processing order");

    let inventory = db.get_inventory(&order.item_id)
        .await
        .context("failed to fetch inventory")?;

    if inventory.quantity < order.quantity {
        // The event inherits the context of the current span
        error!(
            available = inventory.quantity,
            requested = order.quantity,
            "insufficient inventory"
        );
        return Err(anyhow::anyhow!("insufficient inventory"));
    }

    db.decrement_inventory(&order.item_id, order.quantity)
        .await
        .context("failed to update inventory")?;

    Ok(())
}

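When the error occurs, the tracing output includes the span context (simplified here; the exact layout depends on the subscriber's formatter):

ERROR process_order{db_id="prod-1"}: available=3 requested=5: insufficient inventory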

Recording a Result's full chain with error!

fn report_error(err: &anyhow::Error) {
    // tracing's error! macro automatically picks up the current span
    tracing::error!(
        error.message = %err,                              // Display
        error.source_chain = ?err.chain()                  // Debug of full chain
            .map(|e| e.to_string())
            .collect::<Vec<_>>(),
        error.backtrace = ?err.backtrace(),                // Backtrace
        "operation failed"
    );
}

Pitfall: span guards and error logging in async code

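use tracing::Instrument;

// Wrong: holding a `span.enter()` guard across an `.await`
async fn bad_example() -> Result<()> {
    let span = tracing::info_span!("operation");
    let _enter = span.enter();

    // The guard stays alive while this task is suspended, so the span can remain
    // "entered" on the thread while unrelated tasks run and context gets misattributed
    let result = risky_operation().await;

    if let Err(e) = &result {
        // The span context attached to this event is unreliable
        tracing::error!(error = %e, "failed");
    }
    result
}

// Right: attach the span to the future with `.instrument()` (or use `#[instrument]`);
// `Span::in_scope` is the equivalent for synchronous code
async fn good_example() -> Result<()> {
    async {
        let result = risky_operation().await;
        if let Err(e) = &result {
            // Logged inside the instrumented future, so it is still within the span
            tracing::error!(error = %e, "failed");
        }
        result
    }
    .instrument(tracing::info_span!("operation"))
    .await
}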

Best-practice combination: tracing + anyhow

use tracing::instrument;
use anyhow::{Context, Result};

#[instrument(err)]  // the `err` argument: automatically record a returned Err
async fn fetch_user(id: u64) -> Result<User> {
    // `http_client` is assumed to be an HTTP client available in scope
    let resp = http_client
        .get(&format!("/users/{}", id))
        .send()
        .await
        .context("HTTP request failed")?;  // context says which step failed

    let user: User = resp.json().await
        .context("JSON deserialization failed")?;

    Ok(user)
}

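#[instrument(err)] automatically records the error message at ERROR level when the function returns Err. Combined with context(), every ? carries the information of "what this step was doing".

Common anti-patterns in error conversion

Anti-pattern 1: discarding information in a From implementation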

// Anti-pattern: From throws away the original error
impl From<io::Error> for AppError {
    fn from(err: io::Error) -> Self {
        AppError::Io(err.to_string())  // loses the io::Error value and its source chain!
    }
}

// Better (an alternative to the impl above): keep the source
impl From<io::Error> for AppError {
    fn from(err: io::Error) -> Self {
        // io::Error implements Error; the chain stays intact as long as
        // AppError::source() returns this value
        AppError::Io(err)
    }
}
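The difference is observable at runtime. In this std-only sketch, LossyError and AppError are hypothetical types for the two From styles above, and io_kind is an illustrative helper that tries to recover the io::ErrorKind from an error's source.

```rust
use std::error::Error;
use std::fmt;
use std::io;

/// Anti-pattern: stores only the formatted message.
#[derive(Debug)]
pub enum LossyError {
    Io(String),
}

/// Preferred: stores the original error value.
#[derive(Debug)]
pub enum AppError {
    Io(io::Error),
}

impl From<io::Error> for LossyError {
    fn from(err: io::Error) -> Self {
        LossyError::Io(err.to_string())
    }
}

impl From<io::Error> for AppError {
    fn from(err: io::Error) -> Self {
        AppError::Io(err)
    }
}

impl fmt::Display for LossyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LossyError::Io(msg) => write!(f, "I/O error: {msg}"),
        }
    }
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "I/O error")
    }
}

// Nothing to return from source(): the original value is gone.
impl Error for LossyError {}

impl Error for AppError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            AppError::Io(e) => Some(e),
        }
    }
}

/// Try to recover the io::ErrorKind from an error's direct source.
pub fn io_kind(err: &(dyn Error + 'static)) -> Option<io::ErrorKind> {
    err.source()?.downcast_ref::<io::Error>().map(|e| e.kind())
}
```

Once the value has been flattened into a String, no caller can ask whether the failure was a missing file or a permission problem without parsing the message.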

Anti-pattern 2: over-nested error enums

// Anti-pattern: Russian nesting dolls
pub enum AppError {
    Db(DbError),
    Cache(CacheError),
    Auth(AuthError),
}

pub enum DbError {
    Pool(PoolError),
    Query(QueryError),
}

pub enum PoolError {
    Timeout(TimeoutError),
    Connection(ConnectionError),
}
// Callers have to match AppError::Db(DbError::Pool(PoolError::Timeout(...)))

Better: flatten + context

use anyhow::{Context, Result};

fn do_thing() -> Result<()> {
    db_query()
        .context("database query failed")?;  // anyhow keeps the source chain automatically
    Ok(())
}

Anti-pattern 3: formatting inside From

// Anti-pattern: formatting eagerly in From allocates a String on every conversion,
// even if the message is never displayed, and discards the typed source
impl From<io::Error> for AppError {
    fn from(err: io::Error) -> Self {
        AppError::Other(format!("I/O error occurred: {}", err))  // unnecessary allocation
    }
}

// Better: let Display do the formatting
#[derive(Error, Debug)]
pub enum AppError {
    #[error("I/O error: {0}")]
    Io(#[from] io::Error),  // thiserror generates Display; formatting runs only when the error is printed
}

Anti-pattern 4: several From implementations for the same source error type

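// Both impls are legal and do not conflict with each other, but `?` chooses
// between them purely from the enclosing return type
impl From<io::Error> for AppError { ... }
impl From<io::Error> for OtherError { ... }
// Where the return type is not spelled out (closures, async blocks), `?` cannot
// infer which one to use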

If the same error type shows up in several contexts, use an explicit conversion instead of From:

let result = operation()
    .map_err(AppError::Io)?;  // explicit, rather than the implicit conversion done by `?`
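A small std-only sketch of how ? picks its target. ConfigError, RequestError and the parse_* functions are hypothetical; the same ParseIntError converts into a different type depending only on the declared return type, and inside a closure an explicit map_err states the target outright.

```rust
use std::num::ParseIntError;

#[derive(Debug, PartialEq)]
pub enum ConfigError {
    BadNumber(ParseIntError),
}

#[derive(Debug, PartialEq)]
pub enum RequestError {
    BadNumber(ParseIntError),
}

impl From<ParseIntError> for ConfigError {
    fn from(err: ParseIntError) -> Self {
        ConfigError::BadNumber(err)
    }
}

impl From<ParseIntError> for RequestError {
    fn from(err: ParseIntError) -> Self {
        RequestError::BadNumber(err)
    }
}

/// `?` converts into ConfigError because that is the declared return type.
pub fn parse_workers(raw: &str) -> Result<u32, ConfigError> {
    Ok(raw.parse::<u32>()?)
}

/// Same expression, different target: the conversion is invisible at the call site.
pub fn parse_page(raw: &str) -> Result<u32, RequestError> {
    Ok(raw.parse::<u32>()?)
}

/// Inside the closure, an explicit `map_err` names the target type directly.
pub fn parse_all(raws: &[&str]) -> Result<Vec<u32>, ConfigError> {
    raws.iter()
        .map(|raw| raw.parse::<u32>().map_err(ConfigError::BadNumber))
        .collect()
}
```

The explicit form costs a few characters and makes the chosen variant visible in review.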

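Practical takeaways

A library's error type is an API contract: non-exhaustive, abstracted from the backend, split by what the caller cares about
Do not pass through lower-level error types: abstract away implementation details and keep the information in a Box or a String
Applications use anyhow + context: attach meaning with context() at every ?
Separate presentation from type: the same error gets different presentations for CLI, HTTP, and logs
tracing spans are the best context for errors: #[instrument(err)] + context() is the golden combination
From implementations must preserve the source chain: do not throw away the original error with to_string() inside From
Keep error layers to 3 at most: low-level → mid-level library → application; beyond that things start to get out of hand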
