Skip to content

Conversation

@lh2debug
Copy link

@lh2debug lh2debug commented Nov 6, 2025

What problem does this PR solve?

Issue Number: resolve #3068

Problem Summary:

Span lifecycle management defect in distributed storage system:

Problem Scenario:

  • Server maintains a parent span for client Write requests
  • Two child spans are created for append_entries RPCs to followers
  • When the first follower responds, the parent span is prematurely destroyed along with all child spans
  • When the second follower responds, it attempts to access the already-freed child span, causing use-after-free

Root Cause:
a. Premature deallocation: Parent span destroyed while child spans still in use
b. Dangling pointer: Response callback accesses freed span objects

Reference counting relationship

  • Parent span holds strong references to child spans, while child spans hold weak references to parent span.
  • RPC Done holds a strong reference to the parent span (i.e., server span), ensuring the reference count is released only after trace recording is completed when RPC Done executes.
  • SpanContainer holds a strong reference to the parent span, ensuring the reference count is released only after the background collector thread completes dumping the trace to the database.
  • Controller holds weak references to parent span/child spans, which does not affect the lifecycle of related spans while avoiding access to dangling pointers.
graph LR
    subgraph "Parent Span (WriteChunk)"
        PS["Parent Span<br/>ref_count = 2<br/>trace_id: 12345<br/>span_id: 67890"]
    end
    
    subgraph "强引用持有者 (shared_ptr)"
        RD["RPC Done Callback<br/>SendRpcResponse<br/>shared_ptr<Span>"]
        SC["SpanContainer<br/>后台collector<br/>shared_ptr<Span>"]
    end
    
    subgraph "弱引用持有者 (weak_ptr)"
        PC["Controller<br/>_span<br/>weak_ptr<Span>"]
        CS1["Child Span 1<br/>_local_parent<br/>weak_ptr<Span>"]
        CS2["Child Span 2<br/>_local_parent<br/>weak_ptr<Span>"]
    end
    
    subgraph "Child Spans"
        CS1_DETAIL["Child Span 1<br/>ref_count = 1<br/>append_entries to Follower1<br/>trace_id: 12345<br/>span_id: 67891"]
        CS2_DETAIL["Child Span 2<br/>ref_count = 1<br/>append_entries to Follower2<br/>trace_id: 12345<br/>span_id: 67892"]
    end
    
    %% Parent Span的强引用
    RD -->|"+1 ref_count"| PS
    SC -->|"+1 ref_count"| PS
    
    %% Parent Span的弱引用
    PC -.->|"不增加ref_count"| PS
    CS1 -.->|"不增加ref_count"| PS
    CS2 -.->|"不增加ref_count"| PS
    
    %% Parent Span持有Child Spans
    PS -->|"_client_list<br/>shared_ptr<br/>+1 ref_count"| CS1_DETAIL
    PS -->|"_client_list<br/>shared_ptr<br/>+1 ref_count"| CS2_DETAIL
    
    %% Child Spans的弱引用
    CS1_DETAIL -.->|"_local_parent<br/>weak_ptr"| PS
    CS2_DETAIL -.->|"_local_parent<br/>weak_ptr"| PS
    
    %% Controller对Child Spans的弱引用
    C1["Controller 1<br/>weak_ptr"] -.-> CS1_DETAIL
    C2["Controller 2<br/>weak_ptr"] -.-> CS2_DETAIL
    
    %% 样式
    classDef parentSpan fill:#e1f5fe,stroke:#01579b,stroke-width:4px
    classDef childSpan fill:#f3e5f5,stroke:#4a148c,stroke-width:3px
    classDef strongRef fill:#e8f5e8,stroke:#2e7d32,stroke-width:3px
    classDef weakRef fill:#fff3e0,stroke:#e65100,stroke-width:2px
    
    class PS parentSpan
    class CS1_DETAIL,CS2_DETAIL childSpan
    class RD,SC strongRef
    class PC,C1,C2 weakRef
Loading

What is changed and the side effects?

Changed:

Side effects: NO

  • Performance effects: NO

  • Breaking backward compatibility: NO


Check List:

@yanglimingcn
Copy link
Contributor

When the first follower responds, the parent span is prematurely destroyed along with all child spans
Because you just need to wait the first response?

@lh2debug
Copy link
Author

lh2debug commented Nov 6, 2025

Yes. For example, in the Raft protocol, there is one leader and two followers. The leader only needs to wait for the first follower to complete before returning.

KeyTable* keytable;
void* assigned_data;
void* rpcz_parent_span;
std::weak_ptr<brpc::Span> rpcz_parent_span;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you 'd better not use brpc types in the bthread code.
In the brpc design, bthread is a basic library, and brpc is built on top of it. So bthread should no depend on brpc. This is why the original type is void*, not brpc::Span*.
My suggestion: you can keep void* rpcz_parent_span here, and in the brpc code, you can new a std::weak_ptr<brpc::Span>, then cast the pointer std::weak_ptr<brpc::Span>* to void*. This may be more complicated, but keep the brpc design clean.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@wwbmmm
Copy link
Contributor

wwbmmm commented Nov 6, 2025

@lh2debug Thank you for your contribution! This solve a very important issue. You have done a good job!

Currently I have two concerns:

  1. This implementation add some locks and shard_ptr,it may affect performance. Have you compare its performance with the original version?
  2. This implementation change some internal APIs (such as ControllerPrivateAccessor::span(). Although these will not be used by most brpc user, they may be used by some advance users, who have extended brpc functions (for example, add new protocols). So this PR will make a incompatible upgrade to these users. Maybe you can try to add some macros to controll this feature, for example, you can define alias types like SpanSharedPtr and SpanWeakPtr, when BRPC_ENABLE_SHARED_SPAN is on, they are std::shared_ptr<Span> and std::weak_ptr<Span>, when the flag is off, they are Span*.

@lh2debug
Copy link
Author

lh2debug commented Nov 6, 2025

Thank you for your suggestion. I will consider the above optimizations. @wwbmmm

@yanglimingcn
Copy link
Contributor

maybe we can add a reference counter in span, There may be much less code modification.

@lh2debug
Copy link
Author

lh2debug commented Nov 7, 2025

Thank you for your answer. This approach of implementing reference counting internally within a Span seems impractical.

Because reference Counting with void* rpcz_parent_span Fails Across Bthreads:
Raw pointers in LocalStorage cannot detect when the referenced Span is destroyed in another bthread. When a parent Span is deleted (after its reference count reaches zero), child bthreads still hold the same pointer value pointing to freed memory, causing use-after-free. Each Span's reference count only tracks its own operations and cannot prevent destruction when other bthreads still reference it through raw pointers.

Smart pointers solve this via control blocks that survive object destruction, allowing safe lifetime detection across threads.

@yanglimingcn

@yanglimingcn
Copy link
Contributor

Root Cause:
a. Premature deallocation: Parent span destroyed while child spans still in use
b. Dangling pointer: Response callback accesses freed span objects

The span information is a tree structure, and this relationship is maintained through the ref information. For example, ParentSpan initially has ref=1, and two ClientSpans are created, so ref becomes 3.

When one ClientSpan returns, ref becomes 2; when the other ClientSpan returns, ref becomes 1. At this point, it can be destructed.

Because a single call might be split into a tree, the bthreads generated by this call should follow the lifecycle principle of this call chain, which should be fine.

@yanglimingcn
Copy link
Contributor

I believe that spans are suitable for requesting related information and should not be more closely associated with threads or coroutines; they simply use thread-local storage to pass variables in their implementation.

@wwbmmm
Copy link
Contributor

wwbmmm commented Nov 7, 2025

The span information is a tree structure, and this relationship is maintained through the ref information. For example, ParentSpan initially has ref=1, and two ClientSpans are created, so ref becomes 3.

When one ClientSpan returns, ref becomes 2; when the other ClientSpan returns, ref becomes 1. At this point, it can be destructed.

Let me complement, the first reference to the ParentSpan is held by tls, when the bthread exit, ref_count will -1, when the ref_count become 0 (only after all ClientSpan return and the bthread exit) , it can be destructed.

Moreover, when a new bthread with BTHREAD_INHERIT_SPAN is created, the ref_count will +1; when this bthread exit, the ref_count will -1.

@lh2debug
Copy link
Author

lh2debug commented Nov 7, 2025

  1. In LocalStorage, rpcz_parent_span maintains a void* type. The span layer provides functions to manage the weak_ptr pointer using new/delete and assign it to rpcz_parent_span. I've considered and tested this approach, and it seems feasible.

  2. Regarding the method of embedding reference counting within the span:

a) It's more complex in principle and implementation than the smart pointer method. Imagine a Raft scenario: one server span creates two bthreads. Each bthread context initiates an RPC (i.e., creates a client span). That is, the server span is the parent of the bthread span, and the bthread span is the parent of the client span. In this case, manually adding references/releasing references to the server span, bthread span, and client span is much more complex than the smart pointer method.

b) It's not flexible enough for business trace collection. For example, if the business wants to collect some trace information in a background thread outside the brpc context, the smart pointer method simply passes the weak_ptr as a function parameter to the function of that background thread. When the background thread wants to collect traces, it only needs to use weak_ptr.lock (if the lock succeeds, the collection is successful; if the lock fails, it means the RPC has ended and collection is no longer needed). Implementing this logic manually using reference counting is not easy (it requires elegant manual add ref and release operations).

@lh2debug
Copy link
Author

lh2debug commented Nov 7, 2025

  1. Based on our system performance tests, the implementation using smart pointers showed minimal performance impact.

@lh2debug
Copy link
Author

lh2debug commented Nov 7, 2025

  1. To ensure compatibility with advanced clients who modify their protocols, using the BRPC_ENABLE_SHARED_SPAN macro to control the use of Span* or std::weak_ptr<Span> in the ControllerPrivateAccessor layer can resolve compilation issues. However, the Span* approach may still cause crashes in certain scenarios.

If the implementation follows the BRPC_ENABLE_SHARED_SPAN macro control, could it mislead them into believing that the Span* approach is stable and reliable for performance analysis?

@wwbmmm
Copy link
Contributor

wwbmmm commented Nov 7, 2025

  1. To ensure compatibility with advanced clients who modify their protocols, using the BRPC_ENABLE_SHARED_SPAN macro to control the use of Span* or std::weak_ptr<Span> in the ControllerPrivateAccessor layer can resolve compilation issues. However, the Span* approach may still cause crashes in certain scenarios.

If the implementation follows the BRPC_ENABLE_SHARED_SPAN macro control, could it mislead them into believing that the Span* approach is stable and reliable for performance analysis?

We can add some notes in the document and code comment. I think that's OK.
Moreover, it's not only for ControllerPrivateAccessor, if possible, please apply this approach to each public method you changed.

@lh2debug lh2debug force-pushed the master-lh2-fix-3068 branch 4 times, most recently from 83d573d to 629d247 Compare November 10, 2025 07:01
@lh2debug lh2debug changed the title Fix span lifecycle with smart pointers to prevent use-after-free in a… WIP: Fix span lifecycle with smart pointers to prevent use-after-free in a… Nov 10, 2025
@yanglimingcn
Copy link
Contributor

Yes. For example, in the Raft protocol, there is one leader and two followers. The leader only needs to wait for the first follower to complete before returning.

Under this implementation logic, the span function still only counts the time taken for the first response, right?

@lh2debug
Copy link
Author

Yes. For example, in the Raft protocol, there is one leader and two followers. The leader only needs to wait for the first follower to complete before returning.

Under this implementation logic, the span function still only counts the time taken for the first response, right?

Yes, in this scenario, when the server span is committed, only the first client span (i.e., only the terminated child span) will be counted; the second ongoing client span will not be counted.

When the server span is destroyed, the reference counts of the first and second client spans will be canceled. Shortly afterward, the reference counts of these two client spans will be reduced to 0, and they will be garbage collected.

@lh2debug lh2debug force-pushed the master-lh2-fix-3068 branch 2 times, most recently from 55b628d to 817e972 Compare November 10, 2025 10:05
@lh2debug
Copy link
Author

@wwbmmm @yanglimingcn The code for bthread that doesn't depend on the brpc module (i.e., rpcz_parent_span remains void*) has been modified and tested. Please take a look, tkx~

@lh2debug lh2debug force-pushed the master-lh2-fix-3068 branch 3 times, most recently from 67b67d0 to 241e6f3 Compare November 11, 2025 05:58
@lh2debug lh2debug changed the title WIP: Fix span lifecycle with smart pointers to prevent use-after-free in a… Fix span lifecycle with smart pointers to prevent use-after-free in a… Nov 11, 2025
os, span.sent_real_us(),
&last_time, extr, ARRAY_SIZE(extr))) {
os << " Responded(" << span.response_size() << ')' << std::endl;
&last_time, extr, ARRAY_SIZE(extr), &span)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please post some of the display information?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As shown below, the type of span (server span, client span, bthread span) is indicated at the beginning to facilitate troubleshooting.

Fetching new trace TRACE#B from HOST#X

Received request REQ#65674 from HOST#Y via PROTO#1 log=0 trace=TRACE#B span=SPAN#S1

[ServerSpan span=SPAN#S1] Processing the request in a new worker
[ServerSpan span=SPAN#S1] ParseStats[cut=?, queue=?, worker=?, rpc=?]
[ServerSpan span=SPAN#S1] Enter Ecom.OrderService.PlaceOrder

[ClientSpan span=SPAN#C1] Cart.AddItem item=ITEM# qty=QTY# price=PRICE# user=USER# session=SESSION#
[ClientSpan span=SPAN#C1] Cart.VerifyOperation item=ITEM# op=OP# user=USER# session=SESSION#
[ClientSpan span=SPAN#C1] Cart.ProposeUpdate item=ITEM# op=OP# user=USER# session=SESSION# attempt=SN#
[ClientSpan span=SPAN#C1] Orchestration.Manager: enqueue step=PlaceOrder first=I# last=I#
[ClientSpan span=SPAN#C1] Orchestration.Queue: pending step

Requesting Ecom.Consensus.Append@HOST#Z PROTO#1 call=CALL# trace=TRACE#B span=SPAN#C1
[ClientSpan span=SPAN#C1] Requested(REQ#65778) [1]
[ClientSpan span=SPAN#C1] Received response(RSP#) of request[1]
[ClientSpan span=SPAN#C1] Processing the response in a new worker
[ClientSpan span=SPAN#C1] Enter client callback

[ClientSpan span=SPAN#C1] Workflow.OnApply, ongoing tasks: 0
[ClientSpan span=SPAN#C1] TaskQueue status running=R# queues=Q# active=A#
[ClientSpan span=SPAN#C1] OrderPipeline.WriteOrder
[ClientSpan span=SPAN#C1] Payment.Authorize
[ClientSpan span=SPAN#C1] Inventory.Reserve
[ClientSpan span=SPAN#C1] Promotion.Apply
[ClientSpan span=SPAN#C1] Promotion.Apply_done
[ClientSpan span=SPAN#C1] Notification.Prepare
[ClientSpan span=SPAN#C1] Notification.Prepare_done
[ClientSpan span=SPAN#C1] Inventory.Reserve_done
[ClientSpan span=SPAN#C1] Payment.Authorize_done
[ClientSpan span=SPAN#C1] OrderPipeline.Sync
[ClientSpan span=SPAN#C1] OrderPipeline.WriteOrder done (~2.7ms)
[ClientSpan span=SPAN#C1] Latency stats: avg≈58us p90≈66us p99≈160us
[ClientSpan span=SPAN#C1] Protocol response before join
[ClientSpan span=SPAN#C1] Protocol response after join

[ClientSpan span=SPAN#C1] FulfillmentWorker: ongoing tasks=3
[ClientSpan span=SPAN#C1] FulfillmentWorker: global queue running=R# queues=Q# active=A#
[ClientSpan span=SPAN#C1] FulfillmentWorker: enter
[ClientSpan span=SPAN#C1] Batcher.append
[ClientSpan span=SPAN#C1] Batcher.flush queue_wait≈~6.7ms

Requesting Ecom.Consensus.Append@HOST#Y PROTO#1 call=CALL# trace=TRACE#B span=SPAN#C2
[ClientSpan span=SPAN#C2] Requested(REQ#65778) [1]
[ClientSpan span=SPAN#C2] Received response(RSP#) of request[1]
[ClientSpan span=SPAN#C2] Processing the response in a new worker
[ClientSpan span=SPAN#C2] Enter client callback

[ServerSpan span=SPAN#S1] Leave Ecom.OrderService.PlaceOrder
[ServerSpan span=SPAN#S1] Responded

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This format is a little bit tedious. How about [Server SPAN#S1] Processing the request in a new worker?

#if BRPC_SPAN_ENABLE_SHARED_PTR_API
return span_internal;
#else
return span_internal.get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return Span* not hold a reference, This will cause problems, won't it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the server span waits for the child span to complete before returning (as is the case in most business scenarios), returning a raw pointer won't cause any problems.

For example, in a Raft scenario, if the server span returns before the second follower completes (causing the second follower's client span to be destroyed), the second follower's RPC callback will retrieve the destroyed dangling pointer through ControllerPrivateAccessor::span.

RpcPBMessages* messages, const Server* server,
MethodStatus* method_status, int64_t received_us) {
MethodStatus* method_status, int64_t received_us,
std::shared_ptr<Span> span) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why SendRpcResponse need span parameter, It can be obtained from the accessor?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SendRpcResponse needs to hold a strong reference count; otherwise, if the backend spancontainer releases the reference, the server span will be unable to set_start_send_us and set_response_size.

src/brpc/span.h Outdated

class SpanContainer : public bvar::Collected {
public:
explicit SpanContainer(std::shared_ptr<Span> span) : _span(span) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const std::shared_ptr& span ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll change this later.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

delete kt;
// After deletion: tls may be set during deletion.
tls_bls.keytable = NULL;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the relationship between rpcz_parent_span and key.cpp?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rpcz_parent_span is a void* protocol field (type controlled by brpc) in bthread TLS that points to weak_ptr<Span>.

key.cpp is responsible for its lifecycle triggering points (creation/destruction/end) and is decoupled from the specific implementation of brpc (span.cpp) through a global callback.

}

int bthread_set_span_funcs(bthread_create_span_fn create_fn,
bthread_destroy_span_fn destroy_fn,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be more reasonable to put this function in bthread.cpp?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your suggestion. Personally, I think it might be more appropriate to manage it under key.cpp, because span is a brpc layer concept, not a bthread concept. bthread.cpp should focus on the bthread lifecycle. key.cpp is responsible for span's lifecycle triggering points (creation/destruction/end) and is decoupled from the specific implementation of brpc (span.cpp) through a global callback.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think key.cpp is responsible for span's lifecycle triggering points.
key.cpp is for the implementation of bthread_get/setspecific, it manage keytable_pool and keytable, which has no relationship to span.
What you need is to call the span functions in the bthread lifecycle (create/end/...), the better place is in the task_group.cpp, you can get/set bthread::tls_bls.rpcz_parent_span there.
And the implementation of bthread_set_span_funcs is better placed into bthread.cpp, because most bthread interface functions are there.

if (!span) {
return NULL;
}
return new std::weak_ptr<Span>(span);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is this newly created weak_ptr deleted? This part seems a bit obscure, and it's easy for memory leaks to occur.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My earliest version used the weak_ptr type for rpcz_parent_span within LocalStorage because your team offered a very reasonable optimization reason: bthreads shouldn't depend on brpc. Therefore, rpcz_parent_span needed to retain its original void* type. The brpc layer provides manual new/delete management of weak_ptr for the bthread layer to register and use.

Therefore, during the bthread's lifecycle, the weak_ptr lifecycle of rpcz_parent_span needs to be manually managed using new/delete (i.e., new when the bthread is created, and delete when the bthread ends).

@wwbmmm
Copy link
Contributor

wwbmmm commented Nov 13, 2025

2. Regarding the method of embedding reference counting within the span:

a) It's more complex in principle and implementation than the smart pointer method. Imagine a Raft scenario: one server span creates two bthreads. Each bthread context initiates an RPC (i.e., creates a client span). That is, the server span is the parent of the bthread span, and the bthread span is the parent of the client span. In this case, manually adding references/releasing references to the server span, bthread span, and client span is much more complex than the smart pointer method.

b) It's not flexible enough for business trace collection. For example, if the business wants to collect some trace information in a background thread outside the brpc context, the smart pointer method simply passes the weak_ptr as a function parameter to the function of that background thread. When the background thread wants to collect traces, it only needs to use weak_ptr.lock (if the lock succeeds, the collection is successful; if the lock fails, it means the RPC has ended and collection is no longer needed). Implementing this logic manually using reference counting is not easy (it requires elegant manual add ref and release operations).

@chenBright What do you think of these two approaches ?

  1. manually manage ref count of the parent (root) span
  2. use shared_ptr and weak_ptr to manage all spans like this PR

@lh2debug lh2debug force-pushed the master-lh2-fix-3068 branch from 241e6f3 to c5a8dee Compare November 14, 2025 03:20
@chenBright
Copy link
Contributor

  1. Regarding the method of embedding reference counting within the span:
    a) It's more complex in principle and implementation than the smart pointer method. Imagine a Raft scenario: one server span creates two bthreads. Each bthread context initiates an RPC (i.e., creates a client span). That is, the server span is the parent of the bthread span, and the bthread span is the parent of the client span. In this case, manually adding references/releasing references to the server span, bthread span, and client span is much more complex than the smart pointer method.
    b) It's not flexible enough for business trace collection. For example, if the business wants to collect some trace information in a background thread outside the brpc context, the smart pointer method simply passes the weak_ptr as a function parameter to the function of that background thread. When the background thread wants to collect traces, it only needs to use weak_ptr.lock (if the lock succeeds, the collection is successful; if the lock fails, it means the RPC has ended and collection is no longer needed). Implementing this logic manually using reference counting is not easy (it requires elegant manual add ref and release operations).

@chenBright What do you think of these two approaches ?

  1. manually manage ref count of the parent (root) span
  2. use shared_ptr and weak_ptr to manage all spans like this PR

Although we used approache 1 before, I now prefer approache 2 because it offers better maintainability and extensibility after being refactored using smart pointers.

os, span.sent_real_us(),
&last_time, extr, ARRAY_SIZE(extr))) {
os << " Responded(" << span.response_size() << ')' << std::endl;
&last_time, extr, ARRAY_SIZE(extr), &span)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This format is a little bit tedious. How about [Server SPAN#S1] Processing the request in a new worker?

}
return 0;
}
uint64_t Controller::span_id() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a new line

#if BRPC_SPAN_ENABLE_SHARED_PTR_API
accessor.set_span(span);
#else
accessor.set_span(span.get());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can add an overloaded function set_span(std::shared_ptr<Span> span),
so that you don't need to use BRPC_SPAN_ENABLE_SHARED_PTR_API everywhere.

if (ls.rpcz_parent_span) {
delete static_cast<std::weak_ptr<Span>*>(ls.rpcz_parent_span);
}
ls.rpcz_parent_span = new std::weak_ptr<Span>(span);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why you need to delete and new the weak_ptr, can you directly reset the weak_ptr like *ls.rpcz_parent_span = span?

using namespace bthread;
LocalStorage ls = BAIDU_GET_VOLATILE_THREAD_LOCAL(tls_bls);
if (ls.rpcz_parent_span) {
delete static_cast<std::weak_ptr<Span>*>(ls.rpcz_parent_span);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you don't need to delete here, just ls.rpcz_parent_span->reset()

}

int bthread_set_span_funcs(bthread_create_span_fn create_fn,
bthread_destroy_span_fn destroy_fn,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think key.cpp is responsible for span's lifecycle triggering points.
key.cpp is for the implementation of bthread_get/setspecific, it manage keytable_pool and keytable, which has no relationship to span.
What you need is to call the span functions in the bthread lifecycle (create/end/...), the better place is in the task_group.cpp, you can get/set bthread::tls_bls.rpcz_parent_span there.
And the implementation of bthread_set_span_funcs is better placed into bthread.cpp, because most bthread interface functions are there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

span.cpp中destroy()方法存在内存重复释放问题,会导致服务崩掉

4 participants