Event-driven system – shortcuts don’t pay off

Code example & diagrams: GitHub – procedures container

Setting up the trap

Some time ago I encountered an interesting programming problem. It was a C++ project, one of the bigger ones, with an event-driven architecture. While I was implementing a feature, a little trap caught my eye. The troublesome part of the code was responsible for the following scenario:

We have an entity that has some resources. We want to delete the entity and those resources. To delete a single resource we have a specialized procedure that communicates with some remote storage.

We have a container at the application layer that stores those releasing procedures. To delete a resource we create such a procedure, start it, and pass it a callback that will be invoked on completion.

This is a simplified description, but all the important elements of the problem are in place. The class diagram showing the actors of this interaction could look like this:

Fig 1. Class diagram

The problematic part was the completion of the ResourceDeleter procedure. How should we remove the finished procedure from the container?

Falling into the trap

There were some ideas, one of which looked innocent and elegant at first sight:

We will add a little piece of code to the callback in the ResourceDeleterContainer that erases the ResourceDeleter from the container.

In code it looked something like this:

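struct ResourceDeleterContainer
{
    ...
    void startDeletion(Callback userCallback)
    {
        // `id` is assumed to be defined in the elided code.
        auto deleter = createAndStoreDeleterWithId(id);
        auto callback = [this, id = id, userCallback]()
        {
            userCallback();
            // Danger: this destroys the ResourceDeleter that is
            // currently executing this very callback.
            resourceDeleters.erase(id);
        };
        deleter.startDeletion(callback);
    }
    ...
};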

Fig 2. Sample Code

At first sight it looked kinda neat, but it was a ticking bomb that could explode at any time. Why? We create a situation in which we could use an object that has already been deleted. Several factors hide this fact. The callback structure is a bit hard to follow. We see the code of the lambda in the ResourceDeleterContainer context, which automatically reassures us that this is a place where we can safely destroy the ResourceDeleter. We don’t think about the context in which the code of this lambda expression is actually executed. All of this obscures the real situation in which we land after erasing the ResourceDeleter.

The situation is this: the callback is executed from the ResourceDeleter context. This object doesn’t know anything about what happens in the callback; it just executes it when the job is done. After the callback returns, we’re back in the code of the just-removed ResourceDeleter! This situation is shown in the flow diagram in Fig 3.

Fig 3. Diagram of a troublesome flow

You can download the sample code from the GitHub repository linked at the top of this post. It is a proof of concept, so it does not have proper messaging built in. It works on very simple classes, just to show the gist of the problem.

Using GDB, I set a breakpoint at part 11. We can easily see that the callback that removes the ResourceDeleter is executed from the ResourceDeleter context. Look at Fig 4.

Fig 4. Callstack from the troublesome flow

Fig. 5 Segfault

In Fig. 5 we can see the segfault that occurs when accessing the resource in ResourceDeleter. What would happen if we didn’t access this resource? This is the tricky part. The problem would probably not show. The program would carry on. Development would continue, until someone wrote a piece of code that accesses this resource. They would be in for a hell of a surprise. Hopefully regression testing would discover it. Nevertheless, this design is a ticking bomb.
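
The ordering of events in this flow can be made visible without actually triggering undefined behaviour. The following is only a minimal sketch and is separate from the sample code in the repository: the member names (onStorageReply, onDone_), the Log type and the runErasingCallbackFlow helper are assumptions made for illustration. The deleter copies everything it needs onto the stack before invoking the callback, so the log can show that the destructor runs while the deleter’s member function is still executing.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Log = std::vector<std::string>;

// Hypothetical stand-in for the article's ResourceDeleter.
class ResourceDeleter
{
public:
    explicit ResourceDeleter(Log& log) : log_(log) {}
    ~ResourceDeleter() { log_.push_back("~ResourceDeleter"); }

    void startDeletion(std::function<void()> onDone) { onDone_ = std::move(onDone); }

    // Called when the (simulated) remote storage confirms the deletion.
    void onStorageReply()
    {
        // Move everything we need onto the stack first, so that nothing
        // below touches a member after the callback has run.
        Log& log = log_;
        auto onDone = std::move(onDone_);

        log.push_back("onStorageReply: before callback");
        onDone();  // the callback may destroy *this
        log.push_back("onStorageReply: after callback");
        // Reading any member at this point would be a use-after-free.
    }

private:
    Log& log_;
    std::function<void()> onDone_;
};

// Runs the "erase inside the callback" flow once and returns the event order.
Log runErasingCallbackFlow()
{
    Log log;
    std::map<int, std::unique_ptr<ResourceDeleter>> resourceDeleters;
    const int id = 1;

    auto& deleter = *(resourceDeleters[id] = std::make_unique<ResourceDeleter>(log));
    deleter.startDeletion([&resourceDeleters, id] { resourceDeleters.erase(id); });
    deleter.onStorageReply();
    return log;
}
```

Calling runErasingCallbackFlow() yields three entries, with "~ResourceDeleter" sitting between the "before callback" and "after callback" entries. The sketch survives only because it deliberately avoids members after the callback; any member access added later at that point would reproduce the crash.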

Disarming the trap

Different structure

One possible solution would be to change the ownership structure in the class design. We only need to keep this container clean of finished procedures because the container lives at the application level, so it won’t be destroyed for the lifetime of the application.

However, if we made the container a part of some procedure that we destroy after completion, we would get automatic, virtually free removal of the procedures in the container. That seems to be a logical way to approach things. But sometimes there are other factors in play that make such changes impossible or just impractical to introduce. Therefore we need to find another way.

Defer deletion to proper time

We’re going to focus on another type of solution. We will try to remove the ResourceDeleter from the proper context – ResourceDeleterContainer.

First we need to make sure that the ResourceDeleter is no longer on the callstack. In Fig. 4 we see that in order to do that we need to get back to the MsgDispatcher. ResourceDeleter is invoked by the callback that was registered in MsgDispatcher. We need to wait for the end of the handling and return to MsgDispatcher. After that, we no longer have ResourceDeleter on the callstack – we can safely delete the object.

The question is: how? We somehow have to invoke code from the ResourceDeleterContainer. We will do it the same way we did with ResourceDeleter. ResourceDeleterContainer will register for a special kind of message indicating that a particular ResourceDeleter has finished its work. Handling this message will remove the ResourceDeleter from the container.

We’ve got handling covered, but who should send the message? There are two options.

  1. ResourceDeleter could send it when the work is done. The disadvantage is that this class would then act a bit like an Observable: it has to know that on some occasions it must send notifications.
  2. ResourceDeleterContainer could prepare a callback similar to the previous one, but instead of removing the ResourceDeleter it would send the message to the MsgDispatcher.

I like the second option better. In a sequence diagram this solution looks like this:

Fig. 6 Flow with safe removal of ResourceDeleter

This way is a bit more complicated than the previous, dangerous one. Nevertheless, we avoid working on a deleted object.
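
The second option could be sketched roughly as follows. This is not the code from the project or from the repository. The class names follow the article, but every interface here is an assumption made for illustration: the DeleterFinished message, the subscribe/post/dispatchAll functions, onStorageReply and the runDeferredFlow helper. For brevity the storage reply is simulated by a direct call instead of arriving through the dispatcher.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <map>
#include <memory>
#include <utility>

// Hypothetical message: "the deleter with this id has finished".
struct DeleterFinished { int id; };

// Minimal stand-in for MsgDispatcher: a FIFO queue drained by dispatchAll().
class MsgDispatcher
{
public:
    using Handler = std::function<void(const DeleterFinished&)>;
    void subscribe(Handler handler) { handler_ = std::move(handler); }
    void post(DeleterFinished msg) { queue_.push_back(msg); }
    void dispatchAll()
    {
        while (!queue_.empty())
        {
            const DeleterFinished msg = queue_.front();
            queue_.pop_front();
            if (handler_) handler_(msg);
        }
    }
private:
    Handler handler_;
    std::deque<DeleterFinished> queue_;
};

class ResourceDeleter
{
public:
    void startDeletion(std::function<void()> onDone) { onDone_ = std::move(onDone); }
    void onStorageReply()  // simulates the reply from remote storage
    {
        onDone_();
        finished_ = true;  // safe: the object is still alive here
    }
    bool finished() const { return finished_; }
private:
    std::function<void()> onDone_;
    bool finished_ = false;
};

class ResourceDeleterContainer
{
public:
    explicit ResourceDeleterContainer(MsgDispatcher& dispatcher) : dispatcher_(dispatcher)
    {
        // The erase happens only when the dispatcher delivers the message.
        dispatcher_.subscribe([this](const DeleterFinished& msg) { resourceDeleters_.erase(msg.id); });
    }
    ResourceDeleter& startDeletion(int id, std::function<void()> userCallback)
    {
        auto& slot = resourceDeleters_[id];
        slot = std::make_unique<ResourceDeleter>();
        slot->startDeletion([this, id, userCallback = std::move(userCallback)]
        {
            userCallback();
            dispatcher_.post(DeleterFinished{id});  // defer the removal
        });
        return *slot;
    }
    std::size_t size() const { return resourceDeleters_.size(); }
private:
    MsgDispatcher& dispatcher_;
    std::map<int, std::unique_ptr<ResourceDeleter>> resourceDeleters_;
};

// Returns {container size right after the reply, size after the queue is drained}.
std::pair<std::size_t, std::size_t> runDeferredFlow()
{
    MsgDispatcher dispatcher;
    ResourceDeleterContainer container(dispatcher);
    bool userNotified = false;
    ResourceDeleter& deleter = container.startDeletion(1, [&userNotified] { userNotified = true; });
    deleter.onStorageReply();
    assert(userNotified && deleter.finished());
    const std::size_t before = container.size();
    dispatcher.dispatchAll();
    return {before, container.size()};
}
```

In this sketch the deleter is still stored in the container when onStorageReply() returns, and it disappears only once the dispatcher delivers the queued message, at which point no ResourceDeleter code is on the call stack.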

(Un)Safe grounds

This is part of the charm of working with callbacks: it is hard to track who executes a piece of code, where, and why. We should always be very careful when working on such code. Passing references or this pointers to lambdas is potentially dangerous. We can be surprised by a dangling reference or pointer when we least expect it.
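
One common way to at least make a dangling capture detectable is to capture a std::weak_ptr and check it before use. This is a different technique from the deferred deletion described in this post and it assumes the target is owned by a std::shared_ptr. The Counter type and the makeGuardedIncrement helper are made up for this sketch.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Hypothetical object that a callback wants to modify.
struct Counter { int value = 0; };

// Builds a callback that increments the counter only if it is still alive.
// The callback returns true when the increment happened.
std::function<bool()> makeGuardedIncrement(const std::shared_ptr<Counter>& counter)
{
    std::weak_ptr<Counter> weak = counter;  // does not extend the lifetime
    return [weak]
    {
        if (auto alive = weak.lock())  // empty if the Counter is gone
        {
            ++alive->value;
            return true;
        }
        return false;  // target already destroyed; nothing was touched
    };
}
```

A callback built this way keeps working while the shared_ptr owner exists and turns into a harmless no-op after the owner is reset, instead of writing through a dangling pointer.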

This is doable; most systems have to deal with it in one way or another. It is easier when we have a crystal-clear ownership hierarchy. Solid regression testing can also make things easier. We just cannot cut corners because, as the title says:

Shortcuts don’t pay off 🙂
