# Random thoughts on topological sort, backpropagation and stuff you learn in school

I learned the topological sort algorithm about 10 years ago in my undergrad, and for me it was pretty bizarre. I remember it was in one of the Introduction to Algorithms modules, where we were dealing with algorithms for graphs, like Binary search trees, RB trees and so on. Then suddenly, out of nowhere, we got to learn topological sort – an algorithm for sorting partially ordered sets.

Erhh.. what sets were that?

I mean I got RB tree because it is beautiful, and even though you probably won’t need to directly implement it at all in your whole career, learning the thing will improve your algorithmic mindset. But giving an order to a set of things which might not have a well-defined total order? Sorry but how useful it is gonna be?

Turns out the algorithm is neither fancy nor beautiful. Like most of other algorithms in textbooks, it is beautiful to some degree, but in a module where they taught you algorithms, it was difficult to pay any particular attention to topological sort alone.

Now in retrospect, I think topo sort is the single algorithm from those modules that I have been using most often. Okay, maybe I have been using quick sort more often, but I don’t need to re-implement it (except in interviews, which don’t count).

Imagine you want to run backpropagation in a neural network of multiple layers. It would be easy if your network is a simple chain of layers, but if you allow a layer to have multiple inputs (something like multi-modal nets), then how are you going to do it in a generic way?

Another example is when you have a pipeline of stuff, let it be a business workflow, a data flow, etc… where every operation might depend on other operations. The generic way to figure out an order to execute those operations is via topological sort.

If you have been using Tensorflow, then yes, they implement it for the graphs. In tensorflow, there are many different kinds of connection, an Op can be input to other Ops, or its execution simply needs to be happens before some other Ops (when some variables need to be initialized, for instance), but the general idea doesn’t change.

If there is any textbook algorithm that I am re-implementing again and again, I believe it is topological sort, and I couldn’t be more thankful to the professors who taught me this thing (fundamentally).

In fact, I realize it’s super difficult to fully anticipate the beauty and potential applications of stuff they taught you when you were in school. I am pretty sure most of them are useless (like I got to learn XML databases and object oriented databases somewhere in my education), but some will turn out to be unexpectedly valuable at some point in your career. This applies to many other things you learn outside of school as well. The trick is to know what to learn to maximize the profit in the long run.

Unfortunately I don’t know the answer to that. My strategy so far is to learn as much as I could, from as many people as I could. But we will see how far that will work (although I think it wouldn’t work for too long).

Before you ask, the next most popular textbook algorithm I have been using is state machines. When I was implementing my toy compiler around a decade ago, I didn’t imagine I would need to do it again and again for various kinds of parsers I had been doing.

And of course, Viterbi, I had some good time playing around with various aspects of it.