Implement the Depth-First Search Algorithm

I think there is a discrepancy in the “Understanding Graphs and Trees” theory I just want to confirm before opening an Issue.

https://www.freecodecamp.org/learn/python-v9/lecture-understanding-graphs-and-trees/how-do-depth-first-and-breadth-first-search-work

In the “Depth-First Search (DFS)” section:

Step 2:

We pop node A from the stack.

Then, we add its unvisited children, node B and node C, to the stack. We’ll add them in reverse order, C then B, so that B is on top (LIFO) and will be processed next. We also mark them as visited.

  • Stack: [C, B]
  • Visited: {A, B, C}

So in step 2, B and C are both added to the “Visited” set. However, they are at the same depth.

If we skip to the end, step 8:

  • Stack: []
  • Visited: {A, B, C, D, E, F, G}

When the stack is empty, the traversal is completed and all nodes have been visited.

The algorithm visited the nodes in this order:

A → B → D → E → C → F → G

If B and C were added to the visited set first, how do we know that D and E were visited first?
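For what it’s worth, the theory’s steps can be reconciled if the traversal order is recorded separately from the set: the set only marks nodes as discovered, while a list records the order in which nodes are actually popped. A minimal sketch (the `graph` literal below is my guess at the lesson’s example tree, not taken from the lesson):

```python
def dfs(graph, start):
    visited = {start}   # set: marks nodes as discovered (fast membership test)
    order = []          # list: records the actual traversal order
    stack = [start]
    while stack:
        node = stack.pop()
        order.append(node)                   # the "visit" happens at pop time
        for child in reversed(graph[node]):  # reverse so the first child is on top
            if child not in visited:
                visited.add(child)           # marked when discovered, as in the theory
                stack.append(child)
    return order

# Assumed shape of the lesson's example graph:
graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
         "D": [], "E": [], "F": [], "G": []}
print(dfs(graph, "A"))  # ['A', 'B', 'D', 'E', 'C', 'F', 'G']
```

Here the set’s insertion order is {A, B, C, …} as in the lesson, but the returned list is the A → B → D → E → C → F → G order.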

This is a problem if I’m implementing this in the Implement the Depth-First Search Algorithm lab

https://www.freecodecamp.org/learn/python-v9/lab-depth-first-search/lab-depth-first-search

If I add items to a visited list in the order described by this algorithm, the order will be {A, B, C, D, E, F, G}, not A → B → D → E → C → F → G.

I can fix this, but it just seems strange to add B and C to visited in Step 2 if C isn’t visited yet?

From the test in the lab:

  1. dfs([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], 1) should return [1, 2, 3, 0].

Following the above algorithm, Node 1 will be added to visited, then 0 and 2 will be added. However, [1, 2, 3, 0] indicates that 0 should be visited after 3.
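To make the mismatch concrete, here is a sketch (my own, not the lab solution) that records nodes in the order they are added to visited, i.e. at discovery time as the theory describes, rather than at pop time. For this test case it returns [1, 0, 2, 3], not the required [1, 2, 3, 0]:

```python
def dfs_discovery_order(matrix, start):
    visited = [start]   # nodes recorded when discovered, as the theory describes
    stack = [start]
    while stack:
        node = stack.pop()
        for nb in range(len(matrix[node])):
            if matrix[node][nb] and nb not in visited:
                visited.append(nb)   # marked as soon as it is pushed
                stack.append(nb)
    return visited

matrix = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(dfs_discovery_order(matrix, 1))  # [1, 0, 2, 3] - not the order the lab expects
```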

Am I correct that the algorithm in the theory is misleading? Or does it assume the order of visited is tracked somewhere else and just doesn’t mention it?

EDIT: (This is all on top of the fact that a set is not the correct thing to use for the implementation since it won’t preserve order and also I noticed it sorts the numbers)

In the DFS Section:

While the stack is not empty, the current node is popped (removed). This is when we “visit” or process it (for example, by printing its value). Then, all of its unvisited neighbors are marked as visited and added to the stack.

So, a node is “visited” when it’s popped from the stack, but then all of its unvisited neighbours are added to “visited”.

It seems strange that it’s the same process for BFS?

While the queue is not empty, the current node is removed from the queue (dequeued). Then, for each one of its neighbors, if the neighbor has not been visited, it is marked as visited and added to the queue.
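For BFS specifically, marking on discovery happens to be harmless for the order, because the queue is FIFO: nodes are dequeued in exactly the order they were enqueued, so the insertion order into visited matches the traversal order. A sketch (the graph is my own example, not from the lesson):

```python
from collections import deque

def bfs(graph, start):
    visited = [start]       # insertion order == traversal order, thanks to FIFO
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.append(neighbor)  # marked at discovery, still in order
                queue.append(neighbor)
    return visited

graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
         "D": [], "E": [], "F": [], "G": []}
print(bfs(graph, "A"))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
```

With a LIFO stack that equivalence breaks, which is exactly the discrepancy above.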

There’s also this section in BFS:

The order in which nodes at the same level are added to the queue is defined by the implementation of the data structure and the order in which the edges (connections) are stored in the graph representation.

If the implementation is consistent, the specific order in which the nodes at the same level are traversed will not affect the correctness of the algorithm. It will still visit every node level by level.

Which seems to indicate that the order that the nodes are added to the “visited” list is not necessarily the order in which they are visited?

Looking at this example

https://www.programiz.com/dsa/graph-dfs

Not all of node 0’s adjacent nodes are immediately added to visited.

In my implementation I’m not adding unvisited children to visited and it seems to work well and the output passes the tests.

[1, 0, 1, 0] 1
visited [1]
queue [0, 2]
visited [1, 2]  #if I added all children here it would be [1, 0, 2]
queue [0, 3]
visited [1, 2, 3]
queue [0]
visited [1, 2, 3, 0]
queue []
[1, 2, 3, 0]

So, I think the theory lesson is wrong.

I’ll wait for any comments here before opening an Issue

This is a nice write-up.

I don’t think the visited set is used to keep track of the order of traversal. In both BFS and DFS, checking whether a node was visited should be fast, while using a list for it would get slower and slower.

The exact time when a node is added to visited might depend on the implementation (I’m not sure the algorithm actually states when that should happen), and on when the check against the set happens. I.e., if the check whether a node was visited were done at the start of processing the node (after popping it from the stack), then adding nodes to visited as they are discovered (when they are added to the stack) would prevent the algorithm from going anywhere.

The similarities between DFS and BFS in the text are not that surprising. Essentially, the difference in implementation between the two comes down to a different processing order: one based on a stack (DFS) or a queue (BFS).
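That difference can be made literal in code: the same traversal function becomes DFS or BFS depending only on which end of the frontier you pop from. A sketch (the function name, flag, and graph are my own, and this variant marks nodes at discovery, so its DFS order runs right-to-left through each node’s children):

```python
def traverse(graph, start, depth_first=True):
    frontier = [start]
    visited = {start}
    order = []
    while frontier:
        # pop from the end for DFS (stack), from the front for BFS (queue)
        node = frontier.pop() if depth_first else frontier.pop(0)
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(neighbor)
    return order

graph = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"],
         "D": [], "E": [], "F": [], "G": []}
print(traverse(graph, "A", depth_first=False))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print(traverse(graph, "A", depth_first=True))   # ['A', 'C', 'G', 'F', 'B', 'E', 'D']
```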


I can see this being the case. We are just throwing it into visited preemptively because, once it’s discovered and in the queue, we know that it will be visited eventually.

However, is the final order in the output important? If it is, then the algorithm MUST keep track of the order, and it’s not clear how it does that in the theory example.

In the lab, the output order seems to be important?
2. dfs([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], 1) should return [1, 2, 3, 0].

If I put it into a set, this will not be the output, even if the traversal order is correct. The required output is also a list.

So it needs to be tracked separately, which eliminates the performance gain of the set. I could store visited as a set, but then I’d also need a list or dict to keep track of the order.
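One way to get both, sketched against the lab’s test case: keep a set for the O(1) membership checks and a separate list for the output order, and only record a node when it is popped (this is my own sketch, not the official lab solution):

```python
def dfs(matrix, start):
    seen = set()    # fast membership checks
    order = []      # the lab's required output: traversal order as a list
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue            # a node may have been pushed more than once
        seen.add(node)          # marked only when popped, i.e. actually visited
        order.append(node)
        for nb in range(len(matrix[node])):
            if matrix[node][nb] and nb not in seen:
                stack.append(nb)
    return order

matrix = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(dfs(matrix, 1))  # [1, 2, 3, 0]
```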

If order isn’t important then the test should just look for a set of the required nodes?

Or is this lab just a specific requirement so we can test traversal order?

In the pseudocode examples here, each node seems to be marked as discovered one at a time, as it is processed:
https://en.wikipedia.org/wiki/Depth-first_search

Here as well https://www.programiz.com/dsa/graph-dfs

In my own lab implementation I used a list (as that is the required output) and did not add nodes to visited until they were popped from the stack.

My implementation

def dfs(matrix: list, node: int) -> list:
    print(matrix[node], node)
    queue = [node]      # used as a stack: nodes are popped from the end (LIFO)
    visited = [node]    # a list, so it preserves traversal order

    while queue:
        pop = queue.pop()

        # mark the node as visited only when it is popped
        if pop not in visited:
            visited.append(pop)

        # push unvisited neighbors that aren't already waiting in the stack
        for i in range(len(matrix[pop])):
            if matrix[pop][i] and i not in queue and i not in visited:
                queue.append(i)

        print("visited", visited)
        print("queue", queue)

    return visited


I guess in the end, I was trying to implement the lab by using the theory as a guide, and it did not seem harmonious to me. I started off using a set, as in the theory, but then discovered this wasn’t going to work for the required output.

I don’t know what the specific learning goals for this lab were, so I can’t comment on that. Polishing some of the wording to make it clearer would always be good.

For the usage of both algorithms, depending on the specific case, keeping track of the actual order (i.e. going to B first instead of C) might not matter at all, as long as the algorithm itself visits nodes in the desired depth-first or breadth-first order.


Yes, this makes sense to me, thanks. I think it still puts the theory and the lab at odds with each other if you try to implement the lab directly from the theory notes.

Maybe a note like this in the lab:

“Use a list for visited to preserve traversal order. Only add a node to visited when it is removed from the stack.”

I think this accounts for the differences in the theory and the requirement to preserve traversal order.

Thanks for your help talking this through!

EDIT: I notice here as well that the theory section says we might indicate the traversal order by printing the node as it’s popped from the stack:

While the stack is not empty, the current node is popped (removed). This is when we “visit” or process it (for example, by printing its value).

This also seems to indicate that the visited set isn’t meant to track traversal order.