ACCorner: PWPII

Programming Without Problems II: When Some Problems Arise

$A \lor B \neq B \lor A$

In the first entry of Programming Without Problems I said possible bug spots are induced when the intended use does not match the possible uses. Moreover I hinted a source of problems with delems_v2 if the predicates were rearranged. It turns out that this property should be a warning flag because the clauses that make up a predicate are combined using the logical or. Hence, if the clauses can not be interchanged, it means we are not respecting or's commutativity. Let's see the first example:


         delems_dwalk(A,Acc,E):-
         	dwalk(A,[A],B),
         	\+ member(B,Acc),
         	delems_dwalk(A,[B|Acc],E).
         delems_dwalk(_,Acc,Acc).

Let's assume we have the d/2 database composed of the two facts: d(a,b), d(b,c). If we execute delems_dwalk(a,[],E). Prolog will answer: E = [c,b] ?. The ? at the end of the answer means there might be more answers that can be found with backtrack. To ask for the next possible answer we use ;. If we ask for the next answer Prolog will output E = [b] ?. And this is clearly wrong, how did this happen? To be complete, these are all the answers Prolog can find: E = [c,b], E = [b], E = [b,c], E = [c], E = [].

Let's break out how the program is executed to develop a better understanding. The first line asks Prolog to unify the variable B in the goal dwalk(a,[a],B). Prolog searches in the database, presumably from top to bottom and finds the first fact d(a,b). Hence it unifies B with b. However, there are more possibilities to explore in dwalk(a,[a],B), Prolog just remembers that position and continues to execute delems_dwalk/3. The next line checks if b is not in [] which is true and moves on to execute the recursive call: delems_dwalk(a,[b],E). At this new call the variable B of the call dwalk(a,[a],B) gets again unified with b, however, this time the goal \+member(b,[b]) will fail, triggering a backtrack of the call dwalk(a,[a],B). This time Prolog will try to unify d(a,B) with another fact, but such fact does not exists. Hence, Prolog explores the second part of dwalk/3 and unifies B in dwalk(a,[a],B) with c. The execution of delems_dwalk(a,[b],E) resumes, and since c is not in [b] it moves on to the recursive call delems_dwalk(a,[c,b],E). At delems_dwalk(a,[c,b],E), for all possible instances of B in the call dwalk(a,[a],B), the goal \+member(B,[c,b]) fails resulting on the failure of delems_dwalk(a,[c,b],E). Prolog explores the second clause of delems_dwalk/3. And unifies _ with a and Acc with [c,b] resulting in the unification if E with [c,b].

So far things look good and just as we laid them out in "Programming Without Problems" but, what about if we ask Prolog to give us another answer? Well, it will backtrack to the previous delems_dwalk/3 call that has unexplored goals, that is the call delems_dwalk(a,[b],E). Since when the call delems_dwalk(a,[b],E) was made the first clause was successful, now Prolog will explore the second clause and will unify _ with a and Acc with [b], resulting in the unification of E with [b]. If we ask Prolog to give us more answers, then it will backtrack to the call delems_dwalk(a,[],E) but at this point dwalk/3 has unexplored paths, so, instead of unifying B with b, the resulting unification is B with c. Prolog then will follow the execution of delems_dwalk(a,[c],E) and finally give the answer E = [b,c]. Again, if we ask for more answers, Prolog will backtrack and remove b from [b,c] and give the answer E = [c]. Finally, if we ask Prolog one more time, it will backtrack to the root call delems_dwalk(a,[],E) and unify E with [].

$A \lor B = B \lor A$

We can argue that these problems only arise if we ask Prolog for more answers, and if we don't then we should be ok. This is true, however, we would not be able to build anything on top of delems_dwalk/3 because backtrack could be necessary, much like dwalk/3 in delems_dwalk/3. But even if we don't build anything on top of delems_dwalk/3 we can not use the predicate to check if an answer is correct. For example if we execute delems_dwalk(a,[b,c],[b,c]) Prolog will answer yes which is what we expected but if we also execute delems_dwalk(a,[this,is,wrong],[this,is,wrong]) Prolog will answer yes. All because of that second clause. So, it is a good idea to fix this, and recover the or's commutativity.

Can we use the specification to help us? We better use it, what's the benefit of having a specification if we can not rely on it when things get tough? So, let's bring delems_dwalk/3 specification: delems_dwalk(A,V,E) is true if $\forall e \in 𝐃 (d), dwalk(A,[A],e)$ iff $e \in E$ Let's rewrite the specification: If $\forall e \in 𝐃 (d), dwalk(A,[A],e)$ iff $e \in E$ then delems_dwalk(A,V,E) is true. Now we can clearly see the specification is too weak. It only conditions the success of the program when the answer is given: dwalk(A,[A],e) iff $e \in E$ but not the inverse, that is: every time the program succeeds it is because the answer is correct. Also the V variable, that we eliminated in subsequent specifications, doesn't play a role and surely it should; when we eliminated it we replaced it by the call delems_dwalk(A,[],E), that is, we defined delems_dwalk(A,E):- delems_dwalk(A,[],E). But as we have seen, this does not ensure the correctness of delems_dwalk then we should work with the specification of delems_dwalk/3.

We made a mistake, our specification of delems_dwalk was too weak, we can make it stricter and then use that to correct our code. Most of the required elements are already present in the specification, that is, the characterization of E, we just need to make sure that E always gets filled with correct things. We can rewrite the specification as: delems_dwalk(A,V,E) is true iff $V, E arelists, V is a prefix of E \land (\forall e \in 𝐃 (d), A ~ e iff e \in E)$ , where $A ~ e$ is a dwalk from A to e. With this new specification now we can argue that out implementation of delems_dwalk/3 is wrong. How do we fix it?

By construction, the first clause of delems_dwalk/3 is correct: the aim is to find and accumulate the elements that can be reached from A. At this point we should remember that Acc's role in the current version of delems_dwalk/3 is to be an exclusion list: there are no constraints on its initial value. We need to change this role from exclusion list to a prefix of E. However, we can not do this if we don't know E before hand, which is the list we want to calculate in the first place. This might seem like an egg-chicken problem. To break this circularity, we just need to make sure that at the time we unify Acc with E, Acc is exactly E. That is, we need to implement the specification and we can do that by checking two properties: first that for every element B in Acc dwalk(A,[A],B) is true, second, that we can not find a B such that dwalk(A,[A],B) and that is not in Acc. In other words:


         delems_dwalk(A,E,E):-
               forall(member(B,E),dwalk(A,[A],B)),
               \+((dwalk(A,[A],B),
                  \+member(B,E))).

This clause combined with the first clause yields an implementation that fulfills our stronger specification. The forall predicate is a SICStus Prolog predicate that succeeds when the second argument can be proved for all instances of the first argument.

From I think to _∎

It is important to point out the process we followed to fix delems_dwalk/3. Once we found issues with the predicate, we consulted the specification and work from it. That is, we followed a top-down approach, once we realized the specification was too weak, we strengthen it and proceeded to adjust the program. The usual way is to work down-top, in other words to 'fix' the code and most of the time we stop there. We could have 'fixed' the code using a Prolog predicate called 'cut', !, as follows:


         delems_dwalk(A,Acc,E):-
         	dwalk(A,[A],B),
         	\+ member(B,Acc),!,
         	delems_dwalk(A,[B|Acc],E).
         delems_dwalk(_,Acc,Acc).

The cut prunes the backtrack search committing to a search path, in our case once we know B is not in Acc we can forget about the other possibilities. This approach would fix our issue of backtracking once the first answer is given with a call of the form delems_dwalk(A,[],E). It not solve instances of the form delems_dwalk(a,[this,is,wrong],[this,is,wrong]) If we don't happen to test for those kind of instances, we would end up with an incorrect implementation, and more over with an implementation out of sync with the specification, rendering our approach useless.

Now we must try to prove that our new version of delems_dwalk/3 fulfills our stronger specification. Let's show the first direction: if delems_dwalk(A,V,E) is true then $V, E arelists, V is a prefix of E \land (\forall e \in 𝐃 (d), A ~ e iff e \in E)$ . First we can notice that for delems_dwalk/3 to succeed, the first clause must fail and the second succeed. If the first clause doesn't fail then delems_dwalk/3 is recursively called an infinite number of times. Since the d/2 facts are finite this is impossible. Hence at some point the first clause of a recursive call delems_dwalk(A,Acc,E) will fail. This means the goal \+member(B,Acc) was always false for all possible unification of B in dwalk(A,[A],B); implying any possible unification is in Acc. Since delems_dwalk(A,Acc,E) succeeds, then the second clause must succeed. If E is a variable, then it gets unified with Acc, since Acc is at least V then V is a prefix of E. If E is not a variable, then it must be equal to Acc. Again Acc is at least V, hence, V is a prefix of E. The first part of the second clause, checks if all elements in Acc are reachable from A. Hence, Acc is a subset of all reachable elements from A. The second part checks if all reachable elements from A are in Acc. Since both parts succeed, then in Acc are all reachable elements from A_∎.

Now, let's show the reverse direction: $V, E arelists, V is a prefix of E \land (\forall e \in 𝐃 (d), A ~ e iff e \in E)$ then delems_dwalk(A,V,E) is true .
If V = E, then the first clause of delems_dwalk(A,V,E) will fail because of how E is constructed. The second clause will succeed because E contains all reachable elements from A. If V is a proper prefix of E, then we proceed by induction on the length of the difference $E - V$ . If the difference is one element, then there exists a b in the domain of d such that b is reachable from A but it is not in V. Such element will be found by the first clause of delems_dwalk/3 in the first call. The successive recursive call to the first clause will fail because Acc will be already equal to E, so, all reachable elements from A are already in Acc. The second clause of delems_dwalk/3 will succeed because $Acc = [b|V] = E$ . Now we assume that if the length of the difference $E - V$ is $n$ then the proposition holds and we prove it also holds when the difference is of length $n + 1$ . Let c be the head of V and b the next element in E. To apply the induction hypothesis, we must show delems_dwalk/3 will eventually find b and make the recursive call delems_dwalk(A,[b|V],E). For each x, $x \neq b$ the recursive call delems_dwalk(A,[x|V],E) will fail. Once the first clause of delems_dwalk/3 fails, all elements reachable by A will be in Acc, however the second clause will fail because $Acc \neq E$ ; for two lists to be equal they must be equal element-wise. Hence, each recursive call delems_dwalk(A,[x|V],E), with $x \neq b$ will trigger a backtrack. The element b will be eventually found because dwalk(A,[A],b) is true, since b is in E. Hence, eventually, the recursive call delems_dwalk(A,[b|V],E) will be made. The list [b|V] is a prefix of the list E and the difference between both is of length $n$ , therefore by the induction hypothesis delems_dwalk(A,[b|V],E) is true and so delems_dwalk(A,V,E)_∎.

Review

With these proofs now we can rewrite delems_dwalk/3 specification and of course it calls for a review of all our specifications. Let's start with dleft_only/1. The current specification is: $\forall d:sbrelation,$ dleft_only(A) is true if $A \in 𝐃 (d) \land \exists B \in 𝐃 (d), d(A,B) \land ∄ C, d(C,A)$ . Again the first change is to make sure that every time dleft_only/1 is true, is because it complies with the specification, so, we need to change it to: $\forall d:sbrelation$ dleft_only(A) is true iff $A \in 𝐃 (d) \land \exists B \in 𝐃 (d), d(A,B) \land ∄ C, d(C,A)$ . Next we need to prove both directions. I think this proof is more or less direct. Likewise we need to change the specification for dright_only/1 to: $\forall d:sbrelation,$ dright_only(A) is true iff $A \in 𝐃 (d) \land \exists B \in 𝐃 (d), d(B,A) \land ∄ C, d(A,C)$ .

Using a similar approach to the one used for delems_dwalk/3 we can review dwalk/3 specification. In Programming Without Problems, the last specification felt a bit, inexpressive, this is the specification with the needed change: $\forall d:sbrelation,$ dwalk(A,V,C) is true iff $A, C \in 𝐃 (d) \land (d(A,C) \lor \exists B \in 𝐃 (d), d(A,B) \land B \notin V \land dwalk(B,[B|V],C))$ . The specification is basically the code, so it feels 'low-level' in particular because the role of V is not stated. We should try one more time. The list V is an exclusion list, it prevents some elements from being considered. But it also stores the elements that had been explored so far. We should express those two behaviors in the specification $\forall d:sbrelation,$ dwalk(A,V,C) is true iff $d(A,C) \lor (\exists B \in 𝐃 (d), B \neq A, d(B,C), \exists V1 = [B|_], V1 <> V, A ~$ _V1 C) . Where $V1 <> V$ means V1 and V have no elements in common and $A ~$ _V1C is a dwalk from A to C using V1 elements.

We first show If dwalk(A,V,C) is true then $d(A,C) \lor (\exists B \in 𝐃 (d), B \neq A, d(B,C), \exists V1 = [B|_], V1 <> V, A ~$ _V1 C) .
If d(A,C) then we are done. If d(A,C) is not a fact, then let dwalk(B,V2,C) be the last recursive call, it exists because dwalk(A,V,C) is true. The list V is a prefix of V2 because dwalk/3 adds elements always at the head of the list. Let $V1 = V2 - V$ , that is, we remove V from V2. Since dwalk/3 only adds new elements to V if they are not present, then V1 and V are disjoint. We now proceed by induction on V1's length, ( $| V1 |$ ).
If $| V1 | = 1$ , $V1 = [B]$ , then d(A,B), otherwise $| V1 | \neq 1$ , hence d(A,B),d(B,C) is an $A ~$ _V1C dwalk.
We assume now that we can construct $A ~$ _V1C if $| V1 | = n$ .
We show how to construct $A ~$ _V1C when $| V1 | = n + 1$ .
$V1 = [B|V3]$ , the list V3 is of length $n$ . Hence, by the induction hypothesis, the call dwalk(A,V,B) succeeds and we can construct $W = A ~$ _V3B . The dwalk $W d(B,C)$ is $A ~$ _V1C .

Now we show the inverse: $d(A,C) \lor (\exists B \in 𝐃 (d), B \neq A, d(B,C), \exists V1 = [B|_], V1 <> V, A ~$ _V1 C) then dwalk(A,V,C) is true .
If d(A,C) then dwalk(A,V,C) is true. If d(A,C) is not a fact, we proceed by induction on the length of V1.
If $| V1 | = 1$ , $V1 = [B]$ then d(A,B) and dwalk(A,V,C) succeeds with the dwalk: d(A,B), d(B,C).
If $| V1 | = n$ we assume dwalk(A,V,C) succeeds.
If $| V1 | = n + 1$ then $V1 = [B|V2]$ . By the induction hypothesis dwalk(A,V,B) succeeds, then dwalk(A,V,C) succeeds because d(B,C)_∎.

Next in the list is ddomain/2, we can specify it as: $\forall d:sbrelation,$ ddomain(V,E) is true iff $\exists V1 list, E = V1 + V, \forall a \in E, a \in 𝐃 (d) \land \forall b \in 𝐃 (d), b \in E$ . With this specification, again, we see the code we have for ddomain/2 is wrong. We need to correct the second clause:


         ddomain(E,E):-
                forall(member(B,E),(d(B,_);d(_,B))),
                forall((d(B,_);d(_,B)),member(B,E)).

We first show: if $\forall d:sbrelation,$ ddomain(V,E) is true then $\exists V1 list, E = V1 + V, \forall a \in E, a \in 𝐃 (d) \land \forall b \in 𝐃 (d), b \in E$ .
First we notice that for ddomain/2 to succeed the second clause must succeed because the first clause either fails or calls ddomain/2 recursively. For the second clause to succeed, V must be a prefix of E otherwise unification is impossible, all members of E must be in the domain of d/2 and finally, all members of the domain of d/2 must be in E.

We now show the converse: $\exists V1 list, E = V1 + V, \forall a \in E, a \in 𝐃 (d) \land \forall b \in 𝐃 (d), b \in E$ then ddomain(V,E) is true .
We proceed by induction on the length of V1 plus the case where V1 is the empty list. If V1 is empty, then the first clause of ddomain/2 will fail and the second will succeed. If $V1 = [B]$ , then B appears on the left of a d/2 fact or only in the right of a d/2 fact. In either case, the first clause of ddomain/2 will eventually find it. Since B is not in Acc, it will be appended and all subsequent executions of the first clause will fail. The second clause will succeed.
We assume if V1 is of length $n$ then ddomain/2 succeeds.
If V1 has length $n + 1$ we have two options, either E is a variable or E is instantiated to a list. If E is a variable, then no matter the first domain element A gets unified to in the first clause of ddomain/2. By the induction hypothesis the call ddomain([A|Acc],E) will succeed. On the other hand, if E is instantiated to a list, then we must make sure the first clause of ddomain/2 select the last element in V1. Let a be such an element. Since a is not in V, then we know eventually the first clause of ddomain/2 will unify A with a. If ddomain/2 does not unify A with a in the first call, then when the execution eventually reaches the second clause, it will fail, triggering a backtrack. This backtrack chain will eventually reach the root call and trigger the unification of A with a. At this point we can apply the induction hypothesis_∎.

Now that we have specified dwalk/3, delems_dwalk/3 and ddomain/2 we can define the corresponding dwalk/2, delems_dwalk/2 and ddomain/1 with respect to them:


         dwalk(A,C) :- dwalk(A,[A],C).
         delems_dwalk(A,E) :- delems_dwalk(A,[],E).
         ddomain(D) :- ddomain([],D).

In the following table we can summarize the changes. Including the necessary substitutions for dwalk/2, delems_dwalk/2 and ddomain/1.

Predicate	Description
dleft_only/1	$\forall d:sbrelation$ dleft_only(A) iff $A \in 𝐃 (d) \land \exists B \in 𝐃 (d), d(A,B) \land ∄ C, d(C,A)$
dright_only/1	$\forall d:sbrelation,$ dright_only(A) iff $A \in 𝐃 (d) \land \exists B \in 𝐃 (d), d(B,A) \land ∄ C, d(A,C)$
dwalk/3	$\forall d:sbrelation,$ dwalk(A,V,C) iff $A, C \in 𝐃 (d), V is a list, d(A,C) \lor (\exists B \in 𝐃 (d), B \neq A, d(B,C), \exists V1 = [B\|_], V1 <> V, A ~$ _V1 C)
dwalk/2	$\forall d:sbrelation,$ dwalk(A,C) iff dwalk(A,[A],C)
delems_dwalk/3	$\forall d:sbrelation,$ delems_dwalk(A,V,E) iff $A \in 𝐃 (d), V, E are lists, V is a prefix of E \land (\forall e \in 𝐃 (d), A ~ e iff e \in E)$
delems_dwalk/2	$\forall d:sbrelation,$ delems_dwalk(A,E) iff delems_dwalk(A,[],E)
ddomain/2	$\forall d:sbrelation,$ ddomain(V,E) iff $V, E are lists, \exists V1 list, V <> V1, E = V1 + V \land a \in E iff a \in 𝐃 (d)$
ddomain/1	$\forall d:sbrelation,$ ddomain(E) iff ddomain([],E)

We must also revise our degree/3 definition as it only allows a correct top-down execution, we can rewrite the second clause as:


         degree(V,Acc,D):-
                forall(member(U,Acc), edge(V,U)),
                forall(edge(V,U),member(U,Acc)),
                length(Acc,D).

That is, we must make sure Acc is actually the neighbors set of V.
I hope this whole exercise shows that it is a good idea to prove or at least have a list of TODO proofs to make sure our programs will behave as we intend them. Of course it is valid that the intended use of our programs to be top-down. For example, we can adopt the cut version of delems_dwalk/3 and document that E is intended to be an output variable, customary denoted as +E. However, I think it is better to have the most complete solution possible and then trim that solution to fit a particular scenario.

Design

It seems to be clear that much of the code would be different if the d/2 facts were inside a list passed as an argument to the predicates instead of relying on the Prolog database engine. For example, we would know we are finished when said list would be empty and not rely on backtrack to fetch the next entry. This design could feel stranger in the rewrite setting because graphs are usually given as a matrix or a pair of sets and passed as arguments. The approach taken here feels more like a finite stream: you know there is an end, but you can't get a glimpse at the full structure. I sided for this approach precisely for this: I think it is fun and allows to explore backtracking and unification in a simple way.

I also designed this to show that proofs and specifications do help us in the design and implementation but they also have their challenges that range from their creation to the actual proof. For example, take dwalk/2 definition it might seem counter intuitive that we need to pass the [A] list instead of the [] list. We need to read the specification carefully to see that if we pass the [] list it would allow dwalks that contain a loop at A. This is not necessarily wrong but the main reason why we designed dwalk/3 in the first place was to be able to avoid loops. In the next part we will develop more predicates and talk about the lists on the specification side.

Programming Without Problems II: When Some Problems Arise

A ∨ B ≠B ∨ A

A ∨ B = B ∨ A

From I think to ∎

Review

Design

$A \lor B \neq B \lor A$

$A \lor B = B \lor A$

From I think to _∎