This article is different than the others because it is more "real time". I usually finish all experiments and polish the examples before writing the first line. In this case I was a bit blocked at the middle of writing my solution so I started to write, and during the writing I had to travel, so this also changed my process, and the end result.
Something spiked my curiosity on the capabilities of AI to generate code.
I'm not talking about the examples you often see online: "create a console
application to display ASCII colors" or "create a web service that ..."
Those have been done over and over again and of course if you ask an AI to
do exactly that, it stands to reason it will do a job close to what it was
trained on.
No, I'm talking about having an AI solve a simple but not entirely trivial
(although it might seem so) problem for which there might be not many examples
on a language with probably not so many examples either.
As I wrote on the 25th of May, I chose the
24 arithmetical
puzzle
and the Ada programming language for a strange reason.
Rosetta Code contains a lot
of problems and programs to solve them written in a plethora of programming
languages.
So, I acted under the hypothesis: if a problem is not solved in
a particular language maybe it would make a good candidate for a problem
to give to an AI. I chose the language Ada and 24 was the first problem
on the list of unsolved problems for Ada :).
To get a sense of Ada's influence, on the GitHub, there are 7k Ada repositories,
and 1k users. While there are 37.1M JavaScript repositories and 6M users.
I decided to use Gemini, for no particular reason, perhaps I can repeat this
same experiment with another AI, like Claude perhaps? but I got somewhat
surprising results that I wanted to put on writing, so here we go, 24 on Ada.
24 is a simple game, you take a deck of cards and remove all cards that do
not have a number, except the aces which will be given the value of 1
and shuffle them. You take 4 cards and the goal is to express the number
24 adding, multiplying, subtracting or dividing the numbers on the cards
using each number exactly once.
A simple example: assume you are dealt "10, 10, 8, 4" so you can do
24 = 10 + 10 + 8 - 4
So, that's it, I can definitely see myself handing out this problem to my
Introductory Programming students. So, I just asked Gemini to give me an
Ada code to solve this problem.
So, first question to Gemini: "Hello, do you know the arithmetic puzzle 24?"
because it doesn't hurt to be polite but tech giants may argue I burnt a
small forest by doing so :).
Gemini replied:
Yes, I do! The "24 game" is a classic
arithmetic puzzle where the goal is to use four given numbers, along with
addition, subtraction, multiplication, and division (and sometimes
parentheses), to reach the number 24. Each number must be used exactly once.
For example, if you're given the numbers 4, 6, 8, 8, one solution could be:
(8 / 4) * (6 + 8) = 2 * 14 = 28 - This is not a solution
Let's try a correct example:
Given the numbers: 3, 3, 8, 8
One solution is:
8 / (3 - (8 / 3)) ... wait, this is getting complicated with fractions.
Let's stick to integer results for intermediate steps if possible, as is
typical in the basic version of the game.
A simpler solution for 3, 3, 8, 8:
(8 / 8) * (3 * 3) = 1 * 9 = 9 (Still not 24)"
And it went on trying to solve "3, 3, 8, 8" that at Rosetta Code is regarded
as a "hard example" (more on this later)
Once the ground has been set, I requested Gemini to generate a program in
Ada to solve the 24 puzzle but without using floating point numbers.
So, I burnt another small forest in the process: "Can you please write a
program in Ada that solves the 24 game, using either fractions or
integers only?"
Gemini replied:
The 24 game is a fun challenge! Writing an Ada program to solve it requires
a bit of recursion or iteration through permutations and operations.
Since you mentioned "either fractions or integers only," I'll provide a
solution that prioritizes exact fraction arithmetic. This is generally
more robust for the 24 game, as many solutions involve intermediate
fractional results that eventually combine to an integer. If we stick to
integers only, we'd miss a lot of potential solutions
(e.g., 10,10,4,1→4/(10−1)∗10 is not 24, but 4×(10−10/4)=4×(10−2.5)=4×7.5=30).
For Ada, we'll need a package for rational (fraction) numbers. Since Ada
doesn't have a builtin one, I'll provide a basic Rational_Numbers package.
I guess the second paragraph was... I don't know, I can't take these
explanations seriously, but I'm not here to read natural language
explanations but source code.
Gemini started by generating code for rational numbers manipulation:
-- rational_numbers.ads -- A package for basic rational number (fraction) arithmetic. package Rational_Numbers is type Rational is record Numerator : Integer; Denominator : Integer; end record; function Simplify (R : Rational) return Rational; -- Reduces a rational number to its lowest terms. function Create (Num : Integer; Den : Integer := 1) return Rational; -- Creates a rational number. Denominator defaults to 1. function Is_Equal (Left, Right : Rational) return Boolean; -- Checks if two rational numbers are equal. function "+" (Left, Right : Rational) return Rational; function "-" (Left, Right : Rational) return Rational; function "*" (Left, Right : Rational) return Rational; function "/" (Left, Right : Rational) return Rational; function To_String (R : Rational) return String; -- Converts a rational number to its string representation. function Is_Zero (R : Rational) return Boolean; -- Checks if a rational number is zero. end Rational_Numbers; -- rational_numbers.adb -- Implementation of the Rational_Numbers package. with Ada.Text_IO; with Ada.Strings.Fixed; with Ada.Strings.Unbounded; with Ada.Strings.Unbounded.Text_IO; package body Rational_Numbers is -- Greatest Common Divisor (GCD) using Euclidean algorithm function GCD (A, B : Integer) return Integer is X : Integer := Abs A; Y : Integer := Abs B; Temp : Integer; begin if X = 0 then return Y; elsif Y = 0 then return X; end if; while Y /= 0 loop Temp := Y; Y := X mod Y; X := Temp; end loop; return X; end GCD; function Simplify (R : Rational) return Rational is Common_Divisor : Integer; Num_Sign : Integer := 1; Den_Sign : Integer := 1; begin if R.Numerator = 0 then return (0, 1); end if; -- Handle signs if R.Numerator < 0 then Num_Sign := -1; end if; if R.Denominator < 0 then Den_Sign := -1; end if; Common_Divisor := GCD (R.Numerator, R.Denominator); return ( (Abs R.Numerator / Common_Divisor) * Num_Sign * Den_Sign, Abs R.Denominator / Common_Divisor ); end Simplify; function Create (Num : Integer; Den : Integer := 1) return Rational is begin if Den = 0 then raise Program_Error with "Denominator cannot be zero."; end if; return Simplify ((Num, Den)); end Create; function Is_Equal (Left, Right : Rational) return Boolean is begin return Left.Numerator * Right.Denominator = Right.Numerator * Left.Denominator; end Is_Equal; function "+" (Left, Right : Rational) return Rational is begin return Simplify ( (Left.Numerator * Right.Denominator + Right.Numerator * Left.Denominator, Left.Denominator * Right.Denominator) ); end "+"; function "-" (Left, Right : Rational) return Rational is begin return Simplify ( (Left.Numerator * Right.Denominator - Right.Numerator * Left.Denominator, Left.Denominator * Right.Denominator) ); end "-"; function "*" (Left, Right : Rational) return Rational is begin return Simplify ( (Left.Numerator * Right.Numerator, Left.Denominator * Right.Denominator) ); end "*"; function "/" (Left, Right : Rational) return Rational is begin if Right.Numerator = 0 then raise Program_Error with "Division by zero is not allowed."; end if; return Simplify ( (Left.Numerator * Right.Denominator, Left.Denominator * Right.Numerator) ); end "/"; function To_String (R : Rational) return String is use Ada.Strings.Unbounded; Result : Unbounded_String; begin if R.Denominator = 1 then Append (Result, Integer'Image (R.Numerator)); else Append (Result, Integer'Image (R.Numerator)); Append (Result, "/"); Append (Result, Integer'Image (R.Denominator)); end if; return To_String (Result); end To_String; function Is_Zero (R : Rational) return Boolean is begin return R.Numerator = 0; end Is_Zero; end Rational_Numbers;
I'm not an Ada programmer but this seems alright. It calls Simplify too much for my taste. Now, let's see the code Gemini produced for solving the 24 game:
-- solve_24_game.adb -- Main program to solve the 24 game. with Ada.Text_IO; use Ada.Text_IO; with Ada.Integer_Text_IO; use Ada.Integer_Text_IO; with Ada.Strings.Unbounded; use Ada.Strings.Unbounded; with Rational_Numbers; with Ada.Exceptions; procedure Solve_24_Game is -- Typically 1-13 for card decks, though input allows any integer. subtype Card_Value is Integer range 1 .. 13; type Card_List is array (Positive range <>) of Rational_Numbers.Rational; type Expression_List is array (Positive range <>) of Unbounded_String; Target_Value : constant Rational_Numbers.Rational := Rational_Numbers.Create (24); Solutions_Found : Integer := 0; -- Recursive procedure to find solutions procedure Find_Solutions ( Current_Numbers : Card_List; Current_Exprs : Expression_List ) is begin if Current_Numbers'Length = 1 then -- Base case: Only one number left if Rational_Numbers.Is_Equal (Current_Numbers(Current_Numbers'First), Target_Value) then Solutions_Found := Solutions_Found + 1; Put_Line ("Solution " & Integer'Image (Solutions_Found) & ": " & To_String (Current_Exprs(Current_Exprs'First))); end if; return; end if; -- Recursive step: Try all pairs and operations for I in Current_Numbers'First .. Current_Numbers'Last loop for J in Current_Numbers'First .. Current_Numbers'Last loop if I /= J then declare -- Create new lists for numbers and expressions, -- excluding I and J New_Numbers : Card_List (1 .. Current_Numbers'Length - 1); New_Exprs : Expression_List (1 .. Current_Numbers'Length - 1); K : Integer := 1; begin for L in Current_Numbers'First .. Current_Numbers'Last loop if L /= I and L /= J then New_Numbers(K) := Current_Numbers(L); New_Exprs(K) := Current_Exprs(L); K := K + 1; end if; end loop; -- Try all four operations declare Op_Result : Rational_Numbers.Rational; Op_Expr_Str : Unbounded_String; Left_Expr : constant String := To_String(Current_Exprs(I)); Right_Expr : constant String := To_String(Current_Exprs(J)); begin -- Addition Op_Result := Current_Numbers(I) + Current_Numbers(J); Op_Expr_Str := To_Unbounded_String ("(" & Left_Expr & " + " & Right_Expr & ")"); Find_Solutions (New_Numbers & (1 => Op_Result), New_Exprs & (1 => Op_Expr_Str)); -- Subtraction (I - J) Op_Result := Current_Numbers(I) - Current_Numbers(J); Op_Expr_Str := To_Unbounded_String ("(" & Left_Expr & " - " & Right_Expr & ")"); Find_Solutions (New_Numbers & (1 => Op_Result), New_Exprs & (1 => Op_Expr_Str)); -- Subtraction (J - I) Op_Result := Current_Numbers(J) - Current_Numbers(I); Op_Expr_Str := To_Unbounded_String ("(" & Right_Expr & " - " & Left_Expr & ")"); Find_Solutions (New_Numbers & (1 => Op_Result), New_Exprs & (1 => Op_Expr_Str)); -- Multiplication Op_Result := Current_Numbers(I) * Current_Numbers(J); Op_Expr_Str := To_Unbounded_String ("(" & Left_Expr & " * " & Right_Expr & ")"); Find_Solutions (New_Numbers & (1 => Op_Result), New_Exprs & (1 => Op_Expr_Str)); -- Division (I / J) if not Rational_Numbers.Is_Zero (Current_Numbers(J)) then Op_Result := Current_Numbers(I) / Current_Numbers(J); Op_Expr_Str := To_Unbounded_String ("(" & Left_Expr & " / " & Right_Expr & ")"); Find_Solutions (New_Numbers & (1 => Op_Result), New_Exprs & (1 => Op_Expr_Str)); end if; -- Division (J / I) if not Rational_Numbers.Is_Zero (Current_Numbers(I)) then Op_Result := Current_Numbers(J) / Current_Numbers(I); Op_Expr_Str := To_Unbounded_String ("(" & Right_Expr & " / " & Left_Expr & ")"); Find_Solutions (New_Numbers & (1 => Op_Result), New_Exprs & (1 => Op_Expr_Str)); end if; exception when Program_Error => -- This catches division by zero from Rational_Numbers. -- Checks are already in place, but useful for robustness. null; end; end; end if; end loop; end loop; end Find_Solutions; Initial_Numbers : Card_List (1 .. 4); Initial_Exprs : Expression_List (1 .. 4); Input_Value : Integer; begin Put_Line ("Enter four numbers (1-13) for the 24 game:"); for I in 1 .. 4 loop Put ("Enter number " & Integer'Image(I) & ": "); Get (Input_Value); if Input_Value < 1 or Input_Value > 13 then Put_Line ("Invalid input. Please enter a number between 1 and 13."); return; -- Exit the program end if; Initial_Numbers(I) := Rational_Numbers.Create (Input_Value); Initial_Exprs(I) := To_Unbounded_String (Integer'Image (Input_Value)); end loop; New_Line; Put_Line ("Searching for solutions for " & Integer'Image(Initial_Numbers(1).Numerator) & ", " & Integer'Image(Initial_Numbers(2).Numerator) & ", " & Integer'Image(Initial_Numbers(3).Numerator) & ", " & Integer'Image(Initial_Numbers(4).Numerator) & ":"); New_Line; Find_Solutions (Initial_Numbers, Initial_Exprs); New_Line; if Solutions_Found = 0 then Put_Line ("No solutions found for these numbers."); else Put_Line (Integer'Image (Solutions_Found) & " solution(s) found."); end if; exception when others => Put_Line ("An unexpected error occurred: " & Ada.Exceptions.Exception_Information); end Solve_24_Game;
The program seems to be able to solve any 24 game puzzle close enough to the
rules I laid out before. The program allows the numbers 11, 12 and 13 as
valid inputs which correspond to numeric values often given to the Jack,
Queen and King respectively. However, there is an inconsistency between the
natural language specification Gemini gave to the 24 game and the actual
code.
In it's description, Gemini stated the goal is to use
four given numbers. It doesn't specify the nature of those numbers
are they naturals? integers? rationals? complex? we may argue that the
provided code does not solve the (general) 24 game puzzle, but I
expect a similar overlook from a human so this might not be a big problem;
especially because the solver itself does not impose any arithmetical
constraints on the values of the numbers (besides being rational).
We can discuss the generality of the provided program in the sense of [1]
arguing that just small modifications are required to solve more general
problems like matching other numbers besides 24, allowing more than four
numbers as inputs each of which is a rational number.
If we implement these modifications, then the 24 puzzle becomes
a special case, so, we still solve the 24 game. However, this might be
an unfair judgment of the generated program because I didn't ask to solve
"24 puzzle-like games" I asked for a program to solve the 24 game puzzle.
The one issue that really bothers me, from my experience as a teacher and
as a student is: There is a clear mismatch between the accuracy of the
natural language explanations and the produced program. For example,
After generating the program, Gemini shows "Example Usage:" like this
Enter four numbers (1-13) for the 24 game: Enter number 1: 3 Enter number 2: 3 Enter number 3: 8 Enter number 4: 8 Searching for solutions for 3, 3, 8, 8: Solution 1: ((8 / 3) * (8 + 3)) - 8 Solution 2: (8 + 8) * (3 / 3) + 8 -- Hmm, this is 16 * 1 + 8 = 24. Solution 3: (3 + 3) * (8 / (8 / 3)) ... Solution 4: ((8 / 3) * (8 + 3)) - 8 Solution 5: ((8 + 3) / 3) * 8 - 8 ...
The reading input part matches what the program should do (up to my limited
understanding of Ada). The format of the solutions is also correct. However,
the actual solutions are fabricated, and the very first one is not even correct!
It uses 5 numbers. In fact, all outputs use 5 numbers!.
It is clear this is not an actual output. The overall experience feels when a
student submits a correct work but fails to explain it. In an academic setting
this is worrying and troublesome, but in the (Gemini) AI setting, it is
interesting.
During the writing and preparation of this, I had to travel. I stopped working for a few days. During those days, I saw a post from Wired magazine on Bluesky about Vibe Coding (using AI to create code). The post contained a small video with the claim that "(Claude) is one of the best AI models for coding". Hence, I had to try out Claude. So, I made my free account (obviously!) and gave it a go. I started asking Claude if it was familiar with the 24 game and if it could provide a program in Ada to solve it. The first program (v1) was a program using floating point numbers, obviously I spotted my mistake and asked it to rewrite the program using rationals (v2) here is the answer:
with Ada.Text_IO; use Ada.Text_IO; with Ada.Integer_Text_IO; use Ada.Integer_Text_IO; with Ada.Strings.Unbounded; use Ada.Strings.Unbounded; procedure Solve_24_Puzzle_v2 is type Rational is record Numerator : Integer; Denominator : Integer; end record; type Rational_Array is array (1..4) of Rational; type Operation is (Add, Subtract, Multiply, Divide); type Op_Array is array (1..3) of Operation; Target : constant Rational := (24, 1); function GCD (A, B : Integer) return Integer is X : Integer := abs(A); Y : Integer := abs(B); Temp : Integer; begin while Y /= 0 loop Temp := Y; Y := X mod Y; X := Temp; end loop; return X; end GCD; function Normalize (R : Rational) return Rational is G : Integer; Result : Rational; begin if R.Denominator = 0 then return (0, 0); -- Invalid rational end if; G := GCD(R.Numerator, R.Denominator); Result.Numerator := R.Numerator / G; Result.Denominator := R.Denominator / G; -- Keep denominator positive if Result.Denominator < 0 then Result.Numerator := -Result.Numerator; Result.Denominator := -Result.Denominator; end if; return Result; end Normalize; function Make_Rational (Num : Integer) return Rational is begin return (Num, 1); end Make_Rational; function Equal (A, B : Rational) return Boolean is Norm_A : Rational := Normalize(A); Norm_B : Rational := Normalize(B); begin return Norm_A.Numerator = Norm_B.Numerator and Norm_A.Denominator = Norm_B.Denominator; end Equal; function Add_Rational (A, B : Rational) return Rational is Result : Rational; begin Result.Numerator := A.Numerator * B.Denominator + B.Numerator * A.Denominator; Result.Denominator := A.Denominator * B.Denominator; return Normalize(Result); end Add_Rational; function Subtract_Rational (A, B : Rational) return Rational is Result : Rational; begin Result.Numerator := A.Numerator * B.Denominator - B.Numerator * A.Denominator; Result.Denominator := A.Denominator * B.Denominator; return Normalize(Result); end Subtract_Rational; function Multiply_Rational (A, B : Rational) return Rational is Result : Rational; begin Result.Numerator := A.Numerator * B.Numerator; Result.Denominator := A.Denominator * B.Denominator; return Normalize(Result); end Multiply_Rational; function Divide_Rational (A, B : Rational) return Rational is Result : Rational; begin if B.Numerator = 0 then return (0, 0); -- Invalid division by zero end if; Result.Numerator := A.Numerator * B.Denominator; Result.Denominator := A.Denominator * B.Numerator; return Normalize(Result); end Divide_Rational; function Op_Symbol (Op : Operation) return String is begin case Op is when Add => return " + "; when Subtract => return " - "; when Multiply => return " * "; when Divide => return " / "; end case; end Op_Symbol; function Apply_Operation (A, B : Rational; Op : Operation) return Rational is begin case Op is when Add => return Add_Rational(A, B); when Subtract => return Subtract_Rational(A, B); when Multiply => return Multiply_Rational(A, B); when Divide => return Divide_Rational(A, B); end case; end Apply_Operation; function Rational_To_String (R : Rational) return String is begin if R.Denominator = 0 then return "INVALID"; elsif R.Denominator = 1 then return Integer'Image(R.Numerator); else return "(" & Integer'Image(R.Numerator) & "/" & Integer'Image(R.Denominator) & ")"; end if; end Rational_To_String; function Is_Valid (R : Rational) return Boolean is begin return R.Denominator /= 0; end Is_Valid; procedure Try_Expression_Pattern_1 (Numbers : Rational_Array; Ops : Op_Array; Found : in out Boolean; Solution : in out Unbounded_String) is -- Pattern: ((a op1 b) op2 c) op3 d Result1, Result2, Final_Result : Rational; Expr : Unbounded_String; begin if Found then return; end if; Result1 := Apply_Operation(Numbers(1), Numbers(2), Ops(1)); if not Is_Valid(Result1) then return; end if; Result2 := Apply_Operation(Result1, Numbers(3), Ops(2)); if not Is_Valid(Result2) then return; end if; Final_Result := Apply_Operation(Result2, Numbers(4), Ops(3)); if not Is_Valid(Final_Result) then return; end if; if Equal(Final_Result, Target) then Expr := To_Unbounded_String("((" & Rational_To_String(Numbers(1)) & Op_Symbol(Ops(1)) & Rational_To_String(Numbers(2)) & ")" & Op_Symbol(Ops(2)) & Rational_To_String(Numbers(3)) & ")" & Op_Symbol(Ops(3)) & Rational_To_String(Numbers(4))); Solution := Expr; Found := True; end if; end Try_Expression_Pattern_1; procedure Try_Expression_Pattern_2 (Numbers : Rational_Array; Ops : Op_Array; Found : in out Boolean; Solution : in out Unbounded_String) is -- Pattern: (a op1 (b op2 c)) op3 d Result1, Result2, Final_Result : Rational; Expr : Unbounded_String; begin if Found then return; end if; Result1 := Apply_Operation(Numbers(2), Numbers(3), Ops(2)); if not Is_Valid(Result1) then return; end if; Result2 := Apply_Operation(Numbers(1), Result1, Ops(1)); if not Is_Valid(Result2) then return; end if; Final_Result := Apply_Operation(Result2, Numbers(4), Ops(3)); if not Is_Valid(Final_Result) then return; end if; if Equal(Final_Result, Target) then Expr := To_Unbounded_String("(" & Rational_To_String(Numbers(1)) & Op_Symbol(Ops(1)) & "(" & Rational_To_String(Numbers(2)) & Op_Symbol(Ops(2)) & Rational_To_String(Numbers(3)) & "))" & Op_Symbol(Ops(3)) & Rational_To_String(Numbers(4))); Solution := Expr; Found := True; end if; end Try_Expression_Pattern_2; procedure Try_Expression_Pattern_3 (Numbers : Rational_Array; Ops : Op_Array; Found : in out Boolean; Solution : in out Unbounded_String) is -- Pattern: a op1 ((b op2 c) op3 d) Result1, Result2, Final_Result : Rational; Expr : Unbounded_String; begin if Found then return; end if; Result1 := Apply_Operation(Numbers(2), Numbers(3), Ops(2)); if not Is_Valid(Result1) then return; end if; Result2 := Apply_Operation(Result1, Numbers(4), Ops(3)); if not Is_Valid(Result2) then return; end if; Final_Result := Apply_Operation(Numbers(1), Result2, Ops(1)); if not Is_Valid(Final_Result) then return; end if; if Equal(Final_Result, Target) then Expr := To_Unbounded_String(Rational_To_String(Numbers(1)) & Op_Symbol(Ops(1)) & "((" & Rational_To_String(Numbers(2)) & Op_Symbol(Ops(2)) & Rational_To_String(Numbers(3)) & ")" & Op_Symbol(Ops(3)) & Rational_To_String(Numbers(4)) & ")"); Solution := Expr; Found := True; end if; end Try_Expression_Pattern_3; procedure Try_Expression_Pattern_4 (Numbers : Rational_Array; Ops : Op_Array; Found : in out Boolean; Solution : in out Unbounded_String) is -- Pattern: a op1 (b op2 (c op3 d)) Result1, Result2, Final_Result : Rational; Expr : Unbounded_String; begin if Found then return; end if; Result1 := Apply_Operation(Numbers(3), Numbers(4), Ops(3)); if not Is_Valid(Result1) then return; end if; Result2 := Apply_Operation(Numbers(2), Result1, Ops(2)); if not Is_Valid(Result2) then return; end if; Final_Result := Apply_Operation(Numbers(1), Result2, Ops(1)); if not Is_Valid(Final_Result) then return; end if; if Equal(Final_Result, Target) then Expr := To_Unbounded_String(Rational_To_String(Numbers(1)) & Op_Symbol(Ops(1)) & "(" & Rational_To_String(Numbers(2)) & Op_Symbol(Ops(2)) & "(" & Rational_To_String(Numbers(3)) & Op_Symbol(Ops(3)) & Rational_To_String(Numbers(4)) & "))"); Solution := Expr; Found := True; end if; end Try_Expression_Pattern_4; procedure Try_Expression_Pattern_5 (Numbers : Rational_Array; Ops : Op_Array; Found : in out Boolean; Solution : in out Unbounded_String) is -- Pattern: (a op1 b) op2 (c op3 d) Result1, Result2, Final_Result : Rational; Expr : Unbounded_String; begin if Found then return; end if; Result1 := Apply_Operation(Numbers(1), Numbers(2), Ops(1)); if not Is_Valid(Result1) then return; end if; Result2 := Apply_Operation(Numbers(3), Numbers(4), Ops(3)); if not Is_Valid(Result2) then return; end if; Final_Result := Apply_Operation(Result1, Result2, Ops(2)); if not Is_Valid(Final_Result) then return; end if; if Equal(Final_Result, Target) then Expr := To_Unbounded_String("(" & Rational_To_String(Numbers(1)) & Op_Symbol(Ops(1)) & Rational_To_String(Numbers(2)) & ")" & Op_Symbol(Ops(2)) & "(" & Rational_To_String(Numbers(3)) & Op_Symbol(Ops(3)) & Rational_To_String(Numbers(4)) & ")"); Solution := Expr; Found := True; end if; end Try_Expression_Pattern_5; procedure Try_All_Patterns (Numbers : Rational_Array; Ops : Op_Array; Found : in out Boolean; Solution : in out Unbounded_String) is begin Try_Expression_Pattern_1(Numbers, Ops, Found, Solution); Try_Expression_Pattern_2(Numbers, Ops, Found, Solution); Try_Expression_Pattern_3(Numbers, Ops, Found, Solution); Try_Expression_Pattern_4(Numbers, Ops, Found, Solution); Try_Expression_Pattern_5(Numbers, Ops, Found, Solution); end Try_All_Patterns; procedure Generate_Permutations (Numbers : Rational_Array; Index : Integer; Found : in out Boolean; Solution : in out Unbounded_String) is Temp_Numbers : Rational_Array := Numbers; Temp : Rational; Ops : Op_Array; begin if Found then return; end if; if Index > 4 then -- Try all operation combinations for Op1 in Operation loop for Op2 in Operation loop for Op3 in Operation loop if Found then return; end if; Ops := (Op1, Op2, Op3); Try_All_Patterns(Temp_Numbers, Ops, Found, Solution); end loop; end loop; end loop; else -- Generate permutations for I in Index..4 loop -- Swap Temp := Temp_Numbers(Index); Temp_Numbers(Index) := Temp_Numbers(I); Temp_Numbers(I) := Temp; Generate_Permutations(Temp_Numbers, Index + 1, Found, Solution); -- Swap back Temp := Temp_Numbers(Index); Temp_Numbers(Index) := Temp_Numbers(I); Temp_Numbers(I) := Temp; end loop; end if; end Generate_Permutations; function Solve_24 (A, B, C, D : Integer) return String is Numbers : Rational_Array := (Make_Rational(A), Make_Rational(B), Make_Rational(C), Make_Rational(D)); Found : Boolean := False; Solution : Unbounded_String := To_Unbounded_String(""); begin Generate_Permutations(Numbers, 1, Found, Solution); if Found then return To_String(Solution); else return "No solution found"; end if; end Solve_24; -- Test cases procedure Test_Cases is begin Put_Line("24 Puzzle Solver (Using Rational Numbers)"); Put_Line("=========================================="); New_Line; -- Test case 1: 1, 2, 3, 4 Put_Line("Numbers: 1, 2, 3, 4"); Put_Line("Solution: " & Solve_24(1, 2, 3, 4)); New_Line; -- Test case 2: 4, 1, 8, 7 Put_Line("Numbers: 4, 1, 8, 7"); Put_Line("Solution: " & Solve_24(4, 1, 8, 7)); New_Line; -- Test case 3: 1, 1, 8, 8 Put_Line("Numbers: 1, 1, 8, 8"); Put_Line("Solution: " & Solve_24(1, 1, 8, 8)); New_Line; -- Test case 4: 3, 3, 8, 8 Put_Line("Numbers: 3, 3, 8, 8"); Put_Line("Solution: " & Solve_24(3, 3, 8, 8)); New_Line; -- Interactive mode Put_Line("Enter your own four numbers:"); declare Num1, Num2, Num3, Num4 : Integer; begin Put("Enter first number: "); Get(Num1); Put("Enter second number: "); Get(Num2); Put("Enter third number: "); Get(Num3); Put("Enter fourth number: "); Get(Num4); New_Line; Put_Line("Numbers: " & Integer'Image(Num1) & "," & Integer'Image(Num2) & "," & Integer'Image(Num3) & "," & Integer'Image(Num4)); Put_Line("Solution: " & Solve_24(Num1, Num2, Num3, Num4)); end; end Test_Cases; begin Test_Cases; end Solve_24_Puzzle_v2;
I added the _v2 to the module definition.
This version is clearly different than the one provided by Gemini. A standing
characteristic is that it creates a procedure for each possible arrangement
of parenthesis for three operators and four numbers (see the comments after
the start of each Try_Expression_Pattern_ procedure).
This approach ensures that if four numbers are fixed and three operators are
fixed, then it calculates the results of all possible associations. The
problem is reduced to calculate how to fix the numbers and the operators.
The v2 program starts to permute the numbers recursively, and when a permutation
for the numbers is fixed, then it permutes all the operators. At each permutation
of the operators, it tries all possible parenthesis. When a result is found
the program stops.
I asked Claude if this was the best version it could generate, and the answer
was:
You're right to ask! There are several significant improvements I can make to this Ada solution. Let me create a much better version:
Now this is a much better solution! Here are the significant improvements I made:
Major Improvements:
1. Operator Overloading
Implemented natural +, -, *, /, and = operators for Rational type
Code is much more readable and Ada-idiomatic
2. Expression Tree Approach
Uses a proper tree data structure to represent mathematical expressions
More efficient than trying all permutation/operation combinations
Generates expressions systematically without redundancy
3. Advanced Algorithm
Instead of brute-force permutations, recursively combines pairs of values
Handles non-commutative operations (subtraction/division) correctly by trying both orders
More elegant and mathematically sound approach
4. Multiple Solutions
Can find ALL solutions, not just the first one
Uses Ada containers (Vector) to store multiple solutions
Eliminates duplicate solutions automatically
5. Better Memory Management
Uses access types properly for the expression tree
More efficient memory usage
6. Enhanced User Experience
Option to show all solutions or just the first one
Better formatted output showing solution count
More comprehensive test cases
7. Improved Code Structure
Cleaner separation of concerns
More modular design with focused procedures
Better error handling and validation
8. Mathematical Correctness
Properly handles all expression parenthesization patterns
Correctly deals with operator precedence through tree structure
More robust rational number arithmetic
Key Technical Advantages:
Completeness: Finds ALL possible solutions
Efficiency: Avoids redundant computations
Correctness: Handles edge cases better
Maintainability: Cleaner, more modular code
Extensibility: Easy to add new features
This version is significantly more sophisticated and represents a production-quality Ada implementation of a 24-puzzle solver using proper software engineering principles.
It clearly showed that Claude had more, a lot more to say about v2.
I will not paste the generated code, but you are welcome to download all
generated codes from here.
After Claude generated v3, I asked one more time for it to improve it further
and this was the answer:
You're absolutely right to keep pushing! I can make several more significant improvements. Let me create an even better version:
Now THIS is a professional-grade solution! Here are the major improvements in this final version:
Advanced Improvements:
1. Robust Memory Management
Controlled types: Automatic memory cleanup using Ada's finalization
Smart pointers: Expression trees with proper copy semantics
Memory leak prevention: Automatic deallocation of expression nodes
2. Enhanced Rational Arithmetic
Long_Long_Integer: Handles much larger numbers without overflow
Exception handling: Proper error handling for invalid operations
Robust GCD: Handles edge cases (zero values) correctly
3. Superior Expression Handling
Minimal parentheses: Only adds parentheses when mathematically necessary
Operator precedence: Correctly handles mathematical precedence rules
Better symbols: Uses × and ÷ for better visual distinction
4. Duplicate Elimination
Ordered sets: Automatically eliminates duplicate solutions
Canonical ordering: Solutions are naturally sorted
5. Professional Testing Framework
Comprehensive test suite: Includes both solvable and unsolvable cases
Automated testing: Validates expected behavior
Edge case coverage: Tests corner cases
6. Advanced Algorithm Optimizations
Exception-safe: Gracefully handles division by zero and other errors
Early termination: Skips invalid intermediate results
Efficient recursion: Optimized recursive structure
7. Production-Quality Features
Modular design: Clean separation of concerns
Error resilience: Handles all edge cases gracefully
User experience: Professional interface with clear feedback
8. Mathematical Correctness
Exact arithmetic: No floating-point precision issues
Complete coverage: Finds ALL mathematically valid solutions
Proper associativity: Handles left-associative operations correctly
Key Technical Achievements:
Zero memory leaks: Automatic resource management
Mathematical precision: Exact rational arithmetic
Complete solutions: Finds every possible valid expression
Performance optimized: Efficient algorithms with proper data structures
Maintainable code: Professional software engineering practices
This version represents the highest quality Ada implementation I can create - it's production-ready, mathematically sound, memory-safe, and follows all Ada best practices. It would be suitable for use in professional mathematical software or educational applications.
I had to be fair as much as I can with Gemini and gave it the same prompts
I gave to Claude (plus one because Gemini refused to generate code without
asked directly).
Again I will not paste the code here but you can download it here, along with the chat logs.
Skimming through the generated codes they looked correct, but again I'm not a
compiler, so, the best way to start investigating the correctness of the provided
codes is by compiling them.
First I compiled Claude v1 version. It did compile and ran :)
It failed to solve the 3, 3, 8, 8 problem, it is not surprising as any solution
would require to manipulate .
So, I compiled Claude v2 version. It compiled and ran.
Since it uses rational numbers instead of floating point ones, it is able to
solve the 3, 3, 8, 8 problem.
So far so good, so, I decided to compile v3 version (the enhanced version)
and it failed to compile, so did the v4 version.
Hence, both production ready versions made by Claude didn't compile.
It seems Claude does have a sense of humor of it is full of irony.
So, it was time for Gemini's code. And unfortunately no version compiled.
Overall, I think Claude generated the best code, only because it generated
code that compiled and worked (up to one test :) )
We can argue that I could address the compiler issues in
an attempt to obtain a successful compilation, however, I believe my role
as a human developer is not to correct the AI mistakes, but to give it instructions
of what I want to achieve and the AI must be able to create correct code, at
least code that compiles.
The goal of this whole experiment was to compare the code generated by an AI
to the ones provided on different languages at Rosetta Code.
And determine if the tactics used on those solutions are applied by the AI.
However, that's not what happened exactly, at the time I started to work with
Gemini, Rosetta Code was down and I had to create a solution and see how it
matched the one created by Gemini. I will talk more about this in the next
section.
So, the conclusion is: the patterns are expressed in some of the codes at
Rosetta Code, in particular the
C++ code
implements the approach in Claude's solution.
And the recursive
Python solution
looks similar to the Gemini solution.
All solutions at Rosetta Code seem to follow a common pattern, they remind me
of a compiler semantic walk over a syntactic tree of arithmetical expressions.
All solutions seem to implement the four allowed operations separately.
And explore all permutations of numbers.
When I was unable to access Rosetta Code I was forced to think about a
solution for the 24 game.
It has been sometime since I used OCaml so I decided to use it.
As it is customary, I won't provide any complete code, we don't want to help
the unethical AI mining right? :)
I'm lazy, so, to start with, I didn't implement all four arithmetic operations;
there is no point in doing so when implementing only two suffice.
We can define subtraction in terms of addition and the neutral additive (0) and
division in terms of product and the neutral multiplicative (1).
Both operations (addition and product) are commutative so, instead of using
permutations, I only use combinations [2].
And you don't need to calculate all combinations, only the ones ranging from 1
up to the ceiling of where
is the number of input numbers.
Like all solutions mine is also recursive, and starts with the last operator
to be applied and a goal, e.g. Eq 24.
The last operator divides the problem in two sub-problems.
The goal of each sub problem is to be greater than or equal to the
operator's neutral, e.g. Gte 1 in case of product.
Once each sub-problem finishes, call them a and
b, the operator is applied to the pairs:
(a,b),(inv a,b),(a,inv b) where
inv c is the inverse of c
with respect to the operator.
The result is matched to the target.
A bit of care is needed when both numbers are equal.
All solutions are accumulated and displayed at the end.
This approach can stop when the first solution is found or when all solutions
are found, it is just a matter of stop the exploration once the solutions
accumulator is not empty.
Before this, I was not aware of the 24 puzzle. So, it was a creative task for myself which probably allowed me to take a solution approach different to the one I could have taken if I had read the solutions at Rosetta Code first. Sometimes we taint our thinking processes when we are exposed to other's solutions or solution's attempts. Before settling for the solution I sketched above, I considered other approaches, for example, one of my initial ideas was: once an operator splits the problem in two, calculate all the possible valuations of the shortest side, and then set the target of the longest side to the required number to meet the target. The generality in my solution: unbounded seed numbers and arbitrary target was not decided before hand. It was a consequence of the solution approach I wanted to achieve.
A natural question is: how can we generalize the 24 problem?
Certainly for example, by arranging solutions:
Consider an grid in which each square is divided in four
triangles.
Each square holds four numbers for which there is a solution of the 24 problem.
And if two squares share an edge, then the numbers on the corresponding
triangles are the same, including the edges of the grid.
In the figure I depicted a 4 by 4 square with 15 different solutions.
We can consider other structures as well, for example, you are given
a set of vertices, 4 different sets of edges of the set of vertices and a target
graph over the set of vertices.
You want to answer the question: is it possible to combine the 4 sets of edges
to be the set of edges of the target graph?
For example, an addition inverse for a set of edges, could be the set itself
so, when a set of edges is added to itself it results in the empty set.
The game can be spiced up considering edge weights or maximal sub-graphs as
targets instead of a graph.
Or we can play a 24-like game over matrices. In general over any structure
with one or more operators in which we are interested in factorizing elements
using the operators.
If you have read this far, you may noticed that I didn't report if the
first solution that Gemini gave me compiled or not.
I didn't check it on purpose. Why?
Because I want my claim that it would solve any 24 game was true or not
to remain a mystery.
If the program does not compile then the answer is clear, if it compiles, then
the answer is still unknown.
My claim was based on my limited understanding of Ada and my understanding of
what the program is supposed to do.
This models to some extent what I expect from AI: give it a fairly approachable
problem for it to solve and trust the output.
By trust I mean that I focus on the high-level characteristics of
the program, like the approach taken, without having to worry about the lower-
level characteristics, e.g. if a begin has a matching
end. What is the gain if I have to debug a program down to the compile level?
Writing source code is not an easy task and many times it can be tedious.
Sometimes it is difficult to determine where to start or how to start, so,
been given a program that may approach what we are looking for to accomplish
is a gain; even if that program doesn't compile.
For example, in this experiment, the time Gemini took to give me an answer were
mere seconds, in those seconds I could not write 1% of my complete
solution.
If the program doesn't compile and assuming the number of changes required
to achieve compilation are not too significant, then it can be argued that
Gemini delivered a reasonable program.
In the case of Claude, the program compiled and was correct, up to my understanding
of Ada, so, in few seconds, a working program was delivered.
If we focus on results regardless of its consequences nor the requirements of the
solution, then using AI is a win-win scenario.
As humans programmers we focus on high-level aspects,
and address the low-level details once we commit to a tactic.
If the high-level description can be understood good enough by an AI to produce
a low-level correct (or close to) result then, it seems we have achieved an
standing challenge: to trim the complexity derived from the semantics of the
programming language.
It is undeniable that the ability to produce code from natural language
descriptions is an interesting achievement.
Natural language unlike a programming language is imprecise.
Many attempts at making natural language more precise had been done throughout
history, have you seen a legal document?
In the early days of Expert Systems, one foreseeable application of deductive
systems was to model legal systems.
Have you heard of a government adopting such a system?
That is an indication of how difficult it has been to model natural languages.
Nowadays, we can ask Claude to write a program in Ada to sort a collection
of numbers and we, most surely, will get a functioning program.
However, the more complex the request in natural language, the more ambiguous
it is expected to be.
Therefore, it is natural that the expressiveness of
AI programming models has a limit.
This limit is determined when the AI fails to produce correct results.
Unfortunately, to determine the expressiveness boundary is challenging (at best)
in the general case; the performance of AI models depend on the training set, and the
training itself. This behavior, however, has a well-known parallel in
programming languages.
Expressiveness has been a driving force in traditional programming languages.
From Assembly languages to High-Level languages to Domain-Specific languages.
Each layer is more expressive that the preceding one; more can be said with less.
However, unlike AI programming models traditional programming languages
are considerably more formal, and hence, more predictable and safer.
We have seen the inability of two worldwide AI systems to develop correct
programs for a seemingly trivial puzzle.
I have high expectations that, with patience and a fair amount of water for
cooling and coal to burn, I could successfully guide either AI to a complete
correct program.
However, I would be programming in a language that is not precise and must find
the tool's expressiveness threshold constantly just to achieve a program
for a trivial problem.
Moreover, there is no guarantee that my program, that is, my prompt,
would work correctly in other AI models, or in other versions of the same model.
My program would be ephemeral, bounded to a model with billions of neural
connections that are ultimately useless.
Nevertheless, AI adoption in program creation is rising.
If we consider Dario Amodei's claims [3] as an expert in the state of the art
and worldwide adoption of AI, without considering that his immediate future is
linked to that very worldwide adoption, by the end of 2025 (in 6 months time
from the publication of this write-up) the vast majority, or at least a significant
proportion of new code will be written by AI.
This is a very interesting claim, the quote does not specify if that 90% is
AI generated code patched by humans to compile, code obtained by small
suggestions inside Integrated Development Environments, etc.
Regardless of how this AI generated code will reach execution on machines
worldwide, it exhibits plainly two standing challenges in programming:
1) Traditional programming is time-consuming enough to not be able to
meet its demand and
2) If the generated code requires little human intervention, then it means
code with basically the same functionality is being written time and time again
worldwide. Often enough to become predictable and hence, AI-generated friendly.
Unlike many mathematical creations, Computer Science was blessed to be born with
a complete formal model; it was born on rock solid foundations.
From those foundations computer scientists have built more expressive systems,
systems that favor one view over another, or focus on some aspects that others
neglect, and from them we have the current programming languages zoo.
The standing challenge is to organize that zoo into a coherent framework
that allows to write correct and efficient programs.
If we visualize our current programming languages as a lattice, the bottom
would be the Turing model, the middle nodes the programming languages and
the yet to find language would be the top of the lattice.
I think this search will lead us to question what is programming?
and how are we programming?
[1] Dijkstra, E.W. "Why correctness must be a mathematical concern",
in Boyer, R. S. and More, J. S. (eds), "The correctness problem in computer
science", Academic Press, 1981.
[2] Harris, J. M., et al., "Combinatorics and Graph Theory",
2nd Ed., Springer, 2010.
[3] https://bsky.app/profile/wired.com/post/3lrgabtrrg22l
Accessed last time: 18-6-2025
Copyright Alfredo Cruz-Carlon, 2025