coding trails

Modernization Code Converter

June 3, 2025 by schildawg

  _____        __      _____                      __         
 / ___/__  ___/ /__   / ___/__  ___ _  _____ ____/ /____ ____
/ /__/ _ \/ _  / -_) / /__/ _ \/ _ \ |/ / -_) __/ __/ -_) __/
\___/\___/\_,_/\__/  \___/\___/_//_/___/\__/_/  \__/\__/_/

Overview

Let’s face it, modernization efforts are a necessary evil. They are large, and complicated, and add no business value. Whatsoever. Until your application doesn’t work anymore. Then it is too late to start modernizing.

In the case of Indiana DWD, they have 2 million lines of code still running on Java 8. Last I checked, we are on Java 24. This isn’t a large-scale application by any means; I’ve seen much, much larger. But it’s not trivial, either. If you compare lines of code to words, it’s the equivalent of the Wheel of Time series. And with the half-life of software sitting at around two and a half years, that’s a lot of modernizing. There is a definite solution to this problem, but that’s the subject for another post.

So I did what any architect would do. I proposed a meta-programming solution. It’s been a long time since I wrote code. I now write code that writes code. Or have AI write code that writes code, that writes… you get the idea.

It’s a lot of code, but ultimately every Java EE project that needs to be modernized is pretty predictable: an anemic domain model, with a DAO layer, services, controllers, and a front end. SOAP, REST, hell, even microservices all follow this same pattern. Write a lot of garbage code and leave, because writing code is fun and reading it is painful. That’s the next developer’s problem. The problem is, usually you are the next developer. The good news is, as I said, it is totally predictable.

This predictability is what makes automation so effective. Parse the code into a syntax tree, transform it, and print it back out as modernized code. You don’t even have to play around with lexers and parsers. Java has been around for so long that there are numerous libraries that do it for you. JavaParser is a good library, so I started with that.

Getting Started

The first step may not be so obvious: write a printer that spits the code back out, unchanged. For a language like Java, this is around 200 to 300 lines of code. And make sure everything compiles. If you skip this step, you will regret it, I promise.

The next step is to generate unit tests. I’m not sure why, but legacy code always seems to have few or no unit tests. At this point you are not unit testing for correctness, but rather to ensure that you have a baseline for refactoring. I like to think of this process as turning a shirt inside-out. Close to 100% coverage can be generated fairly easily for almost any Java class.

Yes, I generated coverage for getters and setters. I know there is a big argument about whether unit tests are needed for getters and setters. They are easy to do. And, yes, I found several hundred defects relating to getters and setters in the code base. Generating unit tests for other methods can get tricky, but not quite as tricky as the average LeetCode question. If you want to take a shortcut, it’s fairly easy to set up a call to a Copilot API: send it the method and have it return the unit tests.
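The getter/setter tests are the most mechanical part, so here is a minimal Python sketch of the idea. It assumes the class name and its (field, type) pairs have already been pulled out of the syntax tree; `accessor_test_class` and `SAMPLE_VALUES` are hypothetical names, not part of the converter described here. It emits JUnit 5 source as plain text, one round-trip test per field.

```python
# Hypothetical sample values, one per Java type (the post uses "TEST" and 0L).
SAMPLE_VALUES = {"String": '"TEST"', "Long": "0L"}


def accessor_test_class(class_name, fields):
    """Return JUnit 5 source with one set-then-get test per (field, java_type) pair."""
    lines = [
        "import static org.junit.jupiter.api.Assertions.assertEquals;",
        "",
        "import org.junit.jupiter.api.Test;",
        "",
        f"class {class_name}Test {{",
    ]
    for field, java_type in fields:
        prop = field[0].upper() + field[1:]  # lastName -> LastName
        value = SAMPLE_VALUES[java_type]
        lines += [
            "",
            "    @Test",
            f"    void getAndSet{prop}() {{",
            f"        {class_name} target = new {class_name}();",
            f"        target.set{prop}({value});",
            f"        assertEquals({value}, target.get{prop}());",
            "    }",
        ]
    lines.append("}")
    return "\n".join(lines)


# Example: a hypothetical Claimant entity with two fields.
source = accessor_test_class("Claimant", [("lastName", "String"), ("id", "Long")])
print(source)
```

A getter that returns the wrong field, or a setter that assigns to the wrong one, fails this round trip, which is exactly the class of defect described above.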

Converting

So, what types of conversions can you do? Everything from simple refactors, like introducing the “var” keyword or adding the diamond operator, all the way to large structural changes. With the Visitor pattern as the basis for the printers, the simplest changes take only a few lines of code, and the largest seldom take more than a few dozen.
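To show why a refactoring can be only a few lines, here is a toy Python sketch of the printer-as-visitor idea. The node classes (`TypeRef`, `NewExpr`, `VarDecl`) are hypothetical stand-ins for a real Java AST such as JavaParser’s. The base `Printer` reproduces its input unchanged, like the identity printer from the first step, and the diamond-operator refactoring overrides a single visit method.

```python
from dataclasses import dataclass, field


@dataclass
class TypeRef:
    name: str
    args: list = field(default_factory=list)  # generic type arguments


@dataclass
class NewExpr:
    created: TypeRef  # the type after `new`


@dataclass
class VarDecl:
    declared: TypeRef
    name: str
    init: NewExpr


class Printer:
    """Identity printer: each visit_* method prints its node back out unchanged."""

    def visit(self, node):
        # Dispatch on the node's class name, e.g. NewExpr -> visit_NewExpr.
        return getattr(self, "visit_" + type(node).__name__)(node)

    def visit_TypeRef(self, node):
        if not node.args:
            return node.name
        return f"{node.name}<{', '.join(self.visit(a) for a in node.args)}>"

    def visit_NewExpr(self, node):
        return f"new {self.visit(node.created)}()"

    def visit_VarDecl(self, node):
        return f"{self.visit(node.declared)} {node.name} = {self.visit(node.init)};"


class DiamondPrinter(Printer):
    """The refactoring overrides only the node it changes."""

    def visit_NewExpr(self, node):
        if node.created.args:
            return f"new {node.created.name}<>()"
        return super().visit_NewExpr(node)


decl = VarDecl(TypeRef("List", [TypeRef("String")]), "names",
               NewExpr(TypeRef("ArrayList", [TypeRef("String")])))
original = Printer().visit(decl)           # List<String> names = new ArrayList<String>();
refactored = DiamondPrinter().visit(decl)  # List<String> names = new ArrayList<>();
```

Because every refactoring inherits the identity behavior, anything it does not override is printed exactly as before, which is why the round-trip printer is worth getting right first.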

I wanted to introduce Kotlin to the domain model, because of its compatibility with the JVM, and its clarity and readability. Indiana wasn’t ready for Kotlin, so we settled on Lombok. They were already using it, and I like it. It’s a great compromise. A few lines of code to conditionally add annotations and remove boilerplate methods, and the code is already much more readable. With a comprehensive unit test suite in place, we can proceed with confidence.

Of course, you WILL run into problems. The code is predictable, but not THAT predictable. Developers do weird stuff, like having multiple getters for the same field that differ only by case. I added audit loggers to output the methods that failed and fed that output back into the converter: on the next run, it leaves that code alone during conversion and adds a TODO to fix the problematic code.
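Here is a hedged Python sketch of that feedback loop. `convert_class`, `find_case_collisions`, and `demo_convert` are hypothetical, and it assumes methods arrive as a name-to-source dictionary and that the converter raises `ValueError` on code it cannot handle. Failures are logged and left untouched on the first run; on the next run, the logged names are passed back in as `skip` and get a TODO instead of a conversion.

```python
from collections import defaultdict

# Marker added in front of code the converter refuses to touch.
TODO = "// TODO: fix manually; left unconverted\n"


def find_case_collisions(method_names):
    """Group getter names that differ only by letter case, e.g. getURL and getUrl."""
    groups = defaultdict(list)
    for name in method_names:
        if name.startswith("get"):
            groups[name.lower()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def convert_class(methods, convert, audit_log, skip=frozenset()):
    """Convert each method's source; problems go to audit_log instead of stopping the run.

    methods: dict of method name -> source text
    skip: names flagged in a previous run's audit log
    """
    collisions = {n for group in find_case_collisions(methods) for n in group}
    out = {}
    for name, source in methods.items():
        if name in collisions:
            audit_log.append((name, "getter name differs only by case"))
            out[name] = TODO + source
        elif name in skip:
            out[name] = TODO + source  # flagged last run: leave alone, mark for a human
        else:
            try:
                out[name] = convert(source)
            except ValueError as exc:  # assumption: the converter raises ValueError
                audit_log.append((name, str(exc)))
                out[name] = source  # untouched this run; gets a TODO next run
    return out


def demo_convert(source):
    """Hypothetical stand-in for a real refactoring step."""
    if "weird" in source:
        raise ValueError("unsupported construct")
    return source + "  // converted"


methods = {
    "getUrl": "String getUrl()",
    "getURL": "String getURL()",
    "getName": "String getName()",
    "getOdd": "weird()",
}
log = []
first = convert_class(methods, demo_convert, log)
second = convert_class(methods, demo_convert, [], skip={name for name, _ in log})
```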

Hibernate to JPA

Converting from Hibernate to JPA wasn’t much more challenging technically than the Lombok converter. You have to read in all of the hbm files, converting the XML to annotations. Then, in the JPA printer, you look up the annotations by class and field name. There are small things to consider. Your lookup needs to make sure each entity is mapped only once; JPA doesn’t like it if you have two entities with the same name. I said it is technically no harder than the Lombok converter, but you really need to know JPA in depth to get the annotations completely correct. The hardest annotation, without a doubt, was @ManyToOne with the inverse mapping in the other class.
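A minimal Python sketch of the hbm-to-annotation lookup, using the standard library’s `xml.etree.ElementTree`. It handles only the attribute forms of `<id>`, `<property>`, and `<many-to-one>`; nested `<column>` elements, collections, and the inverse side of a relationship (`@OneToMany(mappedBy = ...)`) are left out. The mapping file, package, and `read_hbm` are hypothetical. Annotations are keyed by class and field name, with class-level annotations under `None`, and a second entity with the same simple name is rejected.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping file in the classic hbm attribute style.
HBM = """<hibernate-mapping>
  <class name="com.example.Claimant" table="CLAIMANT">
    <id name="id" column="CLAIMANT_ID"/>
    <property name="lastName" column="LAST_NAME" length="40"/>
    <many-to-one name="employer" column="EMPLOYER_ID" class="com.example.Employer"/>
  </class>
</hibernate-mapping>"""


def column_name(el):
    # Hibernate defaults the column name to the property name when it is omitted.
    return el.get("column") or el.get("name")


def read_hbm(xml_text, lookup):
    """Add JPA annotations for every <class> in one hbm file to lookup[class][field]."""
    root = ET.fromstring(xml_text)
    for cls in root.iter("class"):
        name = cls.get("name")
        simple = name.rsplit(".", 1)[-1]
        # The JPA entity name defaults to the simple class name and must be unique.
        if any(key.rsplit(".", 1)[-1] == simple for key in lookup):
            raise ValueError(f"entity {simple} is mapped more than once")
        fields = {None: ["@Entity"]}  # class-level annotations live under None
        if cls.get("table"):
            fields[None].append(f'@Table(name = "{cls.get("table")}")')
        for el in cls.findall("id"):
            fields[el.get("name")] = ["@Id", f'@Column(name = "{column_name(el)}")']
        for el in cls.findall("property"):
            column = f'name = "{column_name(el)}"'
            if el.get("length"):
                column += f", length = {el.get('length')}"
            fields[el.get("name")] = [f"@Column({column})"]
        for el in cls.findall("many-to-one"):
            fields[el.get("name")] = ["@ManyToOne", f'@JoinColumn(name = "{column_name(el)}")']
        lookup[name] = fields
    return lookup


annotations = read_hbm(HBM, {})
# The JPA printer would then ask: annotations["com.example.Claimant"]["lastName"]
```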

Done?

So the code compiles and all 4k unit tests pass. We are done, right? Right? Not even close. To ensure the conversion is working properly, you need integration tests. Can you create the entity? Read? Update and delete? Do the named queries work properly?

The starting point for the integration test generator was the unit test printer. First, we need to set up the infrastructure. I used an in-memory H2 database, so technically it wasn’t an “integration test” so much as a component test, but switching to a real database was simply a matter of tweaking a few properties.

The tests should be able to run in any order, so each test needs its own setup and teardown. There should be one test for create and read, one update test for each field, and one test for delete. For example, an update test would create the database, add the entity, assert the value of the field, update the entity, read the entity back, and assert that the value changed.
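The per-test lifecycle above can be sketched in Python with `unittest` and an in-memory SQLite database standing in for JUnit and H2. The `claimant` table and its columns are hypothetical, and a generated suite would have one update test per mapped field rather than the single one shown.

```python
import sqlite3
import unittest

# Hypothetical table for a single entity.
SCHEMA = "CREATE TABLE claimant (id INTEGER PRIMARY KEY, last_name VARCHAR(40))"


class ClaimantCrudTest(unittest.TestCase):
    def setUp(self):
        # A fresh in-memory database per test, so tests can run in any order.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(SCHEMA)

    def tearDown(self):
        self.db.close()

    def insert(self, last_name):
        cur = self.db.execute("INSERT INTO claimant (last_name) VALUES (?)", (last_name,))
        return cur.lastrowid

    def read(self, row_id):
        return self.db.execute(
            "SELECT last_name FROM claimant WHERE id = ?", (row_id,)).fetchone()

    def test_create_and_read(self):
        row_id = self.insert("SMITH")
        self.assertEqual(("SMITH",), self.read(row_id))

    def test_update_last_name(self):
        # Create, assert, update, read back, assert the value changed.
        row_id = self.insert("SMITH")
        self.assertEqual(("SMITH",), self.read(row_id))
        self.db.execute("UPDATE claimant SET last_name = ? WHERE id = ?", ("JONES", row_id))
        self.assertEqual(("JONES",), self.read(row_id))

    def test_delete(self):
        row_id = self.insert("SMITH")
        self.db.execute("DELETE FROM claimant WHERE id = ?", (row_id,))
        self.assertIsNone(self.read(row_id))
```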

For the unit tests, I could get away with one value for each type: String is “TEST”, Long is 0L, etc. This doesn’t work for integration tests. If the field is mapped with @Column(name = "test", length = 1), the value “TEST” will fail. “T” would pass, but I wanted to make the tests a little nicer than that. There’s test data in the database, so use it. I created an adapter to read the most used values from each table in the database and built sample data for each entity from them.
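A small Python sketch of picking sample data: take the most frequent existing value that actually fits the column, and fall back to a truncated default otherwise. `pick_sample_value` is hypothetical, and in practice the values would come from a `GROUP BY` query against each table rather than an in-memory list.

```python
from collections import Counter


def pick_sample_value(column_values, max_length, fallback="TEST"):
    """Most frequent non-null value that fits the column, else the fallback cut to length."""
    counts = Counter(v for v in column_values if v is not None and len(v) <= max_length)
    if counts:
        return counts.most_common(1)[0][0]
    return fallback[:max_length]  # e.g. "TEST" becomes "T" for length = 1


# Hypothetical rows from a two-character STATE column.
state = pick_sample_value(["IN", "OH", "IN", "Indiana", None, "KY"], max_length=2)
```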

Miscellaneous

Not strictly necessary, but a code review suggested I create test fixture constants to eliminate duplicate values between the unit and integration tests. That was fun.

I also enjoyed creating the REPL. It made the conversion process more convenient. I added commands to enable and disable generators, modify test values, show values, and run each step of the process. Plus, it makes for a good demo. That’s always helpful.
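For anyone wanting to try the same thing, Python’s standard `cmd` module makes a small command shell nearly free. This is a sketch, not the converter’s actual REPL: the generator names and default test values are hypothetical, and only the enable, disable, set, and show commands are included.

```python
import cmd


class ConverterShell(cmd.Cmd):
    prompt = "converter> "

    def __init__(self):
        super().__init__()
        # Hypothetical generators and per-type test values.
        self.generators = {"lombok": True, "jpa": True, "unit-tests": True}
        self.test_values = {"String": '"TEST"', "Long": "0L"}

    def do_enable(self, arg):
        """enable <generator>"""
        self.generators[arg] = True

    def do_disable(self, arg):
        """disable <generator>"""
        self.generators[arg] = False

    def do_set(self, arg):
        """set <type> <value>: change the sample value used in generated tests"""
        java_type, value = arg.split(maxsplit=1)
        self.test_values[java_type] = value

    def do_show(self, arg):
        """show: list generators and test values"""
        for name, enabled in self.generators.items():
            print(f"{name}: {'on' if enabled else 'off'}", file=self.stdout)
        for java_type, value in self.test_values.items():
            print(f"{java_type} = {value}", file=self.stdout)

    def do_quit(self, arg):
        """quit"""
        return True  # returning True ends cmdloop()
```

Calling `ConverterShell().cmdloop()` starts the interactive prompt; `onecmd` runs a single command, which also makes the shell scriptable for demos.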

Big Ball of Mud

All of the entities were in a Common JAR in a single package.

more to come…

Paradigm Shift

October 10, 2024 by schildawg

In order to keep things simple while covering the basics, up to now I’ve written as if there is a single mathematical model governing the relationship of all words to each other. This is certainly not the case. There are potential models for every person in the world. For every philosophy, every religion, and every viewpoint. There are even valid models for every combination and permutation of every existing model. If a single model is a universe, the set of all models is the multiverse!

In upcoming posts I will explore how different models can interact with each other, merging or clashing. Models are also not static, but are continuously changing. However, in this post we will look at how even using the same training data can result in different models.

Let’s say we have two models, one with 1,000 dimensions, and another with 150, and we train them both using the book War and Peace. Both models will contain around 20,000 words. But the model with fewer dimensions will be lower resolution, and the words will lack the nuances of the model with more dimensions. If you reduce the resolution too much, the model becomes worthless, like a blurry picture where you can’t make out the details.

Because of diminishing returns, the 850 extra dimensions do not make the model nearly seven times more accurate, even though it has about 6.7 times as many dimensions. But including or omitting an important dimension can make a huge difference. A perfect example of this is looking up into the night sky and seeing two stars that appear to be right next to each other. Because of their distances from Earth (the dimension we can’t see), they might really be very far apart.

This is what is known as a “paradigm shift.” A paradigm shift is when a single bit of information fundamentally changes your view of a concept or situation. Similarly, in a model, a single dimension can fundamentally change the meaning of a word.

Let’s take for example the words “pepper” and “jalapeño.” The two words are very similar, and share many dimensions, clustering them close together in the model with other vegetables and fruits. But if you add the dimension “spiciness” they become much further apart. Though not as far apart as “jalapeño” and “strawberry”.

Without this dimension, the prompt “create a recipe for two-alarm chili” might end up replacing “jalapeño” with “bell pepper”, with disappointing results!
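A toy Python calculation makes the effect of one dimension concrete. The three coordinates (produce, green, spiciness) are invented for illustration; real embeddings have hundreds of learned, unlabeled dimensions.

```python
import math

# Hypothetical toy coordinates: (produce, green, spiciness).
WORDS = {
    "bell pepper": (0.9, 0.8, 0.0),
    "jalapeño": (0.9, 0.9, 0.8),
    "strawberry": (0.5, 0.0, 0.0),
}

WITHOUT_SPICE = (0, 1)
WITH_SPICE = (0, 1, 2)


def distance(a, b, dims):
    """Euclidean distance between two words using only the listed dimensions."""
    return math.dist([WORDS[a][d] for d in dims], [WORDS[b][d] for d in dims])


for dims in (WITHOUT_SPICE, WITH_SPICE):
    print(dims,
          round(distance("bell pepper", "jalapeño", dims), 3),
          round(distance("jalapeño", "strawberry", dims), 3))
```

Using only the first two dimensions, bell pepper and jalapeño are 0.1 apart; adding spiciness pushes that distance to about 0.81, while jalapeño and strawberry remain the farthest pair.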

LeetCode

October 7, 2024 by schildawg

Disclaimer: I don’t believe LeetCode is a true judge of a software engineer’s ability in most cases, and do not endorse the use of it for interviewing. My main purpose is to showcase Algol-24, while at the same time proving to myself that my language can handle all these scenarios.

Index:

Two Sum – Easy
Add Two Numbers – Medium
Longest Substring Without Repeating Characters – Medium
Median of Two Sorted Arrays – In Progress – Hard

Longest Substring Without Repeating Characters

October 7, 2024 by schildawg

Description

Given a string s, find the length of the longest substring without repeating characters.

Tests

As always, let’s start out with the tests:

/// The answer is 'abc', with the length of 3.
///
test 'Example 1';
begin
    AssertEqual(3, LongestSubstring('abcabcbb'));
end

/// The answer is 'b', with the length of 1.
///
test 'Example 2';
begin
    AssertEqual(1, LongestSubstring('bbbbb'));
end

/// The answer is 'wke', with the length of 3. Notice that
/// the answer must be a substring; 'pwke' is a subsequence,
/// not a substring.
///
test 'Example 3';
begin
    AssertEqual(3, LongestSubstring('pwwkew'));
end

Solution

This is a medium-level LeetCode question. Coding it is simple if you know the concept of the “sliding window.”

function LongestSubstring(S : String) : Integer;
var
   Start, Finish, TheMax : Integer := 0;
   TheSet : Set := Set();

begin
    while Finish < Length(S) do
    begin
        if Not TheSet.Contains(S[Finish]) then
            begin
                TheSet.Add(S[Finish]);        
                Finish := Finish + 1;
                
                TheMax := Max(TheSet.Length, TheMax);
            end
        else
            begin
                TheSet.Remove(S[Start]);
                Start := Start + 1;
            end 
    end
    Exit TheMax;
end

Use a Set to keep track of the unique characters in the window. Start with the window’s start and finish at the beginning of the String, and loop until the finish reaches the end of the String. If the character at the end of the window is not in the Set, add it and move the end forward by one. Otherwise, remove the character at the start of the window from the Set and move the start forward by one.

When the end of the window reaches the end of the String, return the largest number of characters the Set contained. (Every time you add a character, set the max to the length of the Set if it is larger than the previous max.)
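For comparison, here is the same sliding window as a minimal Python sketch. The logic mirrors the Algol-24 version step for step; `longest_substring` is just an illustrative name.

```python
def longest_substring(s: str) -> int:
    """Length of the longest substring of s without repeating characters."""
    window = set()                 # characters currently in the window
    start = finish = best = 0
    while finish < len(s):
        if s[finish] not in window:
            window.add(s[finish])  # grow the window on the right
            finish += 1
            best = max(best, len(window))
        else:
            window.remove(s[start])  # shrink it from the left
            start += 1
    return best
```

Each index enters and leaves the window at most once, so the loop runs in O(n) time.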

The Big Bang

September 25, 2024 by schildawg

“We don’t need no education” — The Wall

In The Word, we saw that the words in a language model can be visualized as constellations with precise mathematical relationships to each other. Each constellation can also be thought of as a “thinking” entity that takes in other “context” words and redirects to the next word, adding itself to the context.

The code to navigate the model and generate content is pretty straightforward, and is not very difficult to develop or run, but the model itself is nothing short of amazing! It is something that a thousand programmers could not have programmed in a thousand years. We haven’t even begun to scratch the surface of understanding what is contained in it. So how did we manage to make these language models?

If you have a clear idea of the result you want, but have no training data, you can use “evolution” to teach neural networks. This is the method used for creating bots to beat Super Mario Brothers. The screen is the input, and controller actions are the output. The “hidden” layer is treated as DNA. The DNA is mutated, bred together, and divided into species. Over many generations, only the fittest DNA is kept, and the rest go extinct. The same principle can be used for evolving computer programs. I plan on using my BrainF*ck interpreter to create a laboratory to play around with this concept in PRISM, and I will write about it in depth.
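A tiny Python sketch of the evolutionary loop, assuming the simplest possible setup: the “DNA” is a list of bits, fitness is the number of bits that match a target, and each generation keeps only the fittest few and breeds the rest from them with crossover and mutation. This is a plain genetic algorithm, not NEAT (the neuroevolution method behind the well-known Mario bots), and it does not divide the population into species.

```python
import random


def fitness(genome, target):
    """Number of positions where the genome matches the target."""
    return sum(g == t for g, t in zip(genome, target))


def mutate(genome, rate, rng):
    """Flip each bit with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]


def crossover(a, b, rng):
    """Single-point crossover: the head of one parent, the tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(target, population_size=30, generations=60, elite=5, rate=0.05, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in target] for _ in range(population_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        population.sort(key=lambda g: fitness(g, target), reverse=True)
        history.append(fitness(population[0], target))
        survivors = population[:elite]  # only the fittest DNA is kept
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rate, rng)
            for _ in range(population_size - elite)
        ]
        population = survivors + children  # the rest go extinct
    best = max(population, key=lambda g: fitness(g, target))
    return best, history
```

Because the fittest genomes survive unchanged, the best fitness in `history` can never go down from one generation to the next.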

Fortunately, with many zettabytes of content on the Internet, there is no shortage of content to train our model. Where do we start? First, you create a hyper-dimensional void and add all your words. At this point we don’t have the faintest clue where the words should go, however. And we don’t want to place them all at the origin, because that will introduce a bias towards the origin point. So we “explode” the words out, selecting a random hyper-dimensional point for each word.
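The “explosion” step can be sketched in a few lines of Python. `explode` is a hypothetical name, and uniform values in [-1, 1] are one common choice among several. Seeding the random generator makes the starting universe reproducible.

```python
import random


def explode(vocabulary, dims, seed=0):
    """Give every word its own random point in a `dims`-dimensional space."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dims)] for word in vocabulary}


space = explode(["pepper", "jalapeño", "strawberry"], dims=8)
```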

Next, we start working our way through the entire Internet! For every book ever written, every blog post, and every angry rant posted on a forum, we run it through our model, nudging the stars to their correct place in the universe. This is where the hard part comes into play. You provide the inputs and the outputs, and use “backpropagation” to turn the knobs in the hidden layer of the neural network. This involves complex coding and advanced mathematics to pull off; people have written entire theses on the topic, so I won’t be writing about it 😀
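Backpropagation at full scale is beyond this post, but the “turn the knobs” idea can be shown with a single knob. This Python sketch fits one weight `w` so that `w * x` matches the provided outputs, using stochastic gradient descent on squared error; with only one weight, backpropagation reduces to a single derivative. The data and learning rate are made up for illustration.

```python
def train_weight(samples, learning_rate=0.05, epochs=200):
    """Fit w in prediction = w * x by gradient descent on squared error."""
    w = 0.0  # the single "knob"
    for _ in range(epochs):
        for x, y in samples:
            error = w * x - y
            gradient = 2 * error * x  # d/dw of (w * x - y) ** 2
            w -= learning_rate * gradient
    return w


# Inputs and outputs provided; the rule (y = 2x) is never told to the model.
w = train_weight([(1, 2), (2, 4), (3, 6)])
```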

So, 64 ZB and a billion dollars later, you get ChatGPT. I said we don’t know what’s in it, but we actually do. It is the synthesis of the entirety of human knowledge, organized into a nice, compact mathematical formula.

With this massive undertaking accomplished, the doors are now open to a new form of training. There is no need to start back from the beginning. AI can be tasked with training other AI, which can in turn train other AI… and on and on. What took years to accomplish can now be done in a matter of hours.

Models such as OpenAI’s have thousands of dimensions, vastly exceeding the point of diminishing returns. Reducing the dimensions to several hundred can be 90% as effective, allowing a model to run on inexpensive hardware. Stanford’s Alpaca can be trained in an hour and a half and run on a $300 computer! Llama 7B has even been successfully installed and run on a Raspberry Pi!

Add Two Numbers

September 4, 2024 by schildawg

Description

You are given two non-empty linked lists representing two non-negative integers. The digits are stored in reverse order, and each of their nodes contains a single digit. Add the two numbers and return the sum as a linked list.

You may assume the two numbers do not contain any leading zero, except the number 0 itself.

Setup

In LeetCode, the linked list is provided for you. We’ll have to add that ourselves. To make the test cases more readable, we will also add a ToString to print it out, and a factory to easily create it from an array of integers.

class ListNode;
begin
    constructor Init(Value : Integer, Next : ListNode := Nil);
    begin
       this.Value := Value;
       this.Next  := Next;
    end

    function ToString() : String;
    begin
        var ToString := Str('[') + Value;
        
        var Current := Next;
        while Current <> Nil do
        begin
           ToString := ToString + ', ' + Current.Value;
           Current := Current.Next;
        end
        ToString := ToString + ']';

        Exit ToString;
    end
end

function LinkedList(Values : Array of Integer) : ListNode;
var 
   Head, Current : ListNode;

begin
    if Values.Length = 0 then raise 'Must contain at least one node!'; 

    Head := ListNode(0);
    Current := Head;

    for var I := Iterator(Values); I.HasNext() do
    begin
        Current.Next := ListNode(I.Next());
        Current := Current.Next;
    end
    Exit Head.Next;
end

Tests

With that taken care of, we can add the 3 example scenarios as test cases:

test 'Case #1';
begin
    var Sum := AddTwoNumbers(LinkedList([2, 4, 3]), LinkedList([5, 6, 4]));

    AssertEqual('[7, 0, 8]', Sum.ToString());
end

test 'Case #2';
begin
    var Sum := AddTwoNumbers(LinkedList([0]), LinkedList([0]));

    AssertEqual('[0]', Sum.ToString());
end

test 'Case #3';
begin
    var Sum := AddTwoNumbers(LinkedList([9, 9, 9, 9, 9, 9, 9]), LinkedList([9, 9, 9, 9]));

    AssertEqual('[8, 9, 9, 9, 0, 0, 0, 1]', Sum.ToString());
end

Solution

To get an O(n) solution, you just loop through the two linked lists at the same time and add the two together. If the first or second node is Nil, you use 0 for its value.

Probably the hardest part of this problem is carrying to the tens place. You can use the mod operator to get the digit and integer division to get the carry. The tricky part is modifying the while loop to make sure you continue until the carry is zero, even if it goes beyond both linked lists.

Note: creating a dummy head and just returning its Next property makes the solution easier.

function AddTwoNumbers(List1, List2: ListNode) : ListNode;
var 
   Head, Current : ListNode;
   
   First, Second : Integer;
   Sum, Carry    : Integer := 0;

begin
    Head := ListNode(0, Nil);
    Current := Head;

    while List1 <> Nil Or List2 <> Nil Or Carry <> 0 do
    begin
       First  := List1 = Nil ? 0 : List1.Value;
       Second := List2 = Nil ? 0 : List2.Value;

       Sum := First + Second + Carry;
       Carry := Sum / 10;

       Current.Next := ListNode(Sum % 10);
       Current := Current.Next;

       List1 := List1?.Next;
       List2 := List2?.Next;
    end
    Exit Head.Next;
end
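For readers more at home in a mainstream language, here is a minimal Python sketch of the same dummy-head approach. `ListNode`, `from_list`, `to_list`, and `add_two_numbers` are illustrative names; `divmod` returns the carry and the digit in one step.

```python
class ListNode:
    def __init__(self, value=0, next=None):
        self.value = value
        self.next = next


def from_list(values):
    """Build a linked list from a Python list, using a dummy head."""
    dummy = current = ListNode()
    for v in values:
        current.next = ListNode(v)
        current = current.next
    return dummy.next


def to_list(node):
    out = []
    while node:
        out.append(node.value)
        node = node.next
    return out


def add_two_numbers(l1, l2):
    dummy = current = ListNode()
    carry = 0
    while l1 or l2 or carry:  # keep going while a carry remains
        total = (l1.value if l1 else 0) + (l2.value if l2 else 0) + carry
        carry, digit = divmod(total, 10)  # tens carry over, ones stay
        current.next = ListNode(digit)
        current = current.next
        l1 = l1.next if l1 else None
        l2 = l2.next if l2 else None
    return dummy.next
```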

**********

Bonus

Using operator overloading, the code becomes more readable:

operator + (Other : ListNode) : ListNode;
begin
   Exit AddTwoNumbers (this, Other);
end

Now the code looks like LinkedList([1, 2, 3]) + LinkedList([4, 5, 6]).

Bonus #2

In an interview the other day, I had the good old “reverse a linked list” coding question, in Java of course, but here is the equivalent in Algol-24 😀

/// Add to ListNode class.
///
function Reverse() : ListNode;
var
    Current, Previous, Next : ListNode := Nil;

begin
    Current := this;

    while Current <> Nil do
    begin
        Next := Current.Next;
        Current.Next := Previous;
            
        Previous := Current;
        Current := Next;
    end
    Exit Previous;
end 

test 'Reverse a linked list';
begin
    var TheList := LinkedList([2, 4, 3]);

    var Reversed := TheList.Reverse();
    AssertEqual('[3, 4, 2]', Reversed.ToString());
end