The Truth is the Code

Now that the press has picked up on our recent funding announcement, it is official that I have joined Acellere from the ranks of DTCP. So I thought it was a good time to talk about why I was so excited to join this company, and why we are such a ridiculously cool and promising company.

Code as a Platform – the history

I think that to fully understand our mission, one has to understand the intellectual category we are in. In modern words, that is what I call “code as a platform”.

That category is both old and new. It’s old because code, and the tools that cater to it, have been around for as much as 3,500 years, since the first abacus likely came out and helped its users train their brains to use better algorithms for basic but intellectually non-trivial computation. The code in this example is the operating algorithm of the abacus. The abacus is the programming language. And the human using the abacus is the computing device the programming language executes on. The abacus here is both the language and the developer tool, as it materializes the state of the computation and hence helps the computing device – which is also the developer – find the best algorithm to solve its problems.

Things have evolved quite a bit since then. Humans formalized computing systems, and hence the human is no longer the computation device. We now have “computers” for that. They come with a large set of algorithms for solving standard maths problems and have become easier and easier to use, even for non-technically, non-mathematically trained users.

One of the larger recent inventions in the evolution of developer tools was the compiler. Compilers allowed users, or programmers, to introduce new languages in which to express algorithms and computation instructions, and separated them from the underlying complexity of lower-level programming in assembly and machine code. That was so well received by programmers and users that we saw two things. First, writing code became so easy that people started to enjoy it and wrote software projects that were millions, even hundreds of millions, of lines of code long. That became challenging from a maintenance perspective and required the analytical and engineering brains of the industry to come up with standard forms for separating such large projects into smaller, isolated components, which led to the emergence of modules, classes and object-oriented design. Secondly, it led to the development of more and more new languages, and communities around those languages that built more and more products on them. Each new language reached a new set of people who previously didn’t understand the other languages, and suddenly you saw more and more people using and writing software and creating products. This is probably why someone said at one point that software is eating the world.

The emergence of larger codebases and the explosion of languages and projects in those languages is very useful for understanding where the market is going, and especially where companies in the space of software code analysis – like our first product Gamma – are coming from. Because of the massive growth on the right side of the spectrum (right being close to users, new languages and products; left being closer to the machine, where changes are less dramatic and radical), more and more software tools are being developed to help the growing number of programmers. Call this space DevTools, DevOps, etc. They all spring from one core idea: see what this right side is doing, find where many people have the same problem, and solve it simply with product. This is how you get something like GitHub, Jira and GitLab. And this is how you see, for example, a static analyzer like SonarQube appear. Some open source developers initially started writing their own static analyzers after being disappointed with what was out there, until a larger community was working on it. Then someone raises the idea that valuations for large communities are great and they need financing, so they form a company and raise a lot of money. Good stuff. But it lacks vision and an understanding of the relevance of code.

What really matters for understanding something as simple as static code analysis is what code really is. And you cannot understand code without understanding its runtime environment. Code written on the right side of the spectrum, in a higher-order language, is basically just an idea of how a piece of software should function. This idea still needs to be translated to work on a specific operating system by being compiled or interpreted. That compiler must be bug-free for this purpose, must know the operating system very well, and is itself compiled code. And that operating system is again nothing but compiled code, which has to be compiled against the underlying hardware architecture it runs on, and rewritten and recompiled whenever that architecture drastically changes. On top of that, it also has to keep working when parts of the architecture are changed – chipsets and components. And that, too, requires writing code – drivers – and compiling it against the operating system and/or machine code. And even once we are down at the chipset level, chipsets are nothing but a huge collection of components that are basically code – not compiled this time, but manufactured into a chipset. And even these components just run on electric circuits which are “compiled” against the electronic components that compose them. And so on. Yes, all of that is basically code.

And it goes in the other direction, too. Humans who write code do so with a large set of trained behaviours, patterns and thoughts, which are essentially algorithms and hence code themselves. As a developer, a programmer isn’t just a human being, but really a flexible, learning A.I. that picks up more and more algorithms over the course of its life as it develops and observes code.

And on top of that, even the users of the products – software or not – that developers create interact with them in trained, algorithmic ways – which is what we learn from design thinking – and hence are also code.

So if you understand that code is this essential and pervasive in our world, then you feel a bit dumb if you think of code analysis as something that only takes place in higher-order languages. And you are plain wrong if you think that showing a few anti-patterns and programming best practices improves code quality.

And that’s it for now. Acellere’s category is basically Code – and what we can do to make code more powerful by using technology to connect all those disconnected segments of the world that are code. But enough philosophical discussion; this is the background needed to understand our first product, Gamma, which I will explain now.

Gamma – Version 1.0

Coming back to what our current product actually is: in essence, it is a static code analyzer that looks at all the anti-patterns that exist in code, from both a programmatic and an architectural viewpoint.

To achieve this, we developed our own parsers for the key programming languages out there and created representation models that understand code and comments in a way no other solution on the market can. As a by-product, we can also find and highlight anti-patterns in any codebase, similar to our competitors – we just find a lot more, and continue to find more. From that viewpoint, our MVP for Gamma was simply to make this pattern detection fast and simple, to build a KPI-tree kind of model that maps found issues to an overall code quality score, and to build the software tools needed to make this actionable for programmers. So the Gamma MVP is a superior software quality analyzer running in the cloud that can be hooked up to a developer’s favorite IDE and guides them in increasing their code quality from various angles. On top of that, we built a nice interface for our cloud product that lets them visually explore their codebase and see how and where such anti-patterns occur.
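
To make the KPI-tree idea a bit more concrete, here is a minimal sketch of how findings could roll up into one score. Gamma’s actual model is of course richer and proprietary; every node name, weight and severity below is invented purely for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class QualityNode:
    """One node in a KPI tree: a leaf that collects penalties from
    matched anti-patterns, or an inner node aggregating weighted children."""
    name: str
    weight: float = 1.0
    children: list["QualityNode"] = field(default_factory=list)
    penalty: float = 0.0  # only used on leaves

    def score(self) -> float:
        """Return a 0..10 quality score for this subtree."""
        if not self.children:
            return max(0.0, 10.0 - self.penalty)
        total_weight = sum(c.weight for c in self.children)
        return sum(c.score() * c.weight for c in self.children) / total_weight

# Hypothetical tree, weights and severities, invented for illustration.
duplication = QualityNode("duplication")
class_size = QualityNode("class_size")
root = QualityNode("code_quality", children=[
    QualityNode("maintainability", weight=2.0, children=[duplication, class_size]),
    QualityNode("correctness", weight=3.0),
])

SEVERITY = {"god_class": 2.5, "copy_paste": 1.0}  # penalty per finding

# Findings from the pattern detectors: (anti_pattern, affected KPI leaf).
for pattern, leaf in [("god_class", class_size), ("copy_paste", duplication)]:
    leaf.penalty += SEVERITY[pattern]

print(f"overall code quality: {root.score():.2f} / 10")
```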

But that is really just the MVP, and it is based on a “by-product” of our actual work. It’s still good enough to beat our competitors, and it is definitely a product that we will sell as a main product line.

But now comes the part about “artificial intelligence” and machine learning. First of all, we scan millions of GitHub repositories out there on a regular basis and match them with Jira projects where we can. That gives us not only access to a ton of software projects that we can analyze to find new anti-patterns and structural issues in code, it also helps us train several important algorithms.
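
For a feel of what the very start of such a scan could look like, here is a minimal sketch using GitHub’s public search API. Our actual crawler is internal; the language, star threshold and page count here are just illustrative parameters.

```python
import requests

def candidate_repos(language: str, min_stars: int = 100, pages: int = 2):
    """Yield (full_name, clone_url) of public repos via GitHub's search API.
    Unauthenticated calls are heavily rate-limited; a real crawler would
    authenticate and paginate much deeper."""
    for page in range(1, pages + 1):
        resp = requests.get(
            "https://api.github.com/search/repositories",
            params={"q": f"language:{language} stars:>={min_stars}",
                    "sort": "stars", "per_page": 50, "page": page},
            headers={"Accept": "application/vnd.github+json"},
            timeout=30,
        )
        resp.raise_for_status()
        for repo in resp.json()["items"]:
            yield repo["full_name"], repo["clone_url"]

for name, url in candidate_repos("java"):
    print(name, url)  # queue for cloning and analysis
```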

One algorithm can identify how, when and why new commits lead to an improvement or worsening of the codebase. That is interesting from many angles. First of all, which developers are contributing good and bad parts to a project, and under what conditions? How does the mythical man-month – or the total lines of code and features produced per unit of time, given resources and architecture – impact the evolution of code quality? We are talking about the core metrics of building and operating software development teams, from the large enterprise level down to small agile open source projects and start-ups. As a by-product of just this, we can potentially rank all GitHub developers on their ability to produce high-quality code in a specific language or product domain and under given conditions. Neat, isn’t it?
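
A minimal sketch of the underlying bookkeeping: replay a repository’s history and attach a quality delta to every commit. The `analyze` function is assumed to be any scorer that maps a checkout to a number, such as the KPI tree sketched earlier; the rest is standard git plumbing.

```python
import subprocess

def quality_deltas(repo_path: str, analyze) -> list[tuple[str, float]]:
    """Replay a repo's history oldest-first and record how much each
    commit moved the overall quality score."""
    shas = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--reverse", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    deltas, previous = [], None
    for sha in shas:
        subprocess.run(["git", "-C", repo_path, "checkout", "--quiet", sha],
                       check=True)
        score = analyze(repo_path)
        if previous is not None:
            deltas.append((sha, score - previous))  # > 0: commit improved things
        previous = score
    return deltas
```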

But on top of that, we built natural language processing neural networks that are able to track JIRA logs to see which feature requests, by whom, are targeted at improving software quality versus producing new features. We can hence see, when something is supposed to have a positive impact by either killing a bug or improving an architectural aspect of the software, how it is intended to be solved, how it is actually solved, and what impact it actually has. So we have a performance measurement tool along all aspects of the product development roadmap and software quality management, one that helps us measure development performance from both the management and the developer view.
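
To illustrate the labeling task – not our actual networks – here is a deliberately simple sketch that separates quality-driven tickets from feature tickets with a linear classifier. All tickets and labels are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical hand-labeled tickets: 1 = targets software quality, 0 = new feature.
tickets = [
    ("Refactor payment module to remove cyclic dependency", 1),
    ("Fix NullPointerException in session handler on logout", 1),
    ("Clean up duplicated validation logic in the order service", 1),
    ("Add dark mode to the settings screen", 0),
    ("Support CSV export of monthly reports", 0),
    ("Allow users to pin favorite dashboards", 0),
]
texts, labels = zip(*tickets)

# A simple TF-IDF + logistic regression pipeline standing in for the neural nets.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

# Classify an unseen ticket.
print(model.predict(["Reduce coupling between parser and renderer"]))
```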

Once we understand the evolution of code in development and the communication of developers around it, we have a very unique basis to start looking at how these metrics sit in the wider, global economy. What is the impact of a particular project on the top-line or bottom-line performance of a company? How does favoring features over quality on a project that potentially runs on millions of machines globally affect the environment – “green code”? How does the specific culture of a team, or the skillset of its developers, or the time pressure it is under, affect society when it is working on, say, a critical IoT component with potentially billions of units in use around the globe? What is the cost of a critical bug in this environment?

This goes on and on. But we are still talking about code as code in a higher-order language. The interesting part comes once we bridge through the compiler and OS layers. One thing we work on is using A.I. to really develop superior architectures of components. We actually have parts of this feature in our current Gamma cloud version already: it helps you uncover how to better separate concerns in a large class. But deep down, the question is: how can we understand the whole system? Let’s say we look at Chrome. What are the best compiler settings when compiling the latest Chrome release against Windows 10 to increase runtime performance, lower battery and energy consumption and maximize security? And how does this compare on the same machine when we swap the core CPU from an Intel to an AMD processor architecture? Is this an OS kernel/driver issue? Or is it a chipset issue? Let’s assume it’s not the operating system. What can we learn about chipset design that helps us improve it? Once we are able to understand the overall metrics of any kind of software release across the total set of runtime environments of that code, we basically start to understand the whole stack of technology, just by starting from the code analyzer in the higher-order language.
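
As a toy illustration of that kind of question, here is a sketch of a compiler-flag sweep over a single benchmark. `bench.c` is a placeholder, and real measurements would of course need controlled hardware, repetition, and far more metrics than wall-clock time.

```python
import subprocess
import time

# Hypothetical sweep: compile one benchmark under different flag sets and
# compare runtime. A real study would also measure energy use, binary size
# and security hardening, across OSes and CPU architectures.
FLAG_SETS = [
    ["-O2"],
    ["-O3"],
    ["-O3", "-flto"],
    ["-O2", "-march=native"],
]

def run_once(flags: list[str], source: str = "bench.c") -> float:
    """Compile `source` (a placeholder benchmark) with `flags`, run it,
    and return the elapsed wall-clock time in seconds."""
    subprocess.run(["cc", *flags, source, "-o", "bench"], check=True)
    start = time.perf_counter()
    subprocess.run(["./bench"], check=True)
    return time.perf_counter() - start

if __name__ == "__main__":
    for seconds, flags in sorted((run_once(f), f) for f in FLAG_SETS):
        print(f"{seconds:8.3f}s  {' '.join(flags)}")
```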

Once we have solved that puzzle, we can ask how to bridge the gap between the engineers working in these separate domains. How can we get operating system designers to work together with chipset manufacturers and help them build better infrastructure for software products in different domains?

And as we handle more and more complexity in the architectural domain of code, we also have to solve the question of how to make all this complexity accessible to the developers and people who maintain these systems.

Code as a platform

And this is where we see the commercial category we are in. The future, at some point down the line, will no longer be written by companies that hire a fixed set of engineers to maintain a fixed set of code to patch together a fixed set of products, which they market to position themselves among buyers. The future doesn’t belong to product companies. The future belongs to the companies that hunt code and people without any boundary defined by a relationship to a marketable product. The code and the people are the product. And they don’t belong to the company. The company is rather the world in which they express themselves. Kind of like Google Search, just rethought in the world of code.

With the ongoing technological advancement we see, building new products fast and pushing them to people becomes increasingly simple. You cannot hold a business together anymore under this increasing velocity of change. But what you can hold together is the integrity of the global codebase and the people you train to access this massive codebase. That is where the commercial viability comes from: having the ability to let people use this vast knowledge pool. Giving them access. And giving them the ability to function in this world.

And that is what Acellere is all about. Basically, just code. In general. And specifically.
