On the Brink of War

The Client

Microsoft
Uber
Bing
MSNBC
Shopify
NBC News
Starbucks
Time
Discovery
WPEngine
Campbell's
Skype

WebDev Studios is a large development agency with almost 40 staff. Having worked with many big brands including Microsoft, Uber, Starbucks and Time, they are one of the leading providers of WordPress services. They are very active contributors in the WordPress community: writing many popular books, coding several popular plugins and giving talks at prominent conferences. WP Engine, the de facto standard for high-end WordPress hosting, regards WebDev and their CEO as the best technical team they know and refers work to them as its top choice.

The Problem

We were called in by WebDev’s client when they were struggling with a big project. The project was a data-heavy web application with large entities that needed to be constantly updated and served to a user front-end with multivariate browsing. We were asked to analyze and verify their work.

Technical Issues

The project had come to a complete standstill. All major technical features were not working as needed. Many more minor features that depended on those major features were unusable. They attempted solutions for months with no significant progress.

Speed

A major part of the app was a backend processing unit. There were data sources to be fed in, but they contained only a portion of the data the app needed. Extra query and calculation steps were necessary to complete the data before insertion. Processing a data feed of 150,000 items took almost 40 hours. Many of the items expired in 4 hours so, by the time they cleared the system, they had already been expired for a day and a half. This meant the site would be filled with useless items.
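To make the scale of the problem concrete, the figures above can be checked with a few lines of arithmetic. This is only a sketch: "almost 40 hours" is rounded to 40, the variable names are ours, and we assume the 4-hour lifetime is counted from the moment the feed arrives.

```python
# Figures taken from the case study; "almost 40 hours" is rounded to 40.
FEED_ITEMS = 150_000
PROCESSING_HOURS = 40
ITEM_LIFETIME_HOURS = 4  # assumed to start when the feed arrives

# Average throughput of the original pipeline.
items_per_second = FEED_ITEMS / (PROCESSING_HOURS * 3600)

# How long a 4-hour item has been expired when the run finally ends.
hours_stale = PROCESSING_HOURS - ITEM_LIFETIME_HOURS

# Throughput needed just to finish the feed before those items expire.
required_items_per_second = FEED_ITEMS / (ITEM_LIFETIME_HOURS * 3600)

print(f"actual:   {items_per_second:.2f} items/s")
print(f"required: {required_items_per_second:.2f} items/s")
print(f"stale by: {hours_stale / 24:.1f} days")
```

The pipeline was roughly ten times too slow just to break even, before allowing any time for users to see the items.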

Geolocation

The biggest usability feature of the app was a geolocation system that would locate any user that came to the site and serve them only content relevant to their location. While some of the content was global, many of the items were specific to certain cities. This became a liability for the user experience. A user in California does not care about items only for New York users and vice versa. In addition, these local items were often much higher value than global items because they were highly targeted so they simply could not be ignored.

However, auto-location was not working. Every solution they tried created unresolvable errors. So attempts were made at a band-aid fix to allow the user to enter their location. The implementation was clunky, forcing the user to use unnatural input because it was the format they got the data in. The band-aid fix did not work either. It could not locate the users properly, and distance filtering did not work, always showing irrelevant content from across the country.

Corruption

Data that did get processed was not displayed properly. Search features that were supposed to return items did not return anything or returned everything in an unusable order. Interactive features that were supposed to do dynamic, external data fetching did not work.

Executive Attention But No Progress

WebDev brought their best talent to the project including their CEO, executive team, and top engineers. They tried their best to address their client’s concerns but were still not able to deliver.

Missed Deadlines

The project was months past its deadline. Even worse, the deadline was part of a much larger plan. A lot of marketing could not start until the build was finished. Now months of planning were getting pushed back further and further. The project had already taken twice as long as expected.

On the Brink of War

As time dragged on, tensions rose between WebDev and their client. With so much going wrong and no signs of progress, the client's patience ran out. Their client, under pressure from a concerned investor, was very unhappy. Tensions turned into fights. Meetings became frustrating and combative. There was shouting and yelling on every call. Their client wanted answers, and WebDev only had excuses. WebDev tried to blame their client’s server team for many of the technical issues, but the client knew better.

WebDev was staring down the barrel of a lawsuit. But this was not just any lawsuit. Their client was a multi-million dollar startup with powerful backers. Every month, they had major overhead expenses. WebDev could be liable not only for the project expenses but also for the operational expenses caused by their continued delays. Their client informed us that they already had the most powerful lawyers in the nation on retainer. WebDev did not know any of these details at the time, but this was a client they did not want to mess with. WebDev was in over their heads in more ways than one.

The Solution

WebDev’s crisis is a common sight for us. This is what happens when technical issues can’t be solved. Technical issues are one of the few problems that can bring projects to a grinding halt. The problem with technical issues is that there is no guarantee that you will be able to solve them. Just because you throw more bodies at the problem doesn’t make it go away faster. It is the biggest reason why companies fail: they could not execute. This is the ugly side of business that people don’t like to talk about but happens all the time.

Technical issues are like a virus that slowly takes over a company until it kills it. It starts first with the engineering team where failure kills morale and makes their performance worse. Engineering missing targets ruins plans for other parts of the company like management, operations, and marketing who now have new issues to deal with on top of their existing ones. Customer service then has to spend all its time dealing with angry customers. Soon the damage to relationships is irreparable. What starts as a simple engineering task could ultimately create the downfall of a company.

With the situation spiraling out of control, we had to stop the bleeding and save WebDev and their client.

Speed

When we analyzed their work, we saw fundamental flaws in their approach. The flaws made it impossible to create anything usable. These issues were apparent to us on the first day. They were fixed in one day. Months of headaches and pain were turned around in a single day.

Data Model

They tried to create a flexible data model to speed up development, but their simple model killed the performance. They were using meta-based one-to-many storage locked in by the framework. This was really flexible because it allowed infinite, dynamic expansion of fields on entities so they could change their mind whenever they needed to, making development easier.

The problem is that the entities they were working with had up to 40 fields. That meant 40 joins to get a complete entity. A table of 10,000 entities took up 400,000 rows. Entities also had relations to multiple other entities, each of which could have 20 or more fields of its own, which made retrieving a complete object set a scary thought. Indexing and sorting were incredibly slow.
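The cost of meta-based storage is easier to see in a small example. The sketch below is not the project's actual schema: the table names, the field names, and the use of SQLite are all assumptions made for illustration. It stores each field as its own row and then builds the query needed to reassemble one complete entity.

```python
import sqlite3

# Hypothetical field names; the real project had up to 40 fields per entity.
FIELDS = [f"field_{i}" for i in range(40)]
ENTITY_COUNT = 100  # kept small for the demo; the text cites 10,000

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY);
    CREATE TABLE entity_meta (entity_id INTEGER, meta_key TEXT, meta_value TEXT);
    CREATE INDEX idx_entity_meta ON entity_meta (entity_id, meta_key);
""")
conn.executemany("INSERT INTO entity (id) VALUES (?)",
                 [(e,) for e in range(1, ENTITY_COUNT + 1)])
# One meta row per (entity, field): rows = entities x fields.
conn.executemany(
    "INSERT INTO entity_meta VALUES (?, ?, ?)",
    [(e, k, f"{k}-of-{e}") for e in range(1, ENTITY_COUNT + 1) for k in FIELDS],
)
meta_rows = conn.execute("SELECT COUNT(*) FROM entity_meta").fetchone()[0]

def full_entity_query(fields):
    """Build the SELECT that reassembles one entity: one join per field.

    Field names are fixed constants here, so inlining them is safe.
    """
    cols = ", ".join(f"m{i}.meta_value AS {name}" for i, name in enumerate(fields))
    joins = " ".join(
        f"LEFT JOIN entity_meta m{i} "
        f"ON m{i}.entity_id = e.id AND m{i}.meta_key = '{name}'"
        for i, name in enumerate(fields)
    )
    return f"SELECT e.id, {cols} FROM entity e {joins} WHERE e.id = ?"

sql = full_entity_query(FIELDS)
row = conn.execute(sql, (1,)).fetchone()
print("meta rows:", meta_rows)
print("joins for one entity:", sql.count("JOIN"))
```

Every extra field adds both a row per entity and a join per read, which is why sorting or filtering on several fields at once degrades so quickly.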

What is easy to build is not always the best to build. We knew this model had to be ditched. We replaced this data model with a leaner flat model. It was indexed properly for fast performance. Transitioning to the new model completely would take too much time, so we created a bridge between the two during migration. What took 40 hours before took less than 5 minutes now, a speedup of over 480x.
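As a rough illustration of what moving from meta rows to a flat, indexed table can look like, here is a minimal sketch. The column names (city, price, expires_at), the index, and the pivot query are our own assumptions, and the pivot is only one possible way to bridge the two models, not a description of the bridge actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Old model: one row per field.
    CREATE TABLE item_meta (item_id INTEGER, meta_key TEXT, meta_value TEXT);
    -- New model: one row per item, typed columns (hypothetical names).
    CREATE TABLE item_flat (
        item_id    INTEGER PRIMARY KEY,
        city       TEXT,
        price      REAL,
        expires_at TEXT
    );
    -- Index chosen to match the expected lookup: by city, ordered by expiry.
    CREATE INDEX idx_item_flat_city_expires ON item_flat (city, expires_at);
""")

conn.executemany("INSERT INTO item_meta VALUES (?, ?, ?)", [
    (1, "city", "New York"), (1, "price", "19.99"), (1, "expires_at", "2030-01-01"),
    (2, "city", "Los Angeles"), (2, "price", "5.00"), (2, "expires_at", "2030-02-01"),
])

# Pivot: collapse each item's key/value rows into a single flat row.
conn.execute("""
    INSERT INTO item_flat (item_id, city, price, expires_at)
    SELECT item_id,
           MAX(CASE WHEN meta_key = 'city' THEN meta_value END),
           MAX(CASE WHEN meta_key = 'price' THEN CAST(meta_value AS REAL) END),
           MAX(CASE WHEN meta_key = 'expires_at' THEN meta_value END)
    FROM item_meta
    GROUP BY item_id
""")

# Reading a complete item is now a single indexed lookup with no joins.
rows = conn.execute(
    "SELECT item_id, city, price FROM item_flat WHERE city = ? ORDER BY expires_at",
    ("New York",),
).fetchall()
print(rows)
```

The trade-off is the one described above: the flat table gives up the freedom to add fields at will in exchange for reads and sorts that the database can serve from an index.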

Overhead

They were using default framework commands. Frameworks often provide helper functions to do common things like insert and update objects. The issue is that these functions are rarely scalable. These functions often call other functions and perform extra queries to check for certain things like data validity. This creates massive overhead. They work for very small use cases with a couple hundred items but are disastrous on larger systems. They had almost 200k items and would need to scale to millions.

People use framework commands when they do not understand the underlying data model. Inserting objects in multiple linked tables the way the framework expects it is tricky and dangerous. Doing it wrong can easily destroy a site, so people just avoid it. They’ve only used high-level abstractions instead of raw code. The trade-off is severe performance issues.

We were able to monitor what their framework was doing when it inserted objects and replace the framework commands with our own leaner commands that bypassed the unnecessary checks. An insertion with the framework commands was taking 8 seconds. With our leaner commands, we reduced that to less than a hundredth of a second, a speedup of over 800x on insertions.
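The difference between a helper-style insert and a lean one can be sketched as follows. This is not the framework's real code or our production code: the two lookup queries are stand-ins for whatever checks a helper performs, and the table and function names are hypothetical.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT)")
    return conn

def insert_with_checks(conn, items):
    """Mimic a framework helper: extra lookups and a commit for every row."""
    statements = 0
    for item_id, title in items:
        # Stand-in validity checks (assumed, not taken from any real framework).
        conn.execute("SELECT 1 FROM item WHERE id = ?", (item_id,)).fetchone()
        conn.execute("SELECT COUNT(*) FROM item WHERE title = ?", (title,)).fetchone()
        conn.execute("INSERT INTO item (id, title) VALUES (?, ?)", (item_id, title))
        conn.commit()
        statements += 3
    return statements

def insert_lean(conn, items):
    """One prepared statement, one transaction, no per-row lookups."""
    with conn:  # commits once at the end, rolls back on error
        conn.executemany("INSERT INTO item (id, title) VALUES (?, ?)", items)
    return len(items)

items = [(i, f"item {i}") for i in range(1, 1001)]
slow_db, fast_db = make_db(), make_db()
slow_statements = insert_with_checks(slow_db, items)
fast_statements = insert_lean(fast_db, items)
print(slow_statements, "statements vs", fast_statements)
```

Both paths end with the same rows in the table. The lean path is only safe when the data has already been validated upstream, which is why it requires understanding the underlying data model first.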

Geolocation

The geolocation issue stemmed from a third-party plugin they were using but did not know how to set up. They did not understand how to get the plugin to work on the client’s servers. In their internal testing at their company offices, they were using pre-built pseudo-servers like XAMPP that already had things set up for you. So on a real server, they had no idea how to do it and could not instruct a server team on what to do.

We do not use third-party code unless we have a good reason to. Third-party code carries liabilities that you will have to deal with at some point, whether you realize it or not. This plugin was sending requests to external servers for geo calculations when we could emulate the data locally and get a major performance boost. So we ditched the plugin and wrote our own geolocation module.
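To show what doing geo calculations locally can mean, here is a minimal sketch of distance filtering with no external requests. It uses the standard haversine formula as a stand-in; the text does not say which method our module used, and the city table, function names, and radius are assumptions for illustration.

```python
import math

# Hypothetical local lookup table: city -> (latitude, longitude) in degrees.
CITIES = {
    "New York": (40.7128, -74.0060),
    "Los Angeles": (34.0522, -118.2437),
    "San Francisco": (37.7749, -122.4194),
}

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def cities_within(lat, lon, radius_km):
    """Names of known cities within radius_km of the user, sorted by name."""
    return sorted(
        name
        for name, (clat, clon) in CITIES.items()
        if haversine_km(lat, lon, clat, clon) <= radius_km
    )

# A user located in San Jose, California.
nearby = cities_within(37.3382, -121.8863, radius_km=100)
print(nearby)
```

With a filter like this, a user in California is shown only items tagged for nearby cities, and no request leaves the server.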

The crucial step they were missing was setting it up on a live server. They brought their entire company of almost 40 people to the project, but no one could figure it out. The answer lay in the fact that they didn’t understand how to work with a user’s real IP behind a proxy. This took us a day to figure out. Once again, months of hassle solved in a day.
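For readers unfamiliar with the proxy problem: on a live server the application often sees the proxy's address rather than the visitor's, so geolocation places every user at the data center. The sketch below shows one common way to recover the real address. It assumes the proxy sets the conventional X-Forwarded-For header and that our proxies live in 10.0.0.0/8; both are assumptions for illustration, not details from the project.

```python
import ipaddress

# Hypothetical: networks where our own proxies and load balancers live.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]

def is_trusted(ip):
    """True if ip belongs to one of our proxies. Raises ValueError if malformed."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in TRUSTED_PROXIES)

def client_ip(remote_addr, x_forwarded_for=None):
    """Best guess at the visitor's address.

    remote_addr is the peer that connected to us. The header is honored only
    when that peer is one of our proxies, and it is read right to left so
    that entries a client may have forged on the left are ignored.
    """
    if not x_forwarded_for or not is_trusted(remote_addr):
        return remote_addr
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    for hop in reversed(hops):
        try:
            if not is_trusted(hop):
                return hop
        except ValueError:
            break  # malformed entry: stop trusting the header
    return remote_addr

print(client_ip("10.0.0.5", "203.0.113.7, 10.0.0.9"))
```

Local test stacks usually have no proxy in front of them, so the connecting address already is the visitor's address and this step never comes up until the code reaches a real server.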

Crisis Averted

With major technical features quickly working, their client completely changed their tone. They were so happy to see progress and a real future for their company. With things working, there was less pressure to recoup related losses from WebDev. A major lawsuit no longer seemed inevitable, saving WebDev from potential corporate liquidation. Peace, it seemed, was finally possible.

The Results

1 Week: Time To Resolve 4 Months of Technical Issues

480x: Processing Speedup

800x: Insertion Speedup

1: Major Lawsuits Avoided

The Lesson

Use A Technical Advisor As An Insurance Policy

Having a highly technical advisor works as double insurance: insurance against the bad and insurance for the good. They don’t need to be engaged in a major way (e.g., full-time hours). They can just be there on the sidelines to give guidance and resolve issues much faster.

Against the Bad

WebDev genuinely thought they were doing the best anyone could, and that any more performance would take months of research and development or was simply impossible. They did not know what they did not know. On bigger projects, it is very easy to run into technical issues you are not used to. Instead of wasting precious time trying to figure it out, or paying someone else to figure it out on your dime, you should have a technical advisor who specializes in high-end projects advise you.

They will save you much more time and energy than you think. If WebDev had partnered with us before this project, we would have saved them months of agony and accelerated their delivery beyond imagination. We fixed months of issues in just days. They were heading toward a major lawsuit that could have ended their company, but all that was avoided when we stepped in. A technical advisor can literally save your company.

For the Good

Having a technical advisor is not just for avoiding bad things, but also for actively moving your company forward. If WebDev had partnered with us before, they would have impressed an important client. Their client was a multimillion-dollar startup with powerful backers. Doing well for them could have opened the door to a whole new world of possible clients and projects. It could have completely changed their company. A technical advisor grows your company faster by getting rid of the stumbling blocks.

Go With Consulting Firms Not Development Agencies

For high-end work, do not go to development agencies. Just because a company seems popular, comes highly recommended or has done work for big brands does not mean they can handle high-end work. Agencies with big brand clients look impressive, but the projects they do are almost always basic projects like blogs or landing pages. Big brands use consulting firms for more involved, high-end work.

Even the largest agencies need help. Agencies can’t specialize in high-end work because it is too demanding and requires too many high-level staff. That is why it is critically important to screen an agency before you hire it. A technical advisor can do that for you.

Development agencies have a different business model than consulting firms. Agencies work by doing simple projects at high volume. Consulting firms specialize in only high-end projects. Who would you trust? Someone who does high-end work daily or someone who rarely needs to? The answer is simple.