Foundations of System Integration

As an architect, the scariest problem I run into is discovering that two critical systems were built in a way that makes integration close to impossible. The root of these problems is choices that weren’t made as choices but as assumptions. Choices made actively usually aren’t too bad: the developers made some attempt to weigh the strengths and weaknesses of their solution and made the best choice they could. With serious thought beforehand, even a suboptimal choice is unlikely to be disastrous. It’s the choices made without thinking, the unspoken, unexamined assumptions about what MUST be true, that become the deepest problems.

Let’s go through common assumptions that I see made in general, and when integrating with AEM in particular.

Assumption #1: Systems Use the Same Language
This one comes up when two teams, sometimes at different companies, are given the tasks to “build an API” and “consume an API”, respectively. The requirements describe what the API can do but not HOW. The team creating the API ends up using a language-specific tool, such as JMS, to build the initial version. Unfortunately, the consuming team doesn’t use Java; they use PHP! At this point the project will almost certainly encounter serious overruns, as the choices that remain are all terrible:

Rewrite the API using a common interchange format such as XML or JSON. This is the least bad option.

Use a PHP-to-Java bridge or special library, resulting in odd, hard-to-maintain code and a very specialized setup.

Retrain the PHP team to use Java, and possibly require a completely new approach to the system they were responsible for.

All of these options vastly increase the time and energy needed for this part of a project. If this cross-system API is a critical dependency, the whole project may encounter delays or even failure. Sometimes launch dates simply CANNOT be moved, such as when the launch is tied to a major sporting event.

Solution: Never use a language- or framework-dependent messaging system or integration point without making absolutely sure that all client systems are fully compatible with it. Beta, untested, or badly documented software and frameworks don’t count. No, it doesn’t matter how cool they are.
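As a minimal sketch of what a language-neutral integration point could look like, the snippet below encodes a message as JSON with an explicit version field. The message shape and the names `encode_account_message`, `decode_account_message`, `schema_version`, `account_id`, and `balance_cents` are hypothetical and chosen only for illustration; the point is that any client with a JSON parser, whether written in PHP, Java, or something else, could read the same payload.

```python
import json

SCHEMA_VERSION = 1  # hypothetical version marker agreed on by both teams


def encode_account_message(account_id: str, balance_cents: int) -> str:
    """Serialize a message using only JSON-native types (strings, integers)."""
    message = {
        "schema_version": SCHEMA_VERSION,
        "account_id": account_id,
        "balance_cents": balance_cents,
    }
    # sort_keys keeps the output stable, which makes payloads easier to diff.
    return json.dumps(message, sort_keys=True)


def decode_account_message(payload: str) -> dict:
    """Parse a message and reject versions this consumer does not understand."""
    message = json.loads(payload)
    if message.get("schema_version") != SCHEMA_VERSION:
        raise ValueError("unsupported schema_version")
    return message


if __name__ == "__main__":
    wire = encode_account_message("acct-42", 1250)
    print(wire)
    print(decode_account_message(wire)["balance_cents"])
```

Keeping the payload to plain strings and integers avoids leaking one platform’s native object format into the contract between the two teams.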

Assumption #2: Shared Location/Access
This assumption tends to really be a cluster of related assumptions revolving around access, latency, time zone, and even support personnel. It tends to get made accidentally in the early testing phase: a team builds out their API and testing server at the same location, writes their tests and plans, and begins work. That a system might be on a different network, might require special authentication, or might have high latency (or go down) simply doesn’t come up. Depending on the nature of the business problem being solved, discovering that you cannot depend on these things can range from an easily recoverable annoyance to catastrophic failure.

Solution: Always assume that integrated systems will require authentication, will be on a different network, and will sometimes fail completely and without warning. If you get assurances to the contrary, design with these possibilities in mind anyway, though perhaps without throwing a ton of resources at them. Assurances you get at the beginning of a project don’t always hold until the end of it, or after. Put yourself in a position to be flexible.
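One way to design for a remote system that may be slow or down is to wrap every call in a bounded retry with backoff, and to fail loudly once the budget is spent. The sketch below assumes the remote call surfaces network trouble as `ConnectionError` or `TimeoutError`; the helper name `call_with_retries` and its parameters are hypothetical, not part of any particular library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying network-style failures with exponential backoff."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError) as error:
            last_error = error
            # Wait longer after each failure, but do not sleep after the last one.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    # The caller decides what "remote system is down" means for the business.
    raise RuntimeError(f"remote system unavailable after {attempts} attempts") from last_error
```

A caller would pass in whatever function actually talks to the other system, for example `call_with_retries(lambda: fetch_profile(user_id))`, where `fetch_profile` is a placeholder for your own client code. Injecting `sleep` keeps the helper easy to test without real delays.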

Assumption #3: Linear Performance Scaling
This comes up with massive systems where performance probably does scale linearly… for a while. For example, a web API is built to do something simple, such as retrieving and saving account data. In performance testing, the system is found to scale linearly. Then, post-launch, the problems begin:

The system becomes popular and overwhelms the original server. In order to keep the API going, you now must shard data, do load balancing and data migration, deal with data discrepancies across servers, etc.

API changes continue to add to the number of commands available to clients. While some of these commands scale linearly, others are heavy, leading to inconsistent performance. Clients that depend on stable performance (another assumption!) are impacted.
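One way to notice the second problem early is to measure latency per command rather than in aggregate, so that a heavy command cannot hide behind the average. The following is only a sketch under that assumption; `CommandTimer` and the command names in the usage example are hypothetical.

```python
import time
from collections import defaultdict
from statistics import mean


class CommandTimer:
    """Collect wall-clock durations separately for each API command."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._samples = defaultdict(list)

    def record(self, command, func, *args, **kwargs):
        """Run func and store how long it took under the given command name."""
        start = self._clock()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the duration even when the command raises.
            self._samples[command].append(self._clock() - start)

    def report(self):
        """Return call count, mean, and worst-case duration per command."""
        return {
            command: {"calls": len(s), "mean_s": mean(s), "max_s": max(s)}
            for command, s in self._samples.items()
        }


if __name__ == "__main__":
    timer = CommandTimer()
    timer.record("get_account", lambda: sum(range(1_000)))
    timer.record("export_history", lambda: sum(range(1_000_000)))
    for name, stats in timer.report().items():
        print(name, stats)
```

Comparing the per-command numbers across releases shows which new commands break the linear-scaling picture from the original performance tests.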

Assumption #4: Clients are “Nice”
The tales of large companies getting hacked and the occasional news stories about large-scale DDoS attacks should serve as a warning: not all users of your API are benign.

Most good teams building a public-facing API are aware of this and design around it. The dangerous assumption happens when building a private API.

Private APIs usually have screened clients, possibly only being available over VPN and even restricting access to specific IPs. It may be that there is no public web access to the system at all. Is it safe to assume that API clients can be trusted?

No. Never trust clients any farther than you absolutely have to. In many cases, the systems calling a private API are doing it in response to a more public API of their own. Here are a few ways a private API can be broken:

The client is public-facing and is hit with a DDoS attack. Each request causes the client to call this private API, allowing the DDoS to spill over and affect the internal systems.

A sysadmin makes a very human mistake and ends up revealing the API to the public web. One of the many ongoing port scans done by malicious parties picks it up and they begin an attack.

Data coming into this system is incorrectly filtered by the client; for example, the client may have different data-escaping patterns and requirements than your system. What counts as “safe” for the client system and for your internal system may not match. As a result, data is corrupted.

Solution: Never trust client machines. When you must trust — verify, filter, and log.
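A minimal sketch of “verify, filter, and log” at the edge of a private API might look like the following. The field names, the allowed pattern for `account_id`, the length limit, and the function `accept_request` are all assumptions made for illustration; real rules depend on what your own system considers safe.

```python
import logging
import re

logger = logging.getLogger("private_api")

# Hypothetical rules for this example only.
ALLOWED_FIELDS = {"account_id", "display_name"}
ACCOUNT_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]{1,32}")
MAX_NAME_LENGTH = 100


def accept_request(client_id: str, payload: dict) -> dict:
    """Validate and clean a client payload using this system's own rules."""
    # Verify: reject anything outside the expected shape.
    unexpected = set(payload) - ALLOWED_FIELDS
    if unexpected:
        logger.warning("client %s sent unexpected fields: %s", client_id, sorted(unexpected))
        raise ValueError("unexpected fields")

    account_id = payload.get("account_id")
    if not isinstance(account_id, str) or not ACCOUNT_ID_PATTERN.fullmatch(account_id):
        logger.warning("client %s sent an invalid account_id", client_id)
        raise ValueError("invalid account_id")

    display_name = payload.get("display_name", "")
    if not isinstance(display_name, str) or len(display_name) > MAX_NAME_LENGTH:
        logger.warning("client %s sent an invalid display_name", client_id)
        raise ValueError("invalid display_name")

    # Filter: drop non-printable characters regardless of what the client did.
    cleaned_name = "".join(ch for ch in display_name if ch.isprintable())

    # Log: keep a trail of who sent what, even for accepted requests.
    logger.info("client %s accepted for account %s", client_id, account_id)
    return {"account_id": account_id, "display_name": cleaned_name}
```

The checks run on the receiving side on purpose: whatever escaping or filtering the client applied is treated as unknown.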

AEM Tutorials for Beginners

May 3, 2020