A Tribute to Our DevOps


Why developing a (web) application is only the beginning

Web applications are mostly developed and tested in a Dev environment (development environment), but the real endurance test takes place on the Prod systems (production environment). This is where DevOps (short for Development and Operations, representing the close integration between development and IT operations) plays a key role. Especially with web applications on Linux servers with complex network infrastructures, it often only becomes clear on the live system how stably and efficiently an application runs. In this article, we take a look at some popular theses surrounding DevOps in the web environment and examine their validity. In particular, we focus on aspects that are relevant to mid-market customers and partners of a software agency.

We will shed light on why development environments reach their limits, why genuine load tests and edge cases are difficult to simulate, and why the real work only begins after the launch, namely monitoring and continuous improvement. We will also explore the tools that are indispensable in our daily work, from Sentry and Grafana to Zabbix, and look at how CI/CD pipelines help even less experienced developers deploy safely.

Reading duration: approx. 20 minutes

Development environment vs. reality: Limited performance and unknown edge cases

Thesis: Software projects and web applications are built in development environments that are limited in performance and test data.

During the development phase, a web application often runs on a laptop, a desktop PC, or in an isolated test environment of the developers. This dev environment is typically less powerful than the later production servers and contains only a fraction of the real data. At first this is normal - nobody has a complete copy of the production database on their laptop, and fast local iteration is a goal in itself. However, this discrepancy means that some problems never become visible in the dev environment at all. With performance issues, there is also a tendency to assume that the production system, with its superior hardware, will cope. A staging environment can try to mimic production, but it has its limits: production environments have nuances that are difficult to simulate in staging, such as real user behavior, large amounts of data, or complex system interactions (browserstack.com). In other words: everything runs "smoothly" in the test environment, but reality introduces completely different factors.

Thesis: Developers and clients often lack a complete overview of realistic use cases, edge cases, and peak loads.

Development teams and even clients know the major use cases of their software, but real users push applications to their limits. Suddenly features are used in combinations nobody had thought of, or unexpected data is entered. Such edge cases (special or boundary cases) often remain undetected in the specification, and errors nobody foresaw only occur in actual operation. One reason is that some bugs only appear under specific conditions that pre-production tests do not anticipate. For example, a user profile with an emoji in the name might trigger an error somewhere in the process - something that never appeared in the test dataset. Or a client uses the web app in an older browser and faces display issues. Cases like these often only show up when the application is used "in the wild".
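
One pragmatic countermeasure is to codify every such surprise as a test the moment it is discovered. A minimal Python sketch, where create_user_profile is a hypothetical application function invented purely for illustration:

```python
# Sketch of codifying edge cases as tests; create_user_profile is a
# hypothetical application function, invented here for illustration.
import pytest

def create_user_profile(name: str) -> dict:
    if not name.strip():
        raise ValueError("name must not be empty")
    return {"name": name, "slug": name.strip().lower().replace(" ", "-")}

@pytest.mark.parametrize("name", [
    "Ada Lovelace",   # the happy path
    "Zoë 🚀",         # emoji and non-ASCII characters
    " \t ",           # whitespace only
    "a" * 10_000,     # absurdly long input
])
def test_profile_creation_survives_edge_cases(name):
    try:
        profile = create_user_profile(name)
    except ValueError:
        return  # rejecting invalid input cleanly is acceptable behavior
    assert profile["slug"]  # accepted input must yield a usable slug
```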

In addition, clients may know their business processes well, but peak loads caused by marketing campaigns or completely atypical usage times (e.g., at 3 a.m. on a weekend) are easily underestimated. The critical view here: modern approaches aim to close this gap by advocating that development and production environments be as similar as possible (dev/prod parity) in order to minimize later surprises. Containerization (e.g., with Docker) makes it possible to create a local environment that, at least in terms of the software's dependencies and requirements, comes very close to production. Anticipating all real-world conditions, however, remains unrealistic.
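
As an illustration of dev/prod parity, a minimal docker-compose sketch - the service names and image versions are placeholders; the point is pinning the same versions that run in production:

```yaml
# Hypothetical docker-compose.yml sketch; service names and image versions
# are placeholders. The point: pin the same versions that production runs.
services:
  app:
    build: .
    environment:
      APP_ENV: development
    depends_on: [db, cache]
  db:
    image: mariadb:10.11   # same major version as the production database
  cache:
    image: redis:7
```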

Thesis: Behavior under prolonged load, as well as the impact of external crawlers, penetration tests, or spam bots, is difficult to simulate.

Performance tests are good practice, but constant 24/7 load over weeks cannot be fully replicated in the lab. Most companies run stress tests before a launch - yet these often last only a few hours or days. How the application behaves over months (memory leaks? does the database grow unexpectedly? do logs fill up the disk?) remains open. Malicious traffic is another factor: external crawlers (e.g., from Google, Bing, or others) may hammer pages en masse, and spam bots cause atypical entries and requests. A constant barrage from a penetration-testing tool, let alone a realistic attack simulation (DDoS), is only feasible to a limited extent without endangering the real systems or rendering them unusable for daily operations.
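
At least sustained synthetic load can be approximated before launch. A minimal sketch with Locust, a Python load-testing tool; the host and paths are placeholders:

```python
# A minimal Locust load-test sketch; the host and paths are placeholders.
# Run with: locust -f loadtest.py --host https://staging.example.com
from locust import HttpUser, task, between

class VisitorUser(HttpUser):
    # Simulated users wait 1-5 seconds between actions.
    wait_time = between(1, 5)

    @task(3)  # weighted: browsing is three times as common as searching
    def view_start_page(self):
        self.client.get("/")

    @task
    def search(self):
        self.client.get("/search", params={"q": "test"})
```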

The developer and DevOps community is consistent on this: it is incredibly difficult to truly simulate production load. Even with test environments that resemble the prod environment, unforeseen effects can occur. An interesting approach is therefore almost counterintuitive: sometimes purposefully undersized test environments are used to uncover weaknesses. In one reported case, a small test database led to a growing log filling up the disk faster - a problem that would have been noticed only much later in a larger environment.

The prevailing opinion in the tech community, however, is that nothing measures up to real production tests. Big players like Netflix even promote chaos engineering, where disruptions are deliberately injected into the running production system to test its robustness. One expert sums it up like this: "To test effectively, the system must be running in production. Because only in production can one work with factors such as state data, real inputs, and the behavior of external systems" (techtarget.com). This means we will only see some errors once we go live. Then it is important to be able to react quickly - and this is exactly where DevOps comes in.

After the launch is before the launch: Analysis and optimization during operation

Thesis: Further analysis and optimization after the launch is essential.

In the past, a software project was considered finished after the go-live - today we know that continuous improvement is a critical part of successful software. It is precisely after the launch that the phase begins in which real usage data is evaluated, bottlenecks are identified, and optimizations are made. As one expert article emphasizes: "Even with rigorous pre-launch testing, actual users can uncover performance problems in practice that were not obvious during development. Post-launch monitoring helps identify these problems before they affect a large number of users." (topdevelopers.co). In other words: a launch without subsequent monitoring is like a maiden voyage with no one at the helm. In practice, small and medium-sized companies in particular often underestimate this effort.

Critical Perspective: Some people assume good software runs "out of the box" - but in our experience, this is extremely rarely the case. It takes time and iterative improvements to stabilize and speed up a (web) application. Studies show that continuous maintenance can significantly increase user satisfaction and retention. This includes regular bug fixes, performance tuning (e.g., adjusting caching strategies, optimizing database indexes), and security patches. Security vulnerabilities that only emerge gradually must be closed immediately to prevent damage.

Another aspect is the feedback loop: real user feedback tells you which features are well received and where usability problems exist. Ideally, this feedback flows directly into the development team's backlog. A culture of continuous deployment ensures that improvements reach customers promptly instead of being postponed until the next major "release". Modern DevOps teams treat their software as a living product: "Software does not end with the launch - it should be seen as a living product. Post-launch support enables continuous improvement based on user feedback and performance data."

Current practice in successful companies clearly shows: after the launch is before the launch. Stagnation is dangerous: anyone who does not invest in monitoring, troubleshooting, and optimization after the go-live risks outages, dissatisfied users, and outdated software. For medium-sized companies, this specifically means allocating sufficient resources for the operational phase, or having a competent partner who takes over monitoring and maintenance.

Monitoring and Logging: Observation is essential

Thesis: Monitoring tools like Sentry or Grafana are essential for logging and error analysis.

To quickly identify problems in operation, monitoring and logging tools are absolutely crucial. Two prominent examples are Sentry and Grafana (often in combination with time-series databases like Prometheus or log databases like Elasticsearch/Loki).

  • Sentry is a specialized tool for error tracking. It captures errors and exceptions in the application and collects them centrally. Why is this important? In production, a developer cannot simply attach a debugger to the code. Sentry closes this gap: it delivers detailed error reports (with stack trace, user information, context variables, etc.) as soon as an exception occurs in the code. In the dev community, Sentry is now considered an industry standard for crash reporting (medium.com). Even less experienced developers find errors faster with Sentry because the tool eliminates much of the manual detective work. Without a tool like Sentry, many errors in a complex web application would go unnoticed until users complain. With Sentry, the team often finds out immediately when an error happens - and can respond proactively before all users are affected. (A minimal setup sketch follows this list.)
  • Grafana, on the other hand, addresses performance monitoring and the visualization of system metrics. Grafana itself is a dashboard tool that can integrate various data sources - from server CPU load to database performance to application-specific KPIs. Combined with, for example, Prometheus (for metric collection) or Loki (for log collection), it becomes a powerful monitoring cockpit. You want to see at a glance whether all systems are green, where potential bottlenecks are, and whether unusual spikes occur. Concretely, Grafana & Co. help to recognize trends (e.g., steadily increasing memory load), track anomalies (e.g., a sudden traffic increase at midnight), and quickly identify the cause in the event of an error. (An instrumentation sketch follows at the end of this section.)
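
To make the Sentry side concrete, here is a minimal setup sketch with the official Python SDK; the DSN, release name, and the handle_signup function are placeholders:

```python
# Minimal Sentry setup sketch with the official Python SDK (pip install sentry-sdk).
# The DSN and release name are placeholders from the Sentry project settings.
import sentry_sdk

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",
    environment="production",
    release="myapp@1.2.3",       # lets Sentry tie errors to a specific deploy
    traces_sample_rate=0.1,      # sample 10% of transactions for performance data
)

def handle_signup(payload: dict) -> str:
    # Hypothetical handler; raises KeyError on unexpected input.
    return payload["name"].upper()

try:
    handle_signup({})            # simulated edge case: missing "name"
except Exception:
    # Explicit capture; unhandled exceptions in instrumented
    # frameworks are reported automatically as well.
    sentry_sdk.capture_exception()
```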

Critical Perspective: Does every small web application need such an array of tools? Some smaller companies initially try to get by without dedicated monitoring, relying on simple logs or manual checking. However, our experience shows: as soon as the first major problem arises, it becomes clear that monitoring is not a "nice-to-have" but real added value. Without these tools, you can easily be left in the dark for a long time, sifting through log files. However, it is crucial to properly channel and interpret the flood of data. Setting up monitoring correctly takes time initially (and requires some expertise), but it pays off many times over at the first incident. Well-configured monitoring also avoids alert fatigue from too many false alarms. Here, quality beats quantity: better a few meaningful metrics and alerts than many noisy ones. We strongly advise all our customers: the investment in monitoring and logging tools is essential in order to remain capable of action in the event of an error!
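
For the Grafana/Prometheus side mentioned above, instrumentation can start equally small. A sketch with the prometheus_client Python package, where the metric names and the simulated handler are illustrative:

```python
# Minimal Prometheus instrumentation sketch (pip install prometheus-client).
# Metric names and the simulated handler are illustrative only.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests", ["endpoint"])
LATENCY = Histogram("app_request_seconds", "Request latency in seconds", ["endpoint"])

def handle_request(endpoint: str) -> None:
    REQUESTS.labels(endpoint=endpoint).inc()
    with LATENCY.labels(endpoint=endpoint).time():  # records duration into the histogram
        time.sleep(random.uniform(0.01, 0.2))       # stand-in for real work

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus can now scrape http://localhost:8000/metrics
    while True:
        handle_request("/checkout")
```

A Grafana dashboard then simply queries these metrics from Prometheus and plots them next to the system-level data.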

Automatic Alerts: Zabbix and Co. as the Guardians of the Systems

Thesis: Monitoring tools like Zabbix are necessary for alerts during critical system conditions.

Beyond merely observing metrics, we of course want to be alerted automatically when something goes awry. This is where system-monitoring tools like Zabbix, Nagios, Icinga, etc. come into play. Let's stick with Zabbix as an example: Zabbix is an open-source monitoring system that offers predefined triggers and notifications. You can set thresholds - e.g., "CPU load > 90% over 5 minutes" or "less than 10% free disk space" - and as soon as these are reached, Zabbix sends an alarm (via email, SMS, Slack, etc.).
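
For illustration, the two example thresholds above could look like this as trigger expressions, assuming the expression syntax of Zabbix 6+ and an example host named web01 (the comment lines are annotations only; Zabbix expects just the expression itself):

```
# CPU utilization above 90% on average over five minutes:
avg(/web01/system.cpu.util,5m)>90

# Less than 10% free space on the root filesystem:
last(/web01/vfs.fs.size[/,pfree])<10
```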

Why do we need this - didn't we just praise Grafana & Co.? The difference: Grafana is great for visualization and analysis, but active alerting is often handled by a dedicated tool like Zabbix (or Grafana is combined with an alert manager). Zabbix and similar tools are essentially the night watchmen who tirelessly check for defined conditions.

The importance of such alerts can hardly be overstated. A fitting quote from a Linux Journal article: "Alerts and triggers are the heartbeat of monitoring. Zabbix lets you define specific conditions, upon the occurrence of which notifications are sent over various channels, so that you are immediately informed about critical events that could impact system performance" (linuxjournal.com). Without an alerting system, a problem can go unnoticed for hours - in the worst case, you first hear about it from an irritated customer on the phone. With properly configured alerts, the team knows immediately, for example, if the web server has failed or response times are critically high.

Practical View: For medium-sized companies that may not have a 24/7 operations team, good alerting is all the more important. It allows small teams to work efficiently because they can rely on the warning messages instead of constantly checking everything manually. But here too, a poorly configured system that constantly cries "wolf" (keyword: false positives) will quickly be ignored. The trick is to define meaningful thresholds and send context-rich alerts (e.g., directly indicating which component is affected, attaching logs, etc.).

Zabbix has proven itself in many of our projects and is often referred to internally as an indispensable tool.

CI/CD Pipelines: Standardized Deployments - Also for Beginners

Thesis: CI/CD Pipelines enable standardized, safe deployments even for less experienced developers.

The terms CI/CD stand for Continuous Integration and Continuous Delivery/Deployment. A CI/CD pipeline is an automated process chain that builds, tests, and eventually deploys code from commit to rollout. Why is this so important - and how does it help less experienced developers?

In traditional development workflows, deployment was often manual work carried out by experienced admins or DevOps engineers, because many things could go wrong (missing dependencies, incorrect configs, avoiding downtime, etc.). With a well-configured CI/CD pipeline, however, deployment becomes a standardized, repeatable operation - ideally at the push of a button. Even developers who have never manually set up a Linux server can take their code live, because the pipeline performs the necessary steps for them.

Safety and quality are not neglected - on the contrary. Particularly less experienced developers benefit from the fact that the pipeline carries out automated tests and code checks before the deployment. This way, errors are intercepted before they're released to the user base. Additionally, the pipeline ensures that deployments always happen in the same way - there are no deviations that occur due to human forgetfulness (e.g., "Oops, loaded the staging config on Prod" - such mistakes are eliminated). For SMEs, this means: faster updates with a simultaneously lower error rate.
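
To make this concrete, here is a deliberately minimal sketch of such a pipeline as a GitLab CI configuration - the image, scripts, and branch name are placeholders, not our actual setup:

```yaml
# A deliberately minimal .gitlab-ci.yml sketch; image, scripts, and the
# deploy command are placeholders, not a real project's pipeline.
stages:
  - test
  - deploy

run-tests:
  stage: test
  image: python:3.12
  script:
    - pip install -r requirements.txt
    - pytest

deploy-production:
  stage: deploy
  environment: production
  rules:
    - if: '$CI_COMMIT_BRANCH == "main"'
      when: manual        # deploy only on an explicit button press
  script:
    - ./scripts/deploy.sh production   # placeholder for the actual rollout
```

The `when: manual` rule is the "push of a button": tests always run, but the production rollout waits for an explicit human decision.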

Of course, setting up a CI/CD pipeline initially requires know-how and effort. This is where a DevOps specialist often comes into play to set up such a pipeline (for example, with Jenkins, GitLab CI, GitHub Actions, or Bitbucket Pipelines). Critics argue that in a perfect DevOps team, this distinction between developers and DevOps would be unnecessary, because everyone would share responsibility for the process. In reality, however, a robust CI/CD system is especially valuable for less experienced developers - it takes away the fear of deployment (that applies to me as a project manager, too). A junior developer can click "Deploy" with a clear conscience, knowing that automated tests have run and the rollout is controlled, so there is a baseline of assurance that the production system won't be harmed.

Importantly, CI/CD brings not only technical but also cultural change. Deployments become smaller but more frequent, which reduces both the risk and the impact of errors. Teams get used to deployments being routine rather than "major operation days". Especially in agile development, CI/CD is virtually the backbone that enables fast iterations.

In summary: CI/CD pipelines are a game-changer that enables even less experienced developers to deliver at the push of a button – reliably and repeatably.

Experience Counts: Live Data, Specific Errors, and the Role of DevOps Experts

Thesis: Certain errors and performance problems only manifest with live data and require experienced DevOps specialists.

Despite all automation and testing, experience in dealing with production systems is irreplaceable. There are error patterns that only occur with real live data and load - due to complex data constellations or simply scaling effects. A query that is lightning-fast with 100 test records can suddenly become a bottleneck with 100 million real records. Or a memory leak in a certain library only becomes apparent after weeks of continuous operation, as the process occupies more and more memory. Identifying and fixing such problems often requires an experienced eye.
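
This scaling effect is easy to reproduce in miniature. A self-contained Python sketch, using SQLite purely for illustration, times the same query over a million rows before and after an index is added:

```python
# Self-contained sketch: the same query against one million rows,
# before and after adding an index (SQLite used purely for illustration).
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    ((i % 10_000, i * 0.01) for i in range(1_000_000)),
)

def timed_query() -> float:
    start = time.perf_counter()
    conn.execute("SELECT SUM(total) FROM orders WHERE customer_id = 42").fetchone()
    return time.perf_counter() - start

print(f"without index: {timed_query():.4f}s")  # full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
print(f"with index:    {timed_query():.4f}s")  # index lookup
```

With a small test dataset, both variants feel instantaneous - which is exactly why the missing index is not noticed until production.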

A DevOps engineer with a lot of operational experience usually has a repertoire of diagnostic techniques. Experienced people know, for example, how to debug on Linux with tools like htop, iotop, or strace - tools that less seasoned developers may never have needed. Experienced DevOps engineers also know long-running load phenomena from practice (keywords: floating-point precision bugs, memory-related rounding errors, etc.). A drastic but real scenario: a memory error occurs only under genuine full load and only in combination with certain hardware constellations - here you need experts who may have seen something similar before, or who at least know where to start.

Conclusion

For (web) applications in productive use, DevOps is not a luxury but a necessity. Development environments hit their limits, real users produce surprises, and without continuous monitoring you are flying blind. Small and medium-sized companies, which may not have huge IT departments, can benefit enormously from a DevOps approach: more stable systems, faster responses to problems, and more satisfied customers. However, you must be willing to invest time and resources even after the launch to analyze data and implement improvements. Tools like Sentry, Grafana, and Zabbix form the backbone of monitoring - they deliver the necessary data and alerting mechanisms. Automation through CI/CD significantly reduces deployment risk and allows even less experienced team members to safely take changes live.

In the end, one thing stands out: people make the difference. Experienced DevOps specialists can solve tricky live problems and build the bridge between development and operations.

Note: This article was also generated with the support of AI (model: GPT-4).

Author

Dr. Ing. Jens Bornschein

Details

Released:

April 1, 2025

Categories:

What Moves Us

Tags:

DevOps