To Improve Quality, Understand Your Rework Cycle
In an ideal world, a quality control department wouldn’t exist. Everything would be done correctly on the first try and unexpected surprises would never keep us up at night. Here on planet Earth, however, we are imperfect. It’s unavoidable and perfectly natural to make mistakes, readjust, and try again. We’ve all put the scissors away in the wrong drawer or misplaced a decimal point at some time in our lives.
How errors occur and are corrected can be captured in a model called the rework cycle. This article aims to shed light on the rework cycle and help us understand the lifecycle of an error. We’ll discuss this in terms of performing a task, which can mean many different things. To help with context, a task can be a parts drawing, a piece of software code, a schedule, or even just putting your kitchen scissors away.
Right or Wrong: What Happens When We Do Tasks
In 1999, the Mars Climate Orbiter was set to enter orbit around Mars to study the red rock’s weather and climate. After a 10-month trip to Mars, its mission began and ended in a fiery blaze in the very atmosphere that it was meant to study. A software mismatch had occurred in which one section of code was using metric units and the other (erroneously) was using English units, which put the spacecraft at a much lower altitude than planned. It brushed the atmosphere and the rest is history.
This is hardly a revelation, but two things can happen when we perform a task: we can do it correctly or we can do it incorrectly.
In either case, our “to-do” list got smaller. The kicker, however, is that we don’t know that something is incorrect until we learn it’s incorrect. It’s a loose quality-equivalent of Schrödinger’s Cat. We won’t know we put the scissors away in the wrong place until the next time we need the scissors, or until we happen to wake up at night realizing the mistake. The effort required to fix an error is called rework (appropriately named because we have to redo the work). There’s also a metric we will call “Percent Done Correctly” which represents how often we have done a task correctly. Tasks done correctly will not have to be reworked, so we can consider that work completed. We can therefore rewrite our flow-chart with some name changes and a little more detail:
Our “Percent Done Correctly” will dictate how often task completion will end up in either category. We may have completed our to-do list, but we won’t know if the work was done correctly or not until it’s put to the test.
Error Discovery Hurts Our To-do List
Let’s look at a scenario in which every day we get 50 tasks to do. If we do things correctly 90% of the time, each day adds five tasks to our Undiscovered Rework bucket. If we happen to discover the errors very quickly, we can address them the next day. We don’t often discover errors so quickly, so it’s important to keep in mind that they will come back to haunt us at an unknown time, even if that time is when a satellite reaches Mars.
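The arithmetic behind the 50-task scenario can be written out as a small Python sketch. The function name `daily_split` is made up for illustration, and the sketch assumes every task is either fully correct or fully incorrect.

```python
def daily_split(tasks_per_day: int, percent_done_correctly: float) -> tuple[int, int]:
    """Split one day's completed tasks into (done correctly, undiscovered rework).

    Hypothetical helper: assumes each task is either entirely right or entirely wrong.
    """
    done_correctly = round(tasks_per_day * percent_done_correctly)
    undiscovered_rework = tasks_per_day - done_correctly
    return done_correctly, undiscovered_rework


done_correctly, undiscovered_rework = daily_split(50, 0.90)
print(f"Done correctly: {done_correctly}, Undiscovered Rework: {undiscovered_rework}")
```

Both numbers leave the to-do list on the same day, which is why the list can look finished while rework is still hiding in it.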
When we discover that errors were made, the error moves from “Undiscovered Rework” into a category we will call “Discovered Rework”. Discovered Rework holds the work we now know needs to be fixed, and it has made its way back to our to-do list.
You may have noticed the inclusion of Error Discovery Rate. This rate represents how quickly we can discover an error. Ideally, we would like to find our errors before our product reaches a customer or is otherwise in service. In a manufacturing environment, design reviews are one method of improving the rate of error discovery before errors reach more expensive downstream functions. It is much, much cheaper to fix a drawing early on than it would be to discover the error after setting up our manufacturing floor for mass production.
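To get a feel for how Percent Done Correctly and Error Discovery Rate interact, here is a minimal simulation sketch. It rests on assumptions that are not part of the model as described: a fixed daily capacity, a constant fraction of undiscovered errors found each day, and discovered rework going straight back onto the to-do list. The function name and parameters are hypothetical.

```python
def simulate_rework_cycle(
    total_tasks: float,
    capacity_per_day: float,
    percent_done_correctly: float,
    discovery_rate: float,
    max_days: int = 10_000,
) -> int:
    """Return the number of days until less than half a task remains open.

    Assumed simplification: each day we work the to-do list first, then a
    fixed fraction (discovery_rate) of Undiscovered Rework is found and
    returned to the to-do list as Discovered Rework.
    """
    to_do = float(total_tasks)
    undiscovered = 0.0
    days = 0
    while to_do + undiscovered > 0.5 and days < max_days:
        # Work as many tasks as capacity allows.
        worked = min(capacity_per_day, to_do)
        to_do -= worked
        # Tasks done incorrectly silently become Undiscovered Rework;
        # the rest are completed and need no further tracking here.
        undiscovered += worked * (1.0 - percent_done_correctly)
        # Some hidden errors come to light and rejoin the to-do list.
        discovered = undiscovered * discovery_rate
        undiscovered -= discovered
        to_do += discovered
        days += 1
    return days


fast = simulate_rework_cycle(500, 50, 0.90, discovery_rate=1.0)
slow = simulate_rework_cycle(500, 50, 0.90, discovery_rate=0.1)
print(f"Errors found right away: {fast} days; errors found slowly: {slow} days")
```

With the same Percent Done Correctly, the slower discovery rate stretches the schedule, because hidden errors keep trickling back onto the to-do list long after the original work looks finished.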
Areas of Error Discovery
Your Error Discovery Rate depends on both your quality processes and the nature of the error. Some errors may not even be discoverable until much further downstream when the product is already being employed. This leads us to the two main types of errors we can encounter: non-systemic and systemic errors.
Non-Systemic Errors
A non-systemic error is a mistake which can be independently discovered. These generally include simpler mistakes that happen on localized levels and can be caught through part or sub-assembly reviews, basic software error checking, or process reviews prior to implementation. If our task is to write a book, these are the spelling and grammatical errors.
In more technical terms, non-systemic errors are self-contained and do not occur at interfaces. Our parts, software modules, or even paragraphs in a book are meant to interact with others. An error that occurs within one of these modules may stop it cold before we even try to integrate it with another module.
Systemic Errors
Systemic errors are tougher to find, take longer to discover, and are more expensive to fix. Your modules may function just fine on their own but fail when you try to use them in context. These are the errors you need to go back a few steps to fix. They are called systemic because the source of the error doesn’t show up until we plug multiple pieces together and realize a requirement may have been lost in translation. The system may even appear to be working correctly until, at some unpredictable time, results come back that don’t quite line up with our expectations.
Errors in mechanical fit, such as the above, may be discoverable in an engineering review or through the use of 3D CAD software. Electrical, software, or process errors of the same variety, however, may require more advanced simulation or robustness testing in order to have a hope of discovering them before fully integrating your system. This includes the case where your product performs exactly as you designed it, only for you to arrive at the customer site and discover that the customer requirements were ill-defined.
Some areas of focus to increase error discovery of systemic errors include:
- Subassembly testing
- Simulations (electrical, CAD, software, process practice-runs)
- Requirements auditing
- Including the customer in the design process
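As a rough illustration of why module-level checks alone can miss a systemic error, the sketch below sets up a units mismatch in the spirit of the Mars Climate Orbiter story. Both functions are hypothetical and are not taken from any real flight software; the only hard fact used is that one pound-force second equals about 4.448 newton-seconds.

```python
LBF_S_TO_N_S = 4.4482216152605  # newton-seconds per pound-force second


def thruster_impulse_lbf_s(thrust_lbf: float, burn_seconds: float) -> float:
    """Hypothetical module A: reports impulse in pound-force seconds."""
    return thrust_lbf * burn_seconds


def velocity_change_m_s(impulse_n_s: float, mass_kg: float) -> float:
    """Hypothetical module B: expects impulse in newton-seconds."""
    return impulse_n_s / mass_kg


# Each module passes its own isolated check, so no non-systemic error shows up.
assert thruster_impulse_lbf_s(2.0, 10.0) == 20.0
assert velocity_change_m_s(100.0, 50.0) == 2.0

# The systemic error only appears at the interface between the two modules.
impulse = thruster_impulse_lbf_s(2.0, 10.0)
wrong = velocity_change_m_s(impulse, 50.0)                  # units silently mixed
right = velocity_change_m_s(impulse * LBF_S_TO_N_S, 50.0)   # converted at the interface
print(f"Without conversion: {wrong:.3f} m/s, with conversion: {right:.3f} m/s")
```

Nothing crashes in the unconverted case, which is what makes this kind of error hard to find. A test that exercises both modules together against an independently known result is one way to surface it earlier.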
A Quick Reality Check
The above is anonymized data from a real-world project from product conception through prototyping. The % Released represents the part and subassembly drawings released over time that ended up in the final design. It’s easy to see how quickly errors (above in the form of engineering change orders) can add to the expense of a project, and how errors may not be discoverable until more downstream phases. Robust processes to complete work correctly the first time and discover errors quickly can significantly reduce headaches in product roll-outs. Adding processes often hurts flexibility, so it should be noted that all decisions have their price in one way or another.
Discussion
The rework cycle is going to exist, and there is very little we can do about it. We’re humans and humans make mistakes. Servers may come across an edge-case scenario and respond incorrectly. We’ve deconstructed a simplified version of the rework cycle that leaves us with two main topics to consider: Percent Done Correctly and Error Discovery Rate.
Percent Done Correctly
How can we minimize errors we make? Think of all the things that would enable an employee to perform their work with fewer errors. Often, our solutions for this may affect how quickly we can get tasks done. However, we can strike a balance.
Above we see some dynamics that can affect how many mistakes are made. Green means it’s a good thing for our company. A “+” designates an increase. Better training increases Percent Done Correctly.
A “-” designates a decrease: work pressure decreases our ability to do things correctly. We will make mistakes if we have to rush. Reducing our work pressure is a good thing, and we can do that by adding staff or reducing stress with a positive work culture.
This graph can be expanded by identifying more things that can help or hurt our ability to do our work correctly. We might add things like “proper tools”, which can lead back to our IT strategy. The important reflections are:
- What variables can cause us to create more errors?
- What are some things that are two degrees of separation away from our Percent Done Correctly? Example: Positive Culture
- Can we go further than two degrees of separation?
- Do some things affect more than one variable?
Error Discovery Rate
How fast can we find errors? The discovery rate seeks to capture how we can discover errors more quickly. Some errors may not come to light until much further downstream. A drawing with an incorrect screw might be very difficult to identify until we are in the manufacturing phase. There are many reasons we can find a lag in our error discovery.
By identifying the inputs to how quickly we can find errors, we arrive at talking points to improve our efficiency. Perhaps we can look into a more efficient Quality Management System or audit our engineering reviews. We can reflect on:
- What variables affect our ability to discover errors?
- How can we discover errors before they reach a customer?
- If a customer encounters a bug, how can we quickly identify what happened?
- Can we set more realistic deadlines to reduce the chances of something being rushed through a testing phase?
Conclusion
Understanding in an abstract way how errors progress through our business can help us function more efficiently. Looking at what creates errors in the first place and how we go about discovering them can shed light on pain-points in our organization. The rework cycle diagram can be expanded to include how we deploy our products to our customers and how we support them. Getting clear and detailed feedback can affect how easily we fix issues or introduce new features. Describing the flows of how we deliver our products and how things can go wrong opens a dialogue about becoming a better business.