Failure is an option… as an ongoing process in design and learning

Testing software can be thought of as a highly organized experimental method. Those of us who work in software development are generally interested in maximizing the likelihood that a software product or service is fit for purpose and meets the expectations of the customer.

We use a range of testing techniques that are appropriate to the technology and business context: from risk assessment to formal reviews, static and dynamic testing, manual and automated, progression and regression, functional and non-functional, requirements-based as well as exploratory testing techniques.  Testing should provide valuable feedback about whether the design after a particular iteration meets the various stakeholders’ expectations.

But what about the psychology of managing teams when testing reveals a quantity of problems and incidents that were not anticipated? How can you cultivate an atmosphere that encourages looking for errors, mistakes and defects as a means of learning and improving the whole development process?

The problem is that people frequently do not want to confront errors, or they feel threatened by them. They may feel affronted or offended. However, this misses the fundamental reason why errors and mistakes are necessary to learn from, not only in software development but in any process of design and innovation.

I like to get inspiration from a range of sources and different modes of design. For example, Frank Gehry, one of the most famous architects of the last 50 years, says this about mistakes in the design process of large commercial buildings:

“So as you are working in real life and in real time you are constantly having small victories and small mistakes….the important thing is to keep moving ahead and learning from the mistakes and building on it, building positive momentum from that. Because it is a very complex endeavor.”

Large-scale enterprise software transformations can be comparable in scale and complexity to commercial building projects, so I think there is a parallel.

In the book Fail Fast, Fail Often, Michael Bloomberg, the billionaire who developed financial software tools including analytics and an equity trading platform, is quoted on his software development process:

“We made mistakes, of course. Most of them were omissions we didn’t think of when we initially wrote the software. We fixed them by doing them over and over, again and again. We do the same today. While our competitors are still sucking their thumbs trying to make the design perfect, we’re already on prototype version #5….It gets back to planning versus acting: We act from day one; others plan how to plan – for months.”

Bias Towards Action And Agile Iterations

Bloomberg’s quote reveals a keen insight about a bias towards action versus being paralyzed by excessive planning. The agile movement is, in part, a response to the limitations of traditional project management. According to Mike Cohn in his book Agile Estimating and Planning:

“At the start of each new iteration, an agile team incorporates all new knowledge gained in the preceding iteration and adapts accordingly.” (Cohn, 2006, 26)

The process of testing software is one of the main means by which new knowledge is acquired and unexpected behavior of the design is observed during each iteration. Finding errors is therefore a necessary way to obtain information that helps the development team adapt to this new knowledge. As Frank Gehry suggested, “the important thing is to keep moving ahead and learning from the mistakes and building on it …”

So, to lift a team out of the discomfort of perfectionism, or of feeling threatened by mistakes, errors, omissions or misjudgments in the software design and development process, a manager can do a number of things to cultivate a positive environment for creativity and productivity:

  1. Build an atmosphere of safety, where it is OK to make mistakes, ask crazy questions and find errors, and where this is recognized as a necessary part of the evolutionary nature of the development process;
  2. Facilitate discussions around shared risks which promote trusting cooperation;
  3. Tell a narrative which creates a sense of shared goals, values and purpose and in doing so builds a coherent working environment.

Sprints as a structure for prototyping to aid design and product development

“If you already know what you are doing, you won’t do it.” Frank Gehry

Part of the benefit of establishing a short timeframe for a sprint in software development is that the time constraint creates a boundary for focused effort. This creates a space for prioritizing development activity in a very focused way. It is well known that sprints also assist in the relative measurement of the team’s velocity from sprint to sprint.
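The relative measurement of velocity between sprints can be sketched as a simple calculation. This is a minimal illustration; the sprint names and story-point values below are hypothetical assumptions, not data from any real team:

```python
# Sketch: tracking relative velocity across sprints.
# Story points completed per sprint (hypothetical values).
completed_points = {
    "sprint_1": [3, 5, 2, 8],        # points for each story finished in the sprint
    "sprint_2": [5, 5, 3],
    "sprint_3": [8, 3, 5, 2, 1],
}

# Velocity = total story points completed in a sprint.
velocities = {sprint: sum(points) for sprint, points in completed_points.items()}
print(velocities)  # {'sprint_1': 18, 'sprint_2': 13, 'sprint_3': 19}

# A simple average smooths sprint-to-sprint variation and helps
# forecast how much work the team can plan into the next sprint.
average_velocity = sum(velocities.values()) / len(velocities)
print(round(average_velocity, 1))  # 16.7
```

The point is not the arithmetic but the feedback loop: each sprint yields an observation that calibrates the next plan.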

Regular Feedback and Double Loop Learning

But there are other benefits. Short-duration sprints ensure that there is regular feedback. As Jeff Sutherland, one of the original creators of Scrum, describes in his book of the same name, a more evolutionary, adaptive and self-correcting system of software development than its predecessor, waterfall, was needed. He writes:

“And so my team embarked on what we called “sprints”. We called them that because the name evoked a quality of intensity. We were going to work all out for a short period of time and then stop to see where we were.” (Sutherland, 2014)

Teams can become more efficient by examining how they can improve their performance during retrospectives. Retrospectives enable double-loop learning, or generative learning, which may involve updating mental models, modifying goals or revising decision-making in the light of experience. This is particularly relevant when there is a high level of requirements uncertainty or technological complexity.

Encourage Problem Solving Through Prototyping

Depending on the level of requirements uncertainty and innovation, short iterations or sprints can provide a disciplined structure to encourage prototyping and problem solving. The advantage of a multi-disciplinary team providing regular feedback on a rough prototype of software or product cannot be overstated. The sort of inquiry that should result includes questions such as:

1. Are we building the right product?
2. Is this what the client wants? Is it suitable?
3. Is it usable?
4. Is it functional?
5. What about its non-functional quality characteristics such as performance?

Prototyping – Build In Order To Think

But sprints, or short iterations leading to early prototyping in product development (not just in software), can also provide a framework for a problem-solving method. Prototyping is an approach of doing in order to think, rather than thinking in order to do. It can make thought processes and initial intuitions visible through the building of multiple concrete prototypes or models of the final product.

Famous Architect Frank Gehry’s Working Method

Frank Gehry is perhaps best known for his iconic architecture, including his idiosyncratic designs for the Guggenheim Museum in Bilbao, Spain and the Walt Disney Concert Hall in Los Angeles. Building an extensive number of prototypes was his working method and formed the backbone of his design process.

“Disney Hall did not appear to Gehry as a fully developed image, or a single idea. It emerged through a process of little bets through which Gehry and his team worked within constraints to frame and identify thousands of problems….Gehry and his team would, in fact create eighty-two prototypes models, working closely with the planning committee, until they arrived at the final form of the hall.” (Sims, P, 2011)

Disney Concert Hall

This philosophy reflects the Design Program at Stanford, where the maxim ‘building is thinking’ is an organizing principle of learning about the design process. Architecture, design and software development are activities that thrive on rapid prototyping to facilitate thinking, feedback and problem solving. Sprints are a way to structure time and create constraints that focus effort and provide a continuous feedback loop, creating an adaptive, self-correcting system. Design prototypes in order to think and solve problems.

References

Sims, Peter, (2011) Little Bets: How breakthrough ideas emerge from small discoveries: 79

Sutherland, Dr. Jeff, (2014) Scrum: A revolutionary approach to building teams, beating deadlines and boosting productivity: 73

Image by Carol M. Highsmith: https://commons.wikimedia.org/w/index.php?curid=4544231 and https://en.wikipedia.org/wiki/Frank_Gehry#/media/File:Disney_Concert_Hall_by_Carol_Highsmith_edit2.jpg

Dealing with uncertainty in product and software development projects

The world is full of uncertainty, and when it comes to software engineering projects or new product development, uncertainty is a familiar conundrum to be negotiated in the process of development.  In our projects, how do we best deal with uncertainty?

The degree of uncertainty varies, of course, according to the level of complexity and unknowns in a product development or project. At one end of the spectrum are serious innovations which require the inventor to dive heroically into uncharted waters and move beyond conventional wisdom. James Dyson explained that it took 5,127 prototypes over 15 years to develop the cyclone technology vacuum. Starting with cardboard and duct tape, then moving on to ABS polycarbonate, his phenomenal vision and feat of blistering persistence led to a blockbuster consumer product.


Dyson’s iconic cyclone technology vacuum.

“By the time I made my 15th prototype, my third child was born. By 2,627, my wife and I were really counting our pennies. By 3,727, my wife was giving art lessons for some extra cash. These were tough times, but each failure brought me closer to solving the problem. It wasn’t the final prototype that made the struggle worth it. The process bore the fruit. I just kept at it.” (James Dyson, 2011)

An exploratory method of trial and error is necessary for true innovation like the design process of the cyclone vacuum. It is interesting to note that Dyson, prolific inventor and clearly a very wealthy man, attributes his persistence to excelling at long-distance running as a young man.

There is an important lesson here: the persistent, systematic testing of a product and the prototyping inherent in a design process encompass learning and overcoming uncertainty. Dyson did not know upfront all the requirements and specifications of his final product. However, he understood that inventing the best vacuum cleaner in the world involved learning through a design process, and in the process overcoming uncertainty while optimizing the invention under various constraints.

Uncertainty in Software Development Projects

Similarly, in software development there is a difference between plan-driven and agile projects in their approach to uncertainty. A plan-driven project attempts to minimize uncertainty about the product by making all the decisions about the requirements upfront. However, that is not realistic in many situations. By making decisions too far in advance on complex projects, there is the risk of making erroneous assumptions based on incomplete knowledge.

Requirements Uncertainty Principle

Someone said that the only certainties in life are death and taxes. In software engineering, I would add another: requirements uncertainty! It was best articulated by W. S. Humphrey in his well-known Requirements Uncertainty Principle, which states:

“For a new software system, the requirements will not be completely known until after the users have used it. The true role of design is thus to create a workable solution to an ill-defined problem. While there is no procedural way to address this task, it is important to establish a rigorous and explicit design process that can identify and help to resolve the requirements uncertainties as early in the developmental process as possible.”

Since the publication of Humphrey’s book, A Discipline for Software Engineering, the agile movement has emerged and created a different approach to managing uncertainty from that of traditional project management. Agile planning gradually reduces uncertainty as the project proceeds. This is illustrated in the following diagram.

agile planning approach

source: Charles Cobb,  2014-18

According to Kenneth Rubin in his popular text, Essential Scrum: A Practical Guide to the Most Popular Agile Process:

“Plan-driven, sequential processes focus on using (or exploiting) what is currently known and predicting what isn’t known. Scrum favors a more adaptive, trial-and-error approach based on the appropriate use of exploration.”

An agile project is built around recognizing, managing, and reducing uncertainty as the project progresses.

I have a strong appreciation that testing, as an experimental method, deals directly with the problem of uncertainty by systematically exploring functionality and risk under various conditions as a system moves towards becoming fit for purpose.

Disciplined Learning

Alistair Cockburn, an early advocate of agile methodologies, refers to this systematic exploration as disciplined learning in very small doses (for illustration, see Cockburn’s knowledge-acquisition curve diagram below), where each step provides information that can be used to adjust four categories of learning:

  1. What a project team should be building (rather than what they thought they should be building at the outset);
  2. Whether they have the right people on the team;
  3. Where the technical ideas are flawed;
  4. How much it will cost to develop. (Cockburn, 2014, 15)

I have personally experienced, as Cockburn describes, the phenomenon that in many software development projects, not necessarily just waterfall projects, the major integration occurs towards the end of development. This leads to much learning about interoperability, and the risks associated with integration, only after most of the development costs have been accrued. If, instead, uncertainty is mitigated earlier and learning takes place incrementally from the outset, this can reduce risks in the final product and provide valuable feedback to the project sponsor, who can direct and redirect the project throughout the development effort towards a better result.

Cockburn’s knowledge-acquisition curve

source: Alistair Cockburn, 2014

Risk and uncertainty are particularly high for new products, products that are new to the market, and applications that haven’t been implemented before. And new technologies are forming the foundation of many new applications that have never been developed before. That is where the interaction of technology uncertainty and requirements uncertainty is at its highest.

How to think about uncertainty?  One way in which a project team can think about uncertainty is in relation to the perceived level of complexity of a technology undertaking.

The Stacey complexity model can be used to illustrate the level of complexity in relation to two dimensions:

  • REQUIREMENTS: How well-understood and well-defined are the project’s goals and requirements?
  • TECHNOLOGY: How well-understood is the technology and solution for solving the problem?

The Stacey complexity model

Source: Charles Cobb,  2014-18
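One hedged way to operationalise the two Stacey dimensions is to rate each on a simple scale and map the combination to a zone. The thresholds, scale and zone labels below are illustrative assumptions for this sketch, not part of Stacey’s or Cobb’s published models:

```python
def stacey_zone(requirements_clarity: int, technology_certainty: int) -> str:
    """Classify a project against the two Stacey dimensions.

    Both inputs are rated 1 (far from agreement/certainty) to 10
    (close to agreement/certainty). Thresholds are illustrative only.
    """
    score = requirements_clarity + technology_certainty
    if score >= 16:
        return "simple"        # well understood: plan-driven methods work well
    if score >= 11:
        return "complicated"   # some unknowns: hybrid approaches
    if score >= 6:
        return "complex"       # high uncertainty: agile, empirical approaches
    return "chaotic"           # little is known: spike, prototype, explore

# A project with vague requirements but familiar technology:
print(stacey_zone(requirements_clarity=3, technology_certainty=7))  # complex
```

A team might use a rating like this at project inception to decide whether a plan-driven, hybrid or agile model is the better fit.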

Agile projects may help address what Daniel Kahneman calls the planning fallacy. People and groups susceptible to the planning fallacy overestimate the benefits and underestimate the risks and costs of delivering a product. This can be seen as a form of wishful thinking. People want to be successful. Project leaders can be prone to optimism bias. Teams may also fail to take into consideration outside influences and the complexity of all the systems of systems within an organisation.

The 12th annual State of Agile survey, conducted in the second half of 2017, provided a snapshot of the state of play of agile practices in software organisations around the world. It concluded that while agile adoption is growing within organisations, only 12% of respondents said that their organisations have a high level of competency in agile practices.

An agile methodology such as Scrum involves a more adaptive approach with an appropriate use of exploration. An agile project can be organised to reduce uncertainty incrementally as the project progresses. However, it is critical to have very skilled engineers and team members who understand when it is better to use a traditional, traditional-agile hybrid or agile model.

A Scrum project will still produce requirements and plans, but with the assumption that the details of those plans and requirements will be articulated as the project team and product owner learn more about the design of the product during the incremental process of building in structured sprints and learning periods (retrospectives). Plan-driven approaches to software projects use methods suited to large products and teams, and to highly critical products; these types of projects create highly detailed plans upfront that are suitable for highly stable environments. The two approaches can be combined, which could be the topic of another discussion.

Agile projects take an incremental approach to mitigating uncertainty as the project progresses and facilitate the disciplined learning that Cockburn advocates. When there is a high level of requirements uncertainty and change in the early stages of a project, an agile, adaptive approach may be what is needed, so long as there is a mature level of competency in fundamental software engineering practices.

References:

Boehm, Barry, & Turner Richard, Using Risk to Balance Agile and Plan-driven Methods, IEEE computer society (June 2003)

Cobb, Charles Understanding Agile at a deeper level: management of uncertainty, High Impact Project management (2014-2018)

Cockburn, Alistair, Disciplined learning: the successor to risk management, Cross Talk Jul/Aug 2014

https://en.wikipedia.org/wiki/James_Dyson

Kahneman, Daniel Thinking Fast and Slow, 2011

Rubin, Kenneth, Essential Scrum: A Practical Guide to the Most Popular Agile Process

Humphrey, W. S., A Discipline for Software Engineering, Addison-Wesley, 1995

www.wired.com/2011/04/in-praise-of-failure/

Leonardo to produce VW Dieselgate


Volkswagen has admitted to fitting 11 million vehicles worldwide with fraudulent emissions software. Meanwhile, Leonardo DiCaprio has announced that he will be co-producing a movie about the scandal, at a time when the VW corporation is facing enormous financial repercussions and brand depreciation around the world for its deception. This scandal puts in the spotlight how corporate fraud can derail the entire quality management system of a company that prides itself on quality, and how dire the consequences can be.

On 18 September 2015 the U.S. EPA served a Notice of Violation (NOV) on Volkswagen Group alleging that approximately 480,000 VWs and Audis, equipped with 2-litre TDI engines, and sold in the U.S. between 2009 and 2015, had non-compliant emissions software installed. But the consequences have been felt in many parts of the world. There are recalls, inquiries, government interventions and numerous class actions taking place in Canada, China, India, the European Union, and the United States.

Australian Owners

Volkswagen owners in Australia are launching a class action over the emissions-rigging scandal, with possibly more than 91,000 Australian owners affected. Mr Scattini from Maurice Blackburn Lawyers argues that Australian-owned cars fitted with the fraudulent software were emitting many times the accepted level of poison into the atmosphere, and he suggests that cars fitted with the fraudulent emissions-reducing software chip might be noncompliant with Australian emissions laws. Models affected in Australia include Golfs, Polos and Skoda Octavias built in particular years.

Shifting blame?

“This was a couple of software engineers who put this in for whatever reason,” Michael Horn, VW’s U.S. chief executive, explained to the subcommittee hearing. “To my understanding, this was not a corporate decision. This was something individuals did.” Horn revealed that three VW employees had been suspended in connection with the emissions software in the company’s diesel vehicles. Reuters is reporting that VW will dismiss the heads of R&D at Audi and Porsche, as well as U.S. chief executive Michael Horn. Only a couple of engineers were to blame?

Challenge to Toyota on quality and innovation

Back in 2007 at VW’s headquarters in Wolfsburg, Germany, former Audi chief Winterkorn was optimistic about transforming VW into a world-leading car manufacturer by delivering strong profits and challenging the market leader, Toyota, on quality and innovation. “We will bring the Volkswagen group to a new and higher level,” said Winterkorn, the incoming CEO, in January 2007. VW was attempting to gain greater market share on quality, so it would have had a strategic direction in quality management.

Quality Management

From the 1920s, when Bell Labs developed statistical control charts, to the 1990s, when ISO 9000 standards gained acceptance in the United States, quality management has been evolving in manufacturing for nearly a century. VW would have a quality management system to ensure that all the attributes of an automobile’s quality (performance, features, reliability, conformance, durability, serviceability, aesthetics and perceived quality) are taken into consideration. The quality of a VW Golf, for example, would comprise measurable characteristics and their limits of variability. The same applies to the software components.

Any quality management system necessarily involves a strategic management decision that pervades the entire organization. See the chart below for a model of how the various dimensions of a quality management system (e.g. people, processes, products) might be visualised within an organisation.


 (source: Manfred Seika) 

When the outgoing U.S. executive stated that just three rogue engineers who devised the fraudulent software were to blame, my sense is that this is not credible. One can only speculate that it is more probable that managers and engineers from different departments (product development, mechanical, electrical and software engineering) collaborated on designing the emissions-detection algorithms. The trade-off between fuel efficiency, power and emissions would have been discussed and built into the hardware/software solution. The result was an attractive product for customers, appealing to their environmental conscience without compromising on power, while appearing to conform to the EPA and other regulations under various conditions.

It is impossible to know at this time the full extent of the relationships and decisions within the organization that led to this crisis. Were the managers and engineers trapped in their roles, conforming to the dominant logic of the U.S. subsidiary, while sleeping poorly, with elevated levels of stress hormones?

“We must learn that passively to accept an unjust system is to cooperate with that system, and thereby to become a participant in its evil.” M. L. King Jr.

Tesla driven by the environmental Zeitgeist

In contrast, Tesla has been scaling up its electric-car battery-charging infrastructure in the U.S. Earlier in March this year, the innovative automaker announced that it had reached a milestone of 2,000 battery rechargers worldwide, located at almost 400 Supercharger Stations in North America, Europe, Asia and Australia.

On being questioned about the effect on consumers’ perception of green technologies, Tesla Motors CEO Elon Musk said: “What Volkswagen is really showing is that we’ve reached the limit of what’s possible with diesel and gasoline. The time has come to move to a new generation of technology.”

Whether we have reached the limit of what is possible with diesel is another discussion. But as far as the quality management system at VW is concerned, it appears to have been ineffectual in the face of dishonest and fraudulent behavior at various levels of the organization, not least in a U.S. subsidiary with a culture of complicity. Ethical principles must prevail in corporate governance. VW Golfs, rogue engineers, clandestine software development, the car scandal of the 21st century… maybe it will make a good movie, Leonardo.

What can we learn about quality management from the VW fraud scandal?

As a manager, how do you define quality? This is a question that must be troubling the quality managers at VW. Common definitions include ‘fitness for use’, ‘conformance to specification’ and ‘fitness for intended use’, but would any of these have kept VW out of serious trouble? How can an IT manager be confident that a product or service is fit for use? Incidentally, what is the quality characteristic that detects fraud?


Being confident about a critical software product requires an objective assessment of risk. Developing a strategy to test software depends on the technical and business context; there is no one-size-fits-all best practice for all the different types of software and infrastructure projects. While I have devised numerous approaches over the years, one of the most versatile and practical is a risk-based test strategy. It is a valuable approach in the manager’s toolkit.

Risk-Based Approach

Risk-based testing is an approach which provides a high level of confidence that the right features, interfaces and functions are being tested at the right time.

What is a risk-based approach to testing? (Do you know anyone who has a risk-based approach to their life?)

In project work, there are broadly two types of risks:

  1. project risks; and
  2. product (quality) risks.

Project risks relate to problems that arise and affect a project’s success. Lack of the right resources is an obvious example, as is the risk of not being able to deliver to the required market in a timely manner. Perhaps VW did not assess the impact on their brand and share price if their systemic fraud were discovered.

Product quality risks depend upon the specified quality characteristics and requirements definition.

Risk Identification

It is desirable to use a combination of top-down and bottom-up risk assessment. A top-down approach uses quality standard classifications and checklists, whereas a bottom-up assessment means interviewing experts (solution architects, software developers and business analysts) to gain insight into the relative importance of risks and quality characteristics for the various components of a system.

A top-down method of risk assessment can use software quality standards (e.g. ISO 9126) or other risk templates. The following list of quality characteristics, for example, is derived from several sources:

  • Capability – Can it perform the required functions?
  • Reliability – Will it work well and resist failure in all required situations?
  • Usability – How easy is it for a real user to use the product?
  • Performance – How speedy and responsive is it?
  • Installability – How easily can it be installed onto its target platform?
  • Compatibility – How well does it work with external components and configurations?
  • Supportability – How economical will it be to provide support to users of the product?
  • Testability – How effectively can the product be tested?
  • Maintainability – How economical will it be to build, fix or enhance the product?
  • Portability – How economical will it be to port or reuse the technology elsewhere?
  • Localizability – How economical will it be to publish the product in another language?

Source: Heuristic Risk-Based Testing by James Bach in Software Testing and Quality Engineering Magazine, 11/99

But what about another characteristic that, if overlooked, could reverberate through the company for years to come: is the software compliant with relevant regulations?

  • Compliance – Does the software comply with regulations in the various jurisdictions in which it will be used?

Risk Assessment

Assessing the risk is about determining the consequences and likelihood of potential risks. It could be useful to ask questions such as:  How serious is this potential risk? What is the likelihood of the risk eventuating?

Probability

It is important to determine the criteria used to assess the probability of the risk happening. These criteria could include complexity, the degree of change to a function or component, program size, programming skill, etc.

Consequences

What are the consequences if this function or component was to fail?

You could assign a criticality to each risk based upon the impact on the business or company if it were to fail.   A numeric scale may be used to rate the relative criticality.
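Combining the probability and consequence ratings into a single score gives a simple way to rank product risks. This is a minimal sketch; the 1–5 scales and the example risk items are hypothetical assumptions for illustration:

```python
# Sketch: ranking product risks by probability x consequence.
# Ratings are on a 1-5 scale (illustrative assumptions).
risks = [
    {"item": "payment interface", "probability": 4, "consequence": 5},
    {"item": "report formatting", "probability": 3, "consequence": 2},
    {"item": "login under load",  "probability": 2, "consequence": 4},
]

# Score each risk: higher score = more critical to the business.
for risk in risks:
    risk["score"] = risk["probability"] * risk["consequence"]

# The highest-scoring risks should receive testing attention first.
ranked = sorted(risks, key=lambda r: r["score"], reverse=True)
for r in ranked:
    print(f'{r["item"]}: {r["score"]}')
```

The numeric scale matters less than the relative ordering it produces: the ranking is what drives where test effort goes first.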

FMEA

Failure Mode and Effects Analysis (FMEA) is a formal procedure, another alternative, that can be used if it suits the context of the project. FMEA is a methodology created to identify potential failure modes for a process or product, to assess the risk associated with those failure modes (allowing issues to be ranked in order of importance), and to identify and carry out corrective actions to address the most serious concerns.

Failure Modes, Effects and Criticality Analysis (FMEA / FMECA) involves the identification of the following information:

  • Item(s)
  • Function(s)
  • Failure(s)
  • Effect(s) of Failure
  • Cause(s) of Failure
  • Current Control(s)
  • Recommended Action(s)

Most analyses include some method to assess the risk associated with the issues identified during the analysis and to prioritize corrective actions. Two common methods include:

  • Risk Prioritization; and
  • Criticality Analysis
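A common way to carry out the risk prioritization step in FMEA is to compute a Risk Priority Number (RPN): the product of severity, occurrence and detection ratings. The sketch below illustrates this; the failure modes and ratings are hypothetical assumptions, not drawn from any real analysis:

```python
# Sketch: FMEA risk prioritization via the Risk Priority Number (RPN).
# Each rating is on a 1-10 scale; a HIGHER detection rating means the
# failure is HARDER to detect with current controls.
failure_modes = [
    # (failure mode, severity, occurrence, detection)
    ("data loss on save",       9, 2, 4),
    ("slow page render",        4, 6, 2),
    ("wrong currency rounding", 7, 3, 8),
]

def rpn(severity: int, occurrence: int, detection: int) -> int:
    """RPN = severity x occurrence x detection (range 1-1000)."""
    return severity * occurrence * detection

# Rank failure modes so corrective actions target the worst first.
prioritized = sorted(
    ((name, rpn(s, o, d)) for name, s, o, d in failure_modes),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in prioritized:
    print(f"{name}: RPN={score}")
```

Note how the hard-to-detect currency fault outranks the more severe but easily caught data-loss fault, which is exactly the kind of insight FMEA is meant to surface.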

Risk Mitigation

The level of coverage of software testing is contextualised by the information gained from the previous activity of identifying and assessing risks. Scope and risk assessment combine to inform a test strategy which may include resourcing and scheduling, techniques of static analysis, dynamic testing, progression and regression, manual and automated, functional and non-functional dimensions in the overall approach.
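One way to make this link between assessed risk and test coverage concrete is a mapping from risk level to testing depth. The levels, thresholds and technique lists below are illustrative assumptions for this sketch, not a standard:

```python
# Sketch: mapping an assessed risk level to the depth of testing
# applied in the strategy. Levels and techniques are illustrative.
TEST_DEPTH = {
    "high":   ["formal review", "automated regression", "exploratory testing",
               "non-functional (performance, security)"],
    "medium": ["automated regression", "exploratory testing"],
    "low":    ["smoke test"],
}

def plan_for(score: int) -> list:
    """Translate a risk score (1-25, probability x consequence) to techniques."""
    if score >= 15:
        return TEST_DEPTH["high"]
    if score >= 6:
        return TEST_DEPTH["medium"]
    return TEST_DEPTH["low"]

print(plan_for(20))  # a high-risk item gets the fullest coverage
```

In practice the thresholds and technique mix would be negotiated with stakeholders as part of the overall test strategy, not hard-coded.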

Juggling time, resources and quality

A risk-based approach to testing is designed to mitigate risks and find the right trade-offs between time, resources and quality. Quality should not be compromised by other project imperatives. Is your product’s quality and conformance ready for the world to see? The systemic fraud that was designed into the emissions software of VW’s cars highlights the imperative for compliance, as well as all the usual product risks, to be mitigated.

How’s your fitness for use?

As an IT manager are you prepared to go against the dominant logic of your organisation if there is systemic fraud and non-compliance?  How is your product’s fitness for use?

References:

Heuristic Risk-Based Testing by James Bach in Software Testing and Quality Engineering Magazine, 11/99

Krishnamoorthi, K. S., & Krishnamoorthi, V. Ram, A First Course in Quality Engineering: Integrating Statistical and Management Methods of Quality (2nd Ed.)

Risk Based Testing and Metrics, article by Stale Amland for 5th International Conference, EuroSTAR ’99, November 8-12, 1999, Barcelona, Spain.

Software  Risk Management:  Principles and Practices by Barry W. Boehm, IEEE Software, Vol. 8, No. 1 January 1991.