Reliability and Safety
What can go wrong?

Risks of Computing
- Computers support many aspects of our security:
  - Fly-by-wire aircraft
  - Patient monitoring and care administration
  - Financial transactions
  - Telephone networks
  - Military surveillance and responses

Possible States of a Computer
- Functioning correctly
- Functioning incorrectly
- Down
- Intentionally off

Computer Failure Causes
- Faulty design
- Sloppy implementation
- Careless or insufficiently trained users
- Poor user interfaces
- Hardware/software malfunctions
- Specification errors
- Scope/application inconsistency

Computer User's Perspective
- Should understand the limitations of computers
- Need for proper training
- Need for responsible use
- Difference between good products and bad ones

Computer Professional's Perspective
- Study computer failures
- Study computer ethics

Educated Member of Society's Perspective
- Help us evaluate the reliability and safety of various computer applications
- Help evaluate computer technology

Three Categories of Failures
- Problems for individuals
- System failures that affect large numbers of people or cost large amounts of money
- Problems in safety-critical applications

Problems for Individuals
- Billing errors
  - Design and/or implementation of programs
  - Not enough care - input error
  - Not enough testing - reasonable range
  - Not enough training

Database Accuracy Problems
- Info in database is not accurate
- Automatic entering of info - mistakes can be overlooked
- Copies of incorrect info can be in other systems
- Not knowledgeable enough about the system

Causes
- Large population
- Most of our financial interactions are with strangers
- Automated processing without human common sense
- Overconfidence in accuracy of data
- Lack of accountability

Consumer Hardware and Software
- Usually have more serious errors in their first releases
- Regularly sold with known bugs
- Hardware also has flaws
- Tradeoff between cost, debugging, and marketing
- Dishonesty, denials of problems, lack of adequate response to complaints

System Failures
- Lots of $$$$
- Complete shutdown of basic services
- Areas:
  - Communications
  - Business and financial systems
  - Military

Why?
- Not enough testing
- Technical difficulties
- Poor management decisions
- Dishonesty in promoting the system and responding to problems

Communications
- Phone service
- How bad?
  - Pagers
  - Phone calls
  - 911
  - Communications for airports
  - Cellular phones

Business and Financial Systems
- Stock exchange
- ATM
- Contest by Pepsi
  - Too many winning tickets issued

Destroying Business
- Loss of sales
- Incorrect info affects business
- Dissatisfied customers
- Incorrect prices
- Loss of data

Military
- Data management
- Weapons system design
- Battle simulation
- Battle management
  - Command/control
  - Communications
  - Intelligence
- Nuclear war

Why?
- Not enough testing
- Technical difficulties
- Poor management decisions
- Dishonesty in promoting the system and responding to problems
- Results in delays and abandonment of projects

The Denver Airport Baggage System
- Outbound luggage checked at ticket counters or curbside
- To be delivered anywhere in <10 minutes
- Via automated system of cars on tracks
- Connecting flights or terminals
- Laser scanners
- Tracks - 4000 cars

Problems Encountered
- Cars crash into each other at intersections
- Luggage misrouted, dumped, or flung
- Needed cars were idle or put to rest

Specific Problems
- Real-world problems
  - Scanners got dirty
  - Knocked out of alignment
- Software error
  - Rerouting of cars to waiting area - idle

Causes
- Time allowed for development and testing was insufficient
- Significant changes in specifications were made after the project began
- Not enough debug time
- Poor management
- Unrealistic plan

Safety-Critical Applications
- Use of computers is increasing rapidly in these areas
- Use of computers in these areas can save $
- Areas
  - Military
  - Medical applications
  - Power plants
  - Aircraft
  - Trains

Aircraft - Fly by Wire
- Pilots do not directly control the plane
- Actions are input to computers that control the aircraft systems
- Pilot interaction is critical
- Need for an easy way to override computers
- Easy transfer between automatic and manual control

Air Traffic Control
- Long delays
- Increased risk of collisions
- Old machines - computer systems
- Political - government spends $ elsewhere

Case Study - Therac-25
- Software-controlled radiation therapy machine used to treat people with cancer
- Problems:
  - Massive overdoses administered
  - Repeated overdoses due to faulty display
  - Death
- Operated in dual machine mode - electron beam or x-ray photon beam

Why?
- Lapses in good safety design
- Insufficient testing
- Bugs in software that controlled machines
- Inadequate system of reporting and investigating accidents and deaths

Specific Problems
- Some hardware safety features were eliminated in newer models
- Software reused from older systems was assumed correct
- Malfunctioned frequently
- Weakness in design of operator interface
- Inadequate explanation of error messages, if any

Specific Problems (continued)
- Machine allowed one-key intervention versus automatic shutdown
- Inadequate documentation
- Poor test plan

Software Errors - Bugs
- Fatal error was a simple fix
- Fixes are complex, expensive, and prevent use of the machine while fixing
- Bugs
  - Can be intermittent and hard to detect
  - Importance of self-checking
  - Importance of using good programming techniques

Overconfidence
- Leaving out changes that are necessary
- Ignoring error messages
- Not using backup devices (video or audio)

Conclusion and Perspective
- Irresponsibility leads to criminal charges
- Responsibility leads to merit awards
- Importance of good software development
- Consequences of carelessness, cutting corners, unprofessional work, or attempts to avoid responsibility
- Lack of appreciation for risks
- Poor training

Ways to Prevent Problems
- Good computer systems
- Good training
- Accountability
- Individual responsibility
- Management responsibility
- i.e., IEEE Code of Ethics

Increasing Reliability and Safety
- What goes wrong?
- Many lines of code and many programmers
- See page 130
- Problems are managerial, technical, social, legal, ethical

Overconfidence
- Unappreciative of risks
- Ignore warnings
- Don't consult manuals

Professional Techniques
- Use good software engineering techniques at all stages of development:
  - Specifications
  - Design
  - Implementation
  - Documentation
  - Testing

Professional Techniques (continued)
- Study the techniques and tools available
- Know or learn enough about the application field and the software or systems being used

Why Study Failures?
- Provides technical lessons
- Leads to improved hardware and software products
- Provides ethical data
- Leads to improved ethical codes/laws

Lessons Learned
- Accidents are not the result of unknown scientific principles but rather of a failure to apply well-known engineering practices
- Accidents will not be prevented by technological fixes alone; prevention requires control of all aspects of the development and operation of the system

Lessons Learned (continued)
- Software developers need to recognize the limitations of software and use hardware safety mechanisms

User Interfaces and Human Factors
- Aircraft control systems
  - Pilot needs feedback to understand what the automated system is doing at any time
  - The system should behave as the pilot expects
  - Workload that is too low can be dangerous

Redundancy and Self-Checking
- Redundancy - judging - expensive
- Complex systems collect information to diagnose and correct errors
- Audit trails are vital
- Detailed records help protect against theft and help trace and correct errors

Redundancy and Self-Checking (continued)
- Designed to constantly monitor itself and correct problems automatically
- Half of the computing power is devoted to checking the rest for errors
  - Closes off part of the system
  - Reroutes
  - Corrects problems and reroutes again

Testing
- CRITICAL!
- Principles and techniques exist
- Can use another company to perform independent verification and validation

Dangerous Tendencies
- Operators
  - Bypass check mechanisms through familiarity
- Technicians
  - Blame random mechanical or signal glitches rather than software
- Corporate managers
  - Initially deny and ignore - then cover up
  - Finally - deal with expensive fixes

Overall Lessons Learned
- Should not declare a problem understood with the first hypothesis
- Should not expect management to follow through on field reports
- Overconfidence in software leads to economically marginal designs

Overall Lessons Learned (continued)
- Enforcement of software engineering practices is often abysmal
- Basing risk assessments on individual subsystems often leads to unrealistic optimism

Lessons for Systems Engineering
- Hardware backups are valuable
- Software must not be presumed innocent
- Software-related errors can be indistinguishable
- Audit trails are critical
- Risk estimates are subjective
- User feedback is valuable

Lessons for Software Engineering
- Documentation should be ongoing
- Designs should be kept simple
- Testing should be built into software
- Software must be tested out of system and in system
- Reused software should be tested like new software

Lessons for Oversight
- Users are more likely to make initial observations than monitoring officials
- Users need reliable information in order to be maximally valuable

Laws and Regulations
- Criminal and civil penalties
- Suits against the company that designs or sells the system
- Criminal charges when fraud or criminal negligence occurs
- Need contracts
- Need well-designed laws and standards

Regulation
- Requirement for approval by a government agency before a new product can be sold
  - Including specific testing requirements
- The profit motive causes skimping on safety
- Better to abandon in some cases
- Inadequate ability of customers to judge
- Hard to sue large companies

Regulation (continued)
- Expensive and time-consuming
- Newer procedures may not be enforced
- Lots of paperwork

Professional Licensing
- Licensing of software development professionals to protect against poor quality and unethical behavior
- Specific training
- Passing competency exam
- Ethical requirements
- Continuing education
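
Worked example: reasonable-range check
The billing-error notes earlier list "not enough testing - reasonable range" as a cause of problems for individuals. A minimal sketch of such a check follows. The function name, the bounds, and the sample amounts are all hypothetical and chosen only for illustration; they do not describe any particular billing system.

```python
from decimal import Decimal

# Hypothetical bounds for a monthly household bill. A real system would
# likely derive these from the account's own history (assumption).
MIN_REASONABLE = Decimal("0.00")
MAX_REASONABLE = Decimal("2000.00")


def check_bill_amount(amount, low=MIN_REASONABLE, high=MAX_REASONABLE):
    """Return (ok, message); ok is False when a person should review the bill."""
    # Convert through str so float inputs do not carry binary rounding noise.
    amount = Decimal(str(amount))
    if amount < low or amount > high:
        return False, f"Amount {amount} is outside {low} to {high}; hold for review"
    return True, "ok"


if __name__ == "__main__":
    # Sample amounts are made up: one ordinary, one absurdly large, one negative.
    for sample in ["84.17", "6300000.00", "-12.50"]:
        ok, message = check_bill_amount(sample)
        print(sample, ok, message)
```

The check does not fix the bad amount; it only stops automated processing and hands the case to a person, which addresses the "automated processing without human common sense" cause listed above.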