Any embedded systems software engineers out there with practical experience in writing/designing safety-critical applications? (think DO-178B/C) I've got a few years of embedded experience under my belt between internships, personal projects, and now my relatively new job at a major aviation company, but I feel like I'm behind on this topic of safety and code that can't fail. It's simply not taught, and I really want to learn more. Partly it's personal pride, because I want to make a great product, but more importantly, what I work on protects human life. I really really really want to feel confident in what I build. Is there anyone out there with some years under their belt who can point me to some good references? Or maybe share some helpful tips? Much appreciated. If it helps, all my work is in C.

    I've done embedded development and have also built monitoring systems for high-voltage systems, so I have some experience with this.

    I'm afraid there's no guide for this kind of thing that I'm aware of. It's really about building redundancy into your code, testing that redundancy, and then thinking further about other failure conditions that could occur.

    You can even go as far as having an entirely separate backup system that monitors your main system and takes over if something goes wrong with it, for example if it freezes up. Be sure to always test any safety systems you put in place extensively.
    @Gogeta70 Thanks! Yeah, redundancy has been my main approach thus far, plus lots of tests and lots of runtime checks in the code.
    Your post reminded me of this Stack Overflow answer: https://stackoverflow.com/a/...
    The question was similar to yours: how to prevent errors in extreme environments and recover from them. I'm not an embedded dev, but it might be useful to someone with your experience.

    It was the first time I realized that software can run in those kinds of extreme environments and be affected by them.
    In general, your company will have broken DO-178 down into checklists that have to be fulfilled, because relying on good coding practice is not enough. You must understand how the Annex A tables ("A-tables") of DO-178 map onto your company's processes and procedures. There should also be a company coding standard; make sure you adhere to it.

    If the system is life-critical, it will be at least DAL B, if not DAL A. At that level, redundancy can't be achieved in code; it has to be done at the system level. You need at least two independent control units with independent power paths, possibly even three, plus some voting logic.

    All inputs must be sanitised, whether analogue (filtering), logical (debouncing) or digital. Be especially cautious if you receive floating-point data: checking the range is not enough, because IEEE 754 also has special values such as NaN and infinity that you need to handle (every ordered comparison involving NaN is false, so a carelessly written range check can let it through). Never compare floats for equality, and don't use floats as loop variables.
    Don't write "clever" code; keep it obvious. Since DO-178 requires that no code exist without a requirement backing it, you need to put some kind of requirement tracing into the code, for example as comments in a special format. Bug fixes will also have a problem report ID, and that needs to be in the code, too.

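    Then, for the tests, code coverage will be required, so you probably have a tool that measures it during the verification phase. For DAL A and B, simple statement coverage is not enough: DAL B requires decision coverage, DAL A additionally requires modified condition/decision coverage (MC/DC), and on top of that you need normal-range and robustness (out-of-range) test cases.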

    Do your engineering tests properly as well: don't just test whether it works. Try to break the system. When there is any kind of threshold, whether an analogue value or a time, beyond which something has to happen, don't just test that it happens once the threshold is crossed. Also test that it doesn't happen right before.
    For aggregate conditions, test each condition individually, not only all together. For an AND, test that the output is TRUE when all conditions are TRUE. Then set each condition to FALSE individually, check that the output goes from TRUE to FALSE, set all conditions back to TRUE, and repeat for the next condition. For OR'ed conditions, do the same the other way around.

    I also recommend static analysis tools. Crank up your compiler warnings to the maximum level and treat every warning as an error. Maybe your company uses MISRA C (although it comes from the automotive industry). Cppcheck is a very basic but free and easy-to-use tool that I highly recommend. There are also commercial tools like CodeSonar and Coverity, and your company should have one of these high-end analysis tools.
    @Fast-Nop Wow, thanks!! My code lives mostly in DAL B, but I occasionally get into A. It's funny, your process description sounds pretty much exactly like what my company already does, so that makes me feel better about how we qualify and organize our code.