> Almost all problems occurred at interfaces between companies (prime vs. sub, customer vs. prime) or between different groups within the same company, where one group misunderstood what another group was doing, or at actual mechanical and electrical interfaces between components designed and built by different groups.
This is obviously a well-known phenomenon in software engineering and I don't think anyone here is going to be particularly surprised that it occurs in the aerospace setting. What is a little more surprising, to me at least, is that the systems people over there don't have procedures in place to minimise risks stemming from lack of communication.
It isn't realistic for any sub-team to be fully familiar with the overall system but surely, for instance, if a team is working on component X which interfaces with components Y and Z, then it should be standard practice for the X team to spend at least some time with the Y and Z teams during development?
Back when I worked on this hardware/software integration, we often didn't have the hardware to test.
So we coded to the specs. I spent a lot of time reading those and trying to figure out what they meant. It was a little challenging, but usually all the information was there. It worked (mostly), and we tested a lot. Some stuff was strange: I still remember seeing angles in BAMs (Binary Angle Measurements).
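For anyone who hasn't run into BAMs: the idea is to store an angle as a fixed-width unsigned integer where the full range of the word is one full turn, so overflow wraps around the circle for free. Here is a minimal sketch in Python, assuming a 16-bit word (the width, and the names `BAM_BITS`, `degrees_to_bam`, `bam_to_degrees`, `bam_to_signed_degrees` and `bam_add`, are my own illustration and not from any particular spec):

```python
BAM_BITS = 16                # assumed word width; real specs vary
BAM_SCALE = 1 << BAM_BITS    # number of counts in one full turn (65536)


def degrees_to_bam(deg: float) -> int:
    """Encode an angle in degrees as an unsigned BAM count."""
    # The modulo maps 360 degrees back to 0 and negative angles onto the circle.
    return round(deg / 360.0 * BAM_SCALE) % BAM_SCALE


def bam_to_degrees(bam: int) -> float:
    """Decode an unsigned BAM count to degrees in [0, 360)."""
    return (bam % BAM_SCALE) * 360.0 / BAM_SCALE


def bam_to_signed_degrees(bam: int) -> float:
    """Decode a BAM count to degrees in [-180, 180)."""
    deg = bam_to_degrees(bam)
    return deg - 360.0 if deg >= 180.0 else deg


def bam_add(a: int, b: int) -> int:
    """Add two BAM angles; the sum wraps around the circle."""
    return (a + b) % BAM_SCALE


if __name__ == "__main__":
    print(degrees_to_bam(90.0))            # quarter turn
    print(bam_to_signed_degrees(49152))    # three-quarter turn read as signed
    wrapped = bam_add(degrees_to_bam(350.0), degrees_to_bam(20.0))
    print(bam_to_degrees(wrapped))         # close to 10 degrees, up to quantization
```

With 16 bits the resolution is 360/65536 of a degree per count, which is why the wrapped sum above lands near, but not exactly on, 10 degrees.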
This jibes with the way people compare SpaceX and "old space" development in industry sources I follow:
SpaceX works "hardware rich", building lots of prototypes early in the development process. When Boeing and ULA fly their first "production" launch, the previous test articles generally haven't been anywhere near complete.
PRINCE-2 and other methodologies used in these kinds of programs make ample provision for doing this - but like all methodologies the benefits only come from proper application. If the program manager is subjected to political pressure from different stakeholders then the processes and approaches that should catch division and misapprehensions may simply not run.
What's amazing to me is that it doesn't seem like Boeing did tests with a fully integrated capsule until after the CFT mission was in progress.
They did test firings of individual thrusters, and even did some with multiple thrusters, but with many of the systems in the doghouse missing and the insulation taken off.
Having read a good amount about their methods, it really seems like Boeing has relied heavily on component level tests and analysis rather than integrated tests. And it has bitten them many times so far.
>with many of the systems in the doghouse missing and the insulation taken off.
I'm curious where you're getting this? I've read speculation, but I've never seen any authoritative source claim the test hardware configuration was different than the flight configuration. The better sources I've seen tend to indicate it was an inadequate thruster profile in the tests, rather than a configuration issue.
In my current role (high-assurance deterministic code for self-driving cars, one of the top-tier players, a company who claims to be "safety obsessed"), we have close to zero documentation. Every team or department has their own standards for documentation. Documentation is always back-written after coding is complete. Requirements are written after code is complete. For the past year, I've been given tons of praise at department meetings, "look at so and so, they've written so much really good documentation, their docs are the standard everyone else needs to follow", and then when it comes time for promotions my managers tell me "well, you haven't shipped as much code as other people on the team .... absolutely you've done a terrific job with documentation and we totally recognize you caught a ton of problems before they became problems, but promotions are really based on 'results', and 'results' means how much code you wrote ....". So I'm job hunting.
Anecdotal personal experience in safety-critical design:
A team was tasked with modernizing their multi-million-dollar, decades-old test stand. The entire time they were cursing previous engineers for the lack of documentation that made reverse engineering difficult. Then when it came time for them to produce the documentation on their own design, they balked at the idea. I had a conversation with them about how they were screwing over the future engineers just like they had been screwed over, but they still maintained cost/schedule pressure was too much to comply. We settled on them being allowed to go forward as long as they set aside a fund source and a date to have the documentation complete. When that date came and went, the documentation wasn't done and the excuse was the funding was used up by other projects. I feel like I owe those future engineers an apology.
I don't think I'd be so trusting/naïve today and would push back harder that if they couldn't get their documentation in order when the design was fresh in their mind, they're even less likely to do so in the future.
Ah yes, the old “we’re really looking for people who know how to game the metrics rather than wasting time on long term value to the company and our customers” conversation. Sorry to hear it, but wow do I know just what you mean.
I'm sure that these groups are producing specifications, and I'm sure those specifications are being followed to the letter (and perhaps even being validated as such). The problem is that the spec only ever contains about 80% of reality, with the rest being lost either to implicit assumptions made by the writer, or to requirements that the implementer couldn't possibly hit and can't know (unilaterally) how to trade into something more realistic.
This is why you have to get the humans to talk to the other humans. If that communication happens via a collaborative design document then yes, that's a process, and it's one that can work.
It's also why we could not recreate a Saturn-V today. We have the specs, but we don't have the knowledge and skills of the people who actually built them.
Even if you have the specs, you do not know if there was some important variable that was not referenced in the specs, and then you need a billion dollar research project to figure out what was missing from the original spec. Reference: FOGBANK. https://www.twz.com/32867/fogbank-is-mysterious-material-use...
I tried and tried to get two teams who were working on critical-but-independently-developed systems to put together an ICD. Team 2 says "no problem!" and comes back with a document 2 weeks later. Team 1 says "this proposed interface is terrible, here's a much better way to do it". Team 2 replies "oh yeah that's a nice interface but too late the interface in the ICD is the one we built two months ago can't change it now"
I read documentation, but there is less and less of it available. Presentations and video recordings are more common, but they are useless for self-study and for searching for information.
Writing good documentation and instructions is hard. I try it a lot.
Maybe there should not be 300 subcontractors involved in delivery and contracts should stipulate that work cannot be outsourced? The outsourcing of everything is part of the reason no one is ever held accountable.
>The point of outsourcing is because people don’t want to be held accountable.
Having worked in the public sector (Air Force), there's enormous pressure on groups like NASA to outsource because voters perceive government work as wasteful and expensive, and contracted work as efficient because free market.
> there's enormous pressure on groups like NASA to outsource because voters perceive government work as wasteful and expensive, and contracted work as efficient because free market.
And, those contracts end up being the most wasteful and expensive of all.
As labor is a driving cost, wouldn't that almost double the price? They are already uncompetitive in price with SpaceX, right? I'm not suggesting that profits be valued over lives, but they are clearly doing something wrong beyond having too few employees.
But by that time, the management whose quarterly or yearly bonuses drove the decision have moved on to bring their skills at increasing stock value to some other company (or retired).
I wonder if it's possible to avoid sub-teams on a project at this scale. Could everyone working on it have a general understanding of the entire system? Even with imperfect understanding, individual contributors would cover the gaps for each other.
Are there full-stack engineers? Or are the individual domains too complex compared to coding?
This isn't really possible on a project like this. There are just too many specialties, and you need folks who have deep expertise in each one. Just off the top of my head there's structures, mechanisms, fluids, propulsion, avionics, dynamics, software, integration, systems, instrumentation, test, operations, human factors, and manufacturing, and each one of those has sub-specialties. In avionics for example you've got RF and power (among others); in software there's embedded, flight, ground, and interfaces (again, among others). There's a chief engineer whose job it is to oversee the project but they will be relying on the expertise of the individual teams, and each team has to work closely with and lean on their partner teams. Sometimes you'll have people who are cross-trained - I have experience in avionics, software, and ops - but that's not typical, and it doesn't take much to feel spread thin (I certainly do).
No, people look at what NASA was able to accomplish during the 1960s and compare it to now, and wonder how the level of competence can be so drastically lower now vs. then. NASA was not infallible during the 1960s, but the level of engineering competence was much higher.
You think a telescope that launched over a decade late and cost 10x the budget is a mark of "high competence"? I don't think being a useful science tool is enough.
What's the other telescope you have in mind?
And those two free telescopes from NRO are still sitting idle. One of them is supposed to finally launch after 15 years, though apparently it's in such a complex and high budget mission it might not be saving any money to use it there.
The combined Mercury, Gemini, and Apollo programs cost around $30 billion in then-year dollars, which equates to about $300 billion in today's dollars. That's an average of about $25 billion a year over 12 years. That's about the same as what is being spent per year on NASA now.
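The arithmetic is easy to check. This is just a sketch using the round numbers from the comment itself: the 10x inflation factor is what the $30 billion to $300 billion conversion implies, not a precise deflator, and `average_annual_cost` is a made-up helper name.

```python
def average_annual_cost(total_then_year: float,
                        inflation_factor: float,
                        years: int) -> float:
    """Average yearly spend in today's dollars, given a then-year total."""
    return total_then_year * inflation_factor / years


if __name__ == "__main__":
    # Round figures from the comment: $30B then-year, ~10x inflation, 12 years.
    avg = average_annual_cost(30e9, 10, 12)
    print(f"${avg / 1e9:.0f} billion per year")
```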
NASA was doing plenty of unmanned missions during the Mercury-Gemini-Apollo years: the Explorer, Pioneer, Echo, Ranger, Telstar, Mariner, Lunar Orbiter, and Surveyor programs all had multiple missions in that time period. So no, I don't buy the argument that NASA has to spread its budget over many more missions now as compared to then.
Many of those you mentioned are still part of the goal of getting a human on the moon. That's like saying Starliner's Demo flights don't count as human-rated because they were uncrewed. About one third to one half of NASA's budget is dedicated to exploration and space operations, which is a better comparison for what you're driving at. The rest is spread across science, aeronautics, environmental, educational outreach, and other goals.
> Many of those you mentioned are still part of the goal of getting a human on the moon.
Some of them were for getting more detailed information about the Moon prior to sending the Apollo missions there. But most of them were general solar system exploration.
> That's like saying Starliner's Demo flights don't count as human-rated because they were uncrewed.
No, it isn't. The counterpart in the 1960s to these missions would be the uncrewed flights of the Mercury, Gemini, and Apollo spacecraft that were done prior to the first crewed missions. I did not include those in the unmanned missions I listed.