Disclaimer: The following represents my opinions and recollections only;
in no way does it pretend to represent the official or unofficial views of STScI,
or those of any other individual.
Overview
By this I mean issues that make developing software more difficult than it
otherwise might be. Some of these issues are probably unique to STScI,
others are more general to astronomy, and some apply generally
to science and engineering.
Unrealistic Expectations
Astronomers generally have highly unrealistic expectations regarding the time
and effort required to write software, though this has improved somewhat for
reasons I will mention later. This is largely because most astronomers have had
some experience writing software themselves, and use that experience to judge
how much work is involved. However, many astronomers are comparatively ignorant
of the difference between their own experience and what is required to write
software for other settings. Because of that, their estimates of the work
involved are often off by an order of magnitude, if not more. This, of course,
leads to frustration on their part as to why the developers are taking so long
to write the software.
This is a perennial source
of unhappiness on their part. I would not be surprised if it is somewhat
amplified by a general attitude among astronomers and physicists of looking
down on engineers and feeling that they could always do better. That is to say, it
isn't necessarily restricted to software. I will concede that there are some
extremely talented astronomers and physicists who could do better, but they
generally aren't willing to do those jobs. In any case,
these are the exceptions.
Why their Personal Experience Misleads
Most astronomers who have experience writing software write it for their
own personal use, and for no one else. In doing so they usually take numerous
shortcuts that do not work for the job that the developers must do (keeping
in mind that the constraints the developers face vary from one institution
to another). For STScI, our constraints are nearly maximal (short of writing
software capable of injuring or killing people). What are these requirements?
The software must:
- Be run by other users, which generally requires that it:
  - Run on different platforms, most of which we have no control over.
  - Be sufficiently documented so users know how to use it.
  - Address a wide variety of use cases.
  - Have a good user interface.
  - Handle errors in an informative way (a brief sketch of what this means follows this list).
  - Handle corner cases.
- Be run in an operational environment, which requires:
  - Minimizing errors that cause operational use to fail.
  - Careful versioning of the software.
  - Good regression tests.
  - Good unit tests.
- Be maintained for a long time.
- Be written so the code can be understood by other people.
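To make the error-handling and corner-case items concrete, here is a minimal
sketch of the difference between a personal calculation and what a shared tool
must do. The function, its checks, and the negative-flux policy are my own
illustration, not actual STScI code; only numpy is assumed:

    import warnings

    import numpy as np

    def calibrate_flux(data, flatfield):
        """Divide raw counts by a flatfield, with the checks shared code needs.

        A personal script would likely be the single line ``data / flatfield``.
        """
        data = np.asarray(data, dtype=float)
        flatfield = np.asarray(flatfield, dtype=float)
        if data.shape != flatfield.shape:
            raise ValueError(
                f"data shape {data.shape} does not match "
                f"flatfield shape {flatfield.shape}"
            )
        if np.any(flatfield == 0):
            raise ValueError("flatfield contains zeros; cannot calibrate")
        result = data / flatfield
        if np.any(result < 0):
            # A corner case someone must decide on up front; here we warn
            # rather than silently pass negative fluxes along.
            warnings.warn("calibrated flux contains negative values")
        return result

Most of these lines exist only to fail informatively; the science is one line.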
An astronomer who writes software only for their own research problems often:
- Makes ad-hoc changes to handle varying needs.
- Doesn't worry about corner cases until they run into them.
- Has no concern about running on platforms other than their own, nor about
  installation issues.
- Is not as concerned about the user interface, since they know what the code does.
- Is not as concerned about error checking.
- Does versioning, if at all, in an ad-hoc manner.
- Writes no documentation.
- Has no concern about other people understanding the code.
- Often has no regular test cases (the sketch after this list shows the kind of tests that are missing).
I was guilty of many of these prior to working in the software branch!
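To make that last item concrete, here is a minimal sketch of the unit and
regression tests the personal version typically lacks, written with pytest
against the hypothetical calibrate_flux function sketched earlier (assumed
here to live in a module named calibrate):

    import numpy as np
    import pytest

    from calibrate import calibrate_flux  # the hypothetical sketch above

    def test_calibrate_flux_matches_reference():
        # Regression test: the step must keep producing known-good output.
        result = calibrate_flux(np.full((4, 4), 100.0), np.full((4, 4), 2.0))
        np.testing.assert_allclose(result, 50.0)

    def test_calibrate_flux_rejects_mismatched_shapes():
        # Unit test: malformed input must fail loudly, not silently.
        with pytest.raises(ValueError):
            calibrate_flux(np.ones((4, 4)), np.ones((2, 2)))

Writing and maintaining tests like these, for every step and every corner case,
is part of the work that personal scripts never pay for.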
Bridging the gap between these two sets of needs is usually a large amount
of work, of which most astronomers have little conception. In recent years,
astronomers who have become involved in contributing code to Astropy have begun
to get a sense of many of these differences and how much work is involved.
Worse, the issues are not simply limited to these differences.
Issues related to Astronomer/Developer Interactions
When an astronomer writes software for themselves, there is no issue with
communicating what the software needs to do. It is all in their own head.
As it turns out, most astronomers are terrible at detailing the requirements for
the software. (By using the term "requirements" here I am not referring to formal
requirements such as NASA may require.)
Doing a good job of this means giving serious thought to what
exactly must be done. When writing the software for themselves, they handle these
issues as they arise. "Oh, it isn't converging as I thought it would; I need to
change the algorithm." Or "I didn't think about what should happen if the calibrated
flux is negative." And so forth.
They will typically give the developer general directives about what the software
should do, usually without having addressed many of the issues that might arise.
This also requires the developer to understand the typical thinking of astronomers,
since the astronomer usually expects the developer to understand the concepts (an
issue discussed in a future article on the ideal developer) and to be
capable of recognizing when the program is giving ridiculous
results.
The problem of poor requirements would not be serious if the astronomer
were available for quick feedback when such issues arise. Alas, at STScI, that
is often not the case. The reasons for this will be discussed at greater length
later. The same is true for feedback when the astronomer tests the software.
It is not uncommon to find that the astronomer cannot address questions or test
the software for weeks or months. This, of course, is very frustrating for the
developer, as well as having a serious impact on the development schedule.
Worse yet, it is not uncommon to get no specific requirements on the software
task for a long time, even while program schedules set deadlines for completing
the task before any requirements exist. Typically in such a case the developers
must make their best guess and write the software accordingly (while informing
the astronomer or their management of the course of action chosen). This often
results in rewrites of the program once the astronomer does have time to address what
it should do, leading to even greater effort and schedule delay.
When managing pipeline development, where we are committed to writing a specified
number of pipeline processing steps, we face bottlenecks on specific steps and
complete neglect of others. Astronomers can be guilty of becoming fixated
on what the absolutely correct algorithm for a step should be, typically leading
to disagreements between two or more astronomers. This ends in long delays
completing one step to the detriment of others. The idea of making the whole
pipeline an object of iterative improvement is mostly foreign to them. The
attitude is often that we must do this step right the first time, despite the
fact that whatever is deemed the right thing may be invalidated when real data
are obtained. This is a case of perfect being the enemy of good. All of this
causes the developers a great deal of frustration.
Because of this, the order-of-magnitude gap between the amount of work
astronomers perceive and what is actually involved can be multiplied by a
significant factor more.
Why Astronomers are Often Unavailable
We must understand the realities of the astronomers' situation that lead
to the difficulty of getting timely information and feedback. Much of this is
beyond their control.
Research Time
Most of the instrument scientists (the astronomers responsible for directing
software requirements) have research time, some at 50%, some at 20%.
So right off the bat, they are not fully available. If they have an observing run,
a conference, or travel for collaborations, they may be unavailable for a week or
two for that reason alone.
Competing Demands on their Time
When a telescope and its corresponding instruments are being built, any problem
that arises that threatens the success of an instrument immediately becomes
a top priority. Such things happen regularly, and typically are not sufficiently
well budgeted for in the staffing plan, where things are presumed to go more
smoothly than they usually do. Such problems can pull an instrument scientist
away from managing software requirements and testing for many weeks.
Furthermore, there are often programmatic exercises, usually required by NASA, asking
what the effect of a change would be on the effectiveness of scientific programs,
calibration, or the like. These also are not adequately scheduled for and can come out
of the blue at any time. Dealing with software development is often at the bottom
of the priority list. In other words, it can wait. These other things can't.
Other Management Issues
As a result of many of the previous issues, there can be a secondary effect on
management decisions on the instrument science side. One such effect, I think
arising partly out of the managers' own inclinations about software and partly
out of the other pressures on their time, is a tendency toward short-term thinking
regarding software priorities. Given a choice between getting something sooner,
but with long-term maintenance costs, or taking longer for a more maintainable
solution, the short-term solution is often chosen, deferring the long-term
problems; this usually results in greater total expenditure. There are times
when the immediate needs make that a logical choice (e.g., complete failure
versus higher cost), but I'd argue that usually isn't the case.
Another consequence is the decision to do the software "in house", using staff on
the instrument science side, justified as avoiding the use of the very expensive
software developers. This often makes sense if the idea is to explore solutions
before settling on one to be handed to the developers; that is done
all the time, quite reasonably. But sometimes it results in a product that has
to be supported by their staff, which often leads to significant problems.
I will give a few examples.
Examples
When it was widely decided that Exposure Time Calculators (ETCs) were essential
for HST, one of the instrument teams chose to develop one on their own and support
it through a web interface. But over a couple of years they found it impossible to
support, since it had not been designed well. It was handed to us to fix,
and it needed to be completely rewritten. Interestingly, we found out that another
instrument team was modifying the original one for use with their instrument.
When I contacted the person working on it to suggest that they use the
more reliable and easier-to-maintain rewrite, they agreed that was
the sensible thing to do, but they couldn't, since their manager would not
give them the extra time to switch to the new one, thus illustrating both
management effects described above.
A second example also relates to the Exposure Time Calculators. This time
it arose from a science-side effort to unify the exposure planning tool and
the ETCs, since they share many inputs, thereby reducing the effort required
of users. Prototype GUI applications were developed in Java for one HST instrument,
and this led to a full project to support all the instruments. Much work
was done on these GUIs, and some 18 months later the division
management of the instrument teams was given a status report on the progress.
It was at that point they realized that these GUIs could not be used on the
web, forcing proposers to install the software (despite this fact
being known the whole time, with key management not paying attention
to the consequences). This resulted in
the requirement that the ETC be supported both ways. Ultimately this produced
a mess, since the architecture was centered entirely on a GUI approach, and the
pressure of deadlines did not adequately allow for refactoring the design
(ironically, the GUI version fell out of support). Despite more and more
problems arising from this hurried approach, the pressure was always for
more features. This kept on until the whole system collapsed under its own weight
and had to be rewritten in Python. The software engineering aspects were
always deferred, until it all failed.
The last example involves a quite complex script developed by an astronomer
to combine data from multiple exposures. The
complexity really cried out for a reworking of the design to make it more
general (e.g., usable for other telescopes) and maintainable. But the
strong desire was to keep the script approach as the basis, since it was quicker
to get out there. As a result, the complexity, inflexibility, and maintenance
problems persisted for a long time (and still do).