NCIS briefing Tuesday afternoon: don’t forget to bring popcorn

Tuesday afternoon, in a joint meeting of two committees, the Council will get briefed by representatives of Seattle City Light, Seattle Public Utilities, the newly-formed Department of IT, and an outside consultant on the NCIS billing and customer service IT system that is very late and very over budget. And going by the published materials they have submitted in advance, they have drastically misread the situation and it’s going to be a bloodbath.

For those not following the NCIS debacle, here’s the cheat sheet: Council member Kshama Sawant’s Energy and Environment Committee has oversight of Seattle City Light, and Council member Lisa Herbold’s Civil Rights, Utilities, Economic Development and Arts Committee oversees Seattle Public Utilities. The NCIS system is being developed for both SCL and SPU to use. The project was approved in 2012, and work began in 2014 at a projected total cost of $66 million and with an expected completion of last fall. Last year they quietly asked for more money in the 2016 budget, raising the total projected cost to $85 million and moving out the delivery date six months; this only became apparent last month — at which point they admitted that the expected completion was now fall of 2016 and the price tag is “over $100 million.” The Council naturally flipped out, as they should, and Tuesday’s joint committee hearing was scheduled to dive deeper into the mess that is NCIS.

There are two presentations scheduled for Tuesday afternoon’s meeting: the first is by SCL, SPU and the Department of IT, based upon this memo they submitted. The second is by a contractor on the project, TMG Consulting, which apparently provides “Project Quality Assurance” assessment and reporting to the Executive Steering Committee of the NCIS project.

Neither is particularly enlightening. There are three plainly obvious questions that need to be answered about NCIS:

  1. What happened — in particular how and why did the scope change? (Council President Bruce Harrell specifically asked for an answer to this last time, and didn’t get one)
  2. What problems still remain?
  3. What changes are you making, or have already made, to address the problems that still remain and to ensure that there will be no further delays or overruns?

Both presentations are problematic in that neither presents informative or in any way convincing answers to these questions; worse, they provide conflicting views of the state of the project today.

The SPU/SCL/DoIT memo claims that they are well into the testing phase, and that they have an extensive “Day in the life” suite of tests that exercises all the business requirements of the system. They ran the entire suite on March 31, and while it didn’t meet their quality bar they claim it gave them an inventory of everything that still needs to get fixed. With that knowledge, they have a plan to fix all the defects and they have high confidence in the September 2016 rollout date. There is, however, no data provided in the memo that would lead anyone to agree with their conclusion. In fact, there is no data at all regarding the basic metrics of software development and testing as applied to this project. Perhaps they thought that the City Council would have little interest in the data, or would lack the ability to make sense of it. And they would be wrong; first, they are all data nerds. Second, Sawant is a former software engineer. Moreover, it was an intentional lack of transparency to hide troubling information that got them on the wrong side of the City Council in the first place, and betting that they won’t want to see data that the NCIS team is choosing not to reveal is not wise.

Even if the Council trusted the SPU and SCL management (which they don’t), they should still ask for project data, because there is a red flag hiding in plain sight that suggests the project is still out of control: the project has repeatedly slipped six months for every six months that pass. As someone who has worked on multiple large software projects, I can attest that when the end date stops getting closer, the management team has no idea when they will really finish. NCIS has been approximately six months away from completion for a year. Maybe it really is six months away, but the odds say it is just as likely to be seven, or eight, or five. For the new end date to be consistently moved six months out means that the date most likely came by fiat from above, and not from the data. No one should believe them until they cough up numbers to support their assertion.
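For the data nerds, here is a rough sketch of the "end date isn't getting closer" check, written in Python. The snapshot dates below are illustrative assumptions loosely based on the timeline described above (a fall 2015 target that became fall 2016), not figures from the memo, and `slip_ratio` is a name made up for this example. A ratio near 1.0 means the forecast moves out about as fast as the calendar moves forward.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month is ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def slip_ratio(reports):
    """Months the forecast finish moved out per month of elapsed time.

    reports: list of (report_date, forecast_finish_date), oldest first.
    0.0 means the date is holding; 1.0 means the finish line recedes
    as fast as the team approaches it.
    """
    first_report, first_forecast = reports[0]
    last_report, last_forecast = reports[-1]
    elapsed = months_between(first_report, last_report)
    if elapsed <= 0:
        raise ValueError("need reports at least one month apart")
    return months_between(first_forecast, last_forecast) / elapsed


# Hypothetical status snapshots (assumed for illustration only):
# each one says "about six months to go".
reports = [
    (date(2015, 4, 1), date(2015, 10, 1)),
    (date(2015, 10, 1), date(2016, 4, 1)),
    (date(2016, 4, 1), date(2016, 10, 1)),
]

remaining = [months_between(r, f) for r, f in reports]
print("months remaining at each report:", remaining)
print("slip ratio:", slip_ratio(reports))
```

With real status reports plugged in, a ratio that stays near 1.0 across several reports would be the number to ask the NCIS team to explain.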

In the meantime, the memo also reveals that the new projected cost is $109 million. They were nice enough to give a breakout:

project budget

Of course, once again they chose not to provide any useful insights that would help answer the useful questions, like what the original numbers were when the budget was $66 million. These numbers will, however, serve as a nice “weapon of mass distraction” as the Council members ask about the $3 million spent on “interest/rent” and particularly the $4 million “management reserve.” — has that been spent already, and if not, what is it being reserved for at this point?

Instead, the SPU and SCL folks want to reassure everyone that they can contain the overrun through a nearly imperceptible change in utility rates:

rate adjustments

On one hand, this will be reassuring to the Council, as they wrestle with budget issues over the next month, that they don’t have to find $24 million to cover the overrun; on the other hand, we should all be angered by the notion that there is no accountability at SCL and SPU: if their expenses run over, they simply raise our rates.

The TMG consultant’s presentation is nearly as useless. There are seven slides; two are about how great TMG and the presenter, Tim Almond, are, arguably to make us believe that we should trust what he tells us. The next two are “meta” slides: telling us what Project Quality Assurance Assessment is, and reiterating for us what the “definition of success” is for NCIS. The three remaining slides purport to give us useful information on the project itself.  The first is this paragon of overstuffed information visualization:

TMG historical view

Check out the bar chart at the bottom, telling us that something has bounced between low and high over the course of the project. It tells us neither what that something is nor what units it is measured in. It also moved from orange to red; we’re probably to assume that red represents a greater cause for concern. And then we have bracketed time ranges, whose labels seem to suggest causes for various problems in the project. We can hope that Almond chooses to provide some data behind the labels, because they too mean little. Apparently “late start of bill design” lasted for 13 months; I can’t even guess what that means. And there was schedule slippage during the testing phase due to “management disciplines.”

Software metrics is a well-researched area; it doesn’t necessarily stop projects from going over time and over budget for all sorts of reasons, but it’s well understood today how to spot the telltale signs of  both well-run and poorly-run projects — particularly in hindsight.

The closest thing Almond provides to helpful information is his list of current risks:

project risks

But again, without quantifying these the list of risks is nearly useless. Is there lack of oversight on all dependent/coincidental projects? Are there particular ones that are higher risk?  Assuming that the “IT resourcing” issue refers to understaffing, how far understaffed is it, and is the understaffing spread evenly or concentrated in particular areas? The same question arises for the “project staff is tired, and operational staff are working harder” — how much are they working, and how much rest are they getting?

The “remaining testing and undiscovered defects” risk is bizarre in that Almond claims the risk is increasing. That would only be true if new functionality were still being added, but according to the SCL/SPU memo, they are already running their “Day in the life” test suite across the whole system. If that’s true, then the risk can only decrease as they continue to test and fix defects. So someone is not telling the truth here.

Almond lists recommendations, but most of them say “keep doing what you’re doing.”

recommendations

He does raise the concern that there isn’t an appropriate plan for staffing after the system goes live, and suggests that a governance and staffing plan needs to be put in place; that’s a real issue, and the SCL/SPU memo addresses that as something they are working on. But that is entirely independent of how they got into the situation they are in now. After all this discussion, we are no closer to having answers to the three big questions: what happened, what problems still exist, and what is changing in order to resolve the problems?

And there are other complications, such as the fact that last month the Department of IT scooped up IT staff from across city departments into a new centralized organization. One would assume that included at least some of the 100 internal SCL and SPU staff working on NCIS (alongside the PriceWaterhouseCoopers consultants on the project). How did that change the management structure for the NCIS team?  It might have made things better by bringing them together; or it might have made them worse by shaking up the team at a critical moment.

It’s a curious choice, and arguably a misread of the situation, for SPU, SCL, the Department of IT, and their QA consultant to show up and give little useful information about what is really happening with the project. This Council likes to dig into the details, and doesn’t like it when the executive branch withholds those details (which it is doing with increasing frequency). The presenters should expect to get pummeled with questions, especially by Herbold and Sawant but also by Harrell and Burgess if they decide to show up to the committee hearing. There is certainly no shortage of questions to ask them; here is my list:

Questions for TMG:

  • How many people were on the QA assessment team?
  • How many hours has Almond spent on the project himself (people with “executive vice president” titles at consulting firms tend to parachute in for important presentations, but usually aren’t the people doing the work in the trenches)?
  • When was TMG brought in?
  • You say that you have been making monthly reports. When did you start noticing and flagging issues with the project, and what were the responses?
  • Can you share your monthly reports with the Council?
  • Which dependent/coincidental projects lack oversight?
  • WTF do the bars represent in the bar chart?
  • What do you mean by “management discipline” causing schedule slippage in testing?
  • What issues do you see with IT resourcing, and how should that be addressed?
  • What do you mean by “too product focused” as it refers to initial training? As opposed to service-focused?
  • How could the risk be increasing for remaining testing? Team says that they’ve run their entire “day in the life” suite of tests, which implies that all the components now exist. What’s missing that would be a source for additional “unknowns”?
  • Most of your recommendations amount to “keep doing what you’re doing.” What needs to change?
  • How much of what has happened in this project is typical for projects like this one in a public utility?
  • What do you mean by “late start of bill design”? How late did it start, why, when did it start, and what problems did that cause?
  • You say that there were “changes in scope” but no additions. What kinds of changes in scope did you see?
  • Do you believe that the project will “go live” in September?
  • TMG’s web site has a report claiming that 67% of utilities run into major issues with projects to replace their CIS system. Are the problems that SCL/SPU have seen typical? And should they have been foreseen?

Questions for SPU, SCL and the DoIT:

  • Why and how did the scope of the project change?
  • What fraction of the test scripts in the “day in the life” test suite failed when you ran them on 3/31? Were the failures concentrated in particular areas, or spread evenly across the project?
  • How do you respond to TMG’s statements about issues and risks in the project? How do you respond to their assessment of what led to the project being late and over budget?
  • What areas of the team are understaffed, if any, and what is your plan to address that?
  • How does city-wide consolidation of IT staff affect the project? Who is accountable for the project now? Who has final decision-making authority?
  • How does the breakout of the newly-revised $109 million budget compare with the original budget of $66 million? What’s the $3 million for interest and rent? Has the $4 million “management reserve” already been spent? If so, on what? If not, then assuming everything goes perfectly from now on does that reduce the final cost to $105 million? Who decides how the “management reserve” is spent?
  • How is the cost overrun split across SPU and SCL?
  • You say that the NCIS project is still in the construction/implementation phase. Are all components complete and being tested now, or are there still components being written?
  • How many simultaneous users can the current CIS system support, what’s the target for NCIS, and what’s its current maximum based on testing to-date?
  • When you “cut over” to the new system in September, are you going to move customers over in phases or everyone together?
  • What is the governance and staffing model for after the “go live” date?
  • TMG suggests that the team is overworked; can you quantify how hard they are working and whether that poses a risk to the project? What measures are you taking to ensure they are well rested and doing high-quality work?
  • Where is the accountability in SPU, SCL and DoIT for this project? What are the consequences for being late and over budget? And why should SPU’s and SCL’s customers pay for the cost overrun? If customers automatically pay for overruns, then why would SCL and SPU care if a project went over budget?

On the surface, it looks like SPU and SCL are planning to continue their preference for hiding details. Let’s hope — for their sake — that on Tuesday afternoon the presenters show up with a newfound commitment to transparency and volunteering information. Otherwise, they are in for a long, painful afternoon at the hands of Sawant, Herbold, and their colleagues on the Council. And it will be well-earned.