Resilience on the Railways

In this post, I recount my personal experiences of a(nother) failure on the railways in Scotland. Based on what happened during this event and how it affected me (and other passengers), I also suggest some pragmatic solutions that could be used to make the system more resilient to failures like this in the future.

Another Day, Another Failure

The other week I had to travel to Newcastle for a meeting. When I arrived at Markinch train station, travellers were being advised that there were no trains going north from Edinburgh Waverley because a train had failed in the tunnel en route to Haymarket station. The failure must have happened around 7:00, since the northbound train had been due to leave Markinch at 7:47 (about 50 minutes from Waverley).

We were advised that there didn’t appear to be any problems with trains heading south into Edinburgh. This was a relief, as I had to catch a connecting train from Waverley at 10:00. There was some slack time built into the journey when I bought the tickets, so I would have about 35 minutes to make the connection.

Some people (travelling north) simply decided to work from home, rather than wait an indeterminate length of time in the forlorn hope that a train might show up soon.

My train arrived more or less on time: so far, so good. Then, shortly after crossing the Forth Rail Bridge, the train stopped near the airport. We were informed that we’d caught up with the trains in front and that there was a backlog, so we were being held. The train guard couldn’t tell us how long we’d be delayed. We were still pretty much on schedule, so there was no need to worry yet.

From that point on, however, it became a stop-start journey as we inched towards Haymarket. The guard kept apologising for the delay, and then giving out details about the Delay Repay scheme. This is the passenger compensation scheme, which kicks in when a train is delayed for over 30 minutes, so the implication was that we would be at least 30 minutes late. The guard still couldn’t tell us how long we’d be delayed, however.

In the interim period, an ex-colleague had reported that his train had been terminated at Haymarket, rather than waiting any longer to continue to Edinburgh Waverley.

Our train was now awash with people making phone calls, explaining that they would be arriving late for work, meetings and so on, although they could not say how late.

It was somewhat frustrating at this point to see westbound trains apparently operating normally between Edinburgh and Glasgow.

En route we passed Haymarket Depot, where there were several unoccupied tracks. Pulling into Haymarket station we could then see a train standing on platform 0, which is normally vacant. This, presumably, was the stricken train.

It was now almost 10:00, so the chances of making my connection had gone. The guard again apologised for the delay and publicised the Delay Repay scheme. He also noted that there was a train heading south from Waverley that would leave shortly after 10:00, but he didn’t know which platform it was on. He suggested that people travelling south on the East Coast main line might be able to make it.

We arrived around 10:08, and the departure boards were somewhat confusing: it wasn’t clear which trains were already in the station. It also wasn’t clear what the ticket situation was: it turns out that if you miss a train, and it’s not your fault, you can use the next available service (run by the same company, I believe) without paying any extra.

This was a major failure at rush hour. It had lasted about 3 hours, and must have inconvenienced thousands of people. Although this particular type of incident may be unusual, delays on the trains are not. Earlier that same week, ScotRail had announced plans to abandon the practice of “stop-skipping” during rush hour. This is where trains that are running late miss out some of the planned stops along the route, as a way of trying to make up time and avoid being late at their final destination.

The rail system clearly isn’t very resilient. I can’t believe that there has never previously been a failure of a train in the same tunnel, because ScotRail operate a lot of old rolling stock. So what can be done to improve things?

How Can We Make the Rail System More Resilient?

Experiencing the effects of the failure first hand set me thinking about how the system could be made more resilient. I’m talking here about a practical analysis, based on my knowledge and experience, rather than a major field study involving lots of time and money. I’m also talking about pragmatic solutions that could be implemented fairly quickly at little cost.

The rail system is set up in such a way that its purpose has effectively morphed into making sure that performance targets are met, i.e., that trains arrive at their destinations by some predefined time. If they don’t meet these nationally defined targets, the companies can be subjected to sanctions.

The real purpose of the rail system (and public transportation, more generally), though, is to help people (not trains) get to their destination by their allotted time. This may involve several modes of transport (e.g., driving to the station, taking the train, then catching a bus or walking). Since the sorts of failures that can occur on the railways are well documented, many of them can be anticipated. So rather than just react when a failure occurs, as if it is a new event, it should be possible to provide a backup in case a failure occurs. If we regard the passengers as part of this broader socio-technical system, we should allow them to use their adaptability and flexibility to help the system achieve its overarching goal.

By considering the four essential capabilities of resilience (see Resilience Engineering in Practice: A Guidebook edited by E. Hollnagel, J. Paries, D.D. Woods & J. Wreathall, 2011):

  • Knowing what to do, by adapting to the prevailing situation, activating predefined responses to deal with variability and disturbances.
  • Knowing what to look for, by monitoring changes to the system and its environment to help determine responses.
  • Knowing what to expect, to anticipate future events that can disrupt operating conditions.
  • Knowing what has happened, so that lessons can be learned.

it is relatively easy to come up with a few quick suggestions for how the resilience of the system can be improved, to make it better equipped to cope with failures in the future:

  • Better communication. My first train left about 90 minutes after the train had failed between Edinburgh Waverley and Haymarket. This information could have been relayed up the line to all stations, and all trains heading south to Edinburgh. Passengers could then make an informed decision about whether to use the train to make their journey, or to make alternative arrangements. This could even be extended to pass on the information that trains are all operating normally.
  • Better support. The two tunnels between Haymarket and Edinburgh Waverley are a bottleneck. It would therefore make sense to have an engine at the Haymarket depot to rescue trains that fail, at least during rush hour.
  • Learning from past failures. When a train or a signal fails the system often grinds to a halt. The system could be made more flexible by holding trains at intermediate stations, for example, rather than continuing towards the point of failure. Passengers could then be allowed to alight, so they could at least try to find alternative ways of continuing their journey using other modes of transport.
  • Rethinking performance measures. When systems have to meet pre-set performance targets, effort tends to be directed at compliance with those targets, rather than at making sure that the system meets its real purpose. When a major failure (like the one described here) occurs, we need to think about (at least) temporarily suspending targets.

There are others too. It’s not rocket science (or Stephenson’s Rocket science!). It just requires a bit of planning and forethought. That’s how you achieve resilience. Not by sending the transport minister to sit in the control room the morning after the failure occurred.