Natural Language Processing
Working it out in logs
Railway safety management is a complex subject that involves a significant amount of manual intervention in the assessment, analysis and control of risk. Supporting documentation is usually worked on by multiple parties, with differences in system viewpoints and writing styles. Maintaining quality safety documentation is therefore an interesting challenge for the industry. Hazard logs, for example, play a central role in both system engineering and risk assessment activity. The role of the log is to present a representation of the risks related to the system under consideration; its content relies upon input from a variety of sources and on collaborative activities involving teams with varying expertise and knowledge. From experience, we have found that the quality of this information can vary greatly both within and between projects. The volume and variety of the data, together with the need for collaboration, make it a significant challenge to manage the content while maintaining its readability, format and consistency.
We are currently working on a tool that assesses the ‘quality’ of a risk log, either in ‘real time’ or at regular intervals, to check the output from critical risk workshop sessions. It uses Natural Language Processing and machine learning to assess the quality of a hazard log based solely on its textual content. The method includes text classification and term frequency-inverse document frequency (TF-IDF) weighting to identify important keywords that act as quality indicators. The intention is not to replace a human expert, but rather to support assessments by providing an early indication of the quality of the textual data in a given log. This involves checking for signs of imprecise and unclear writing and identifying issues that may make it hard for readers to fully interpret incident sequences. The tool has been built around CENELEC standards to aid compliance with both standards and risk management best practice.
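The keyword-weighting step mentioned above, assuming it refers to standard TF-IDF, can be illustrated with a minimal Python sketch. This is not the tool's implementation: the log entries are invented for illustration, the function names (`tokenize`, `tf_idf`, `top_terms`) are hypothetical, and the weighting used (relative term frequency multiplied by log(N/df)) is only one common variant.

```python
import math
import re
from collections import Counter

# Invented hazard-log entries, for illustration only (not real project data).
ENTRIES = [
    "Signal passed at danger due to driver distraction",
    "Platform overcrowding may cause passenger fall onto track",
    "Something might go wrong with the signal at some point",
]


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(N / document frequency).
    This is an assumed, textbook variant of TF-IDF.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many entries does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(score, k=3):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    return sorted(score, key=lambda term: (-score[term], term))[:k]


if __name__ == "__main__":
    for entry, score in zip(ENTRIES, tf_idf(ENTRIES)):
        print(entry, "->", top_terms(score))
```

Terms that appear in every entry receive a weight of zero, so the highest-weighted terms are those that distinguish one entry from the rest. In a quality-assessment setting, such weights could serve as input features for a text classifier; how the tool itself combines them is not specified here.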
The Intelligent Hazard Log Tool (IHLT) has been developed in collaboration with Lancaster University, and several applications have been undertaken to prove the method. Results have demonstrated the power of textual analysis in this area and have identified a number of quality indicators; the demonstrator software has performed well against a manual evaluation of a sample data set. The results of this product deployment will be presented at the Transport Research Arena conference in Vienna in April 2018.