Welcome to the first FutureDAMS podcast. In this episode Stephen Knox discusses his latest paper which presents an innovative data manager developed by FutureDAMS researchers, which supports models for resource networks like water or energy. Steve is the software development team lead in the water resources group at the Department of Mechanical, Aerospace and Civil Engineering (MACE) at the University of Manchester.
You can listen to the podcast by visiting Soundcloud or by clicking play below.
Below is an edited transcript of the recording.
Read the paper: ‘An open-source data manager for network models’
What motivated you to work as a software developer at MACE?
I noticed when I was doing my PhD that often people would have a really good idea and implement it just enough to get their PhD from it and then it would just die. So I made the decision to become a professional software engineer to learn how to write software really well, but with a view to coming back to academia at some point in the future. So from there, I moved to London and I got a job in a gambling company, where I spent a number of years developing gambling websites, which, as strange as it might sound, gave me an excellent grounding in software development. From there, I saw an advertisement to work in a water resources research group doing software development and I got this job in 2013. What really attracted me to it was the fact that you feel like you’re making a difference to the world, which is a very heart-warming feeling, and with the stuff that we’re building, specifically when we’re working with developing countries and we’re training people in the software, you can really see the change of attitude towards their water, and they’re using our system to actually see that change happen. It’s very gratifying. So I think that’s a very brief history of me.
“When we’re working with developing countries and we’re training people in the software, you can really see the change of attitude towards their water, and they’re using our system to actually see that change happen”
The paper: ‘An open-source data manager for network models’
I was brought on board to try and develop this system which would allow data to be stored for lots of different people. Lots of different researchers from all around the world were doing very similar work, but they were all producing bespoke solutions. There’s a lot of overlap between what everyone’s doing in the way they store their data and, usually, everyone’s storing the data in Excel spreadsheets that look very similar but aren’t compatible. So what if we could create a system where everyone’s data could be stored in exactly the same way and, in doing that, you can not only share the data much more easily, but also validate the data much more easily? And there’s a multitude of benefits to having your data stored in a structured, curated, validated way. So that was the basic idea.
The way we achieved that was to figure out what the overlap between all of these different pieces of work was. What we were specifically working on was the modelling of infrastructure systems, so water systems on a macro scale (not towns and cities, but reservoirs and rivers), energy systems, transport systems and food systems. Typically in a computer you represent the links between the biggest pieces of infrastructure with a network of nodes and links. For example, in a river system or a water system, a link might represent a river or a pipe, and a node might represent a city, a reservoir or a power station. And coming from a non-water and non-energy background, I tried to identify what all of these commonalities were, and then tried to extract all of those commonalities and encode them in the system so that any person could come along and recognise the structure of their system in our system. Users can align their concepts to the Hydra platform and they can see that their data could actually be stored in here. All they need to do is translate the Excel file into the Hydra platform and then all of a sudden they’ve got their data stored in a much more structured way. And then another researcher can come along and do the same thing, and then another. And all of a sudden, you’ve got lots of different researchers storing their data inside one central repository with the same data formats, and all of the benefits that that brings.
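As a rough illustration of the nodes-and-links idea described above, here is a minimal Python sketch. It is not the Hydra platform's own data model: the class names, field names and example values are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    kind: str                       # e.g. "reservoir", "city", "power station"
    data: dict = field(default_factory=dict)


@dataclass
class Link:
    name: str
    kind: str                       # e.g. "river", "pipe"
    start: str                      # name of the node the link leaves
    end: str                        # name of the node the link arrives at
    data: dict = field(default_factory=dict)


@dataclass
class Network:
    name: str
    nodes: dict = field(default_factory=dict)
    links: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.name] = node

    def add_link(self, link):
        # A link only makes sense if both of its end points already exist.
        for end_point in (link.start, link.end):
            if end_point not in self.nodes:
                raise ValueError(f"unknown node: {end_point}")
        self.links.append(link)

    def downstream_of(self, node_name):
        # Names of the nodes reached directly from the given node.
        return [link.end for link in self.links if link.start == node_name]


# Illustrative (invented) river system: reservoir -> power station -> city.
network = Network("example river basin")
network.add_node(Node("upland reservoir", "reservoir", {"capacity": 5000}))
network.add_node(Node("hydro plant", "power station"))
network.add_node(Node("city", "city", {"demand": 120}))
network.add_link(Link("main river", "river", "upland reservoir", "hydro plant"))
network.add_link(Link("supply pipe", "pipe", "hydro plant", "city"))
```

Nothing in these classes is specific to water, so an energy or transport system could be described with the same three structures by changing only the `kind` labels and the attached data.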
Empowering stakeholders
The traditional way of doing modelling was to simulate a system and then present your results to a stakeholder, or present the results in a paper or something like that. Typically what you would do is run your simulation lots and lots and lots of times, pick the results that you find interesting, put them in a PowerPoint presentation, present them to the stakeholder, and then that’s the end of the story. What we’ve come to learn now is that the stakeholders, particularly the industrial stakeholders like water companies, need the ability to look much deeper into the results. So if you had presented something in a PowerPoint presentation, and then they asked, ‘what does this result actually mean in reality?’, you would have had to spend two weeks going back and re-running your simulations and figuring out what it actually means, then creating another PowerPoint presentation. Whereas now, with a system like this, you give them access to the system itself. They can have a login because it’s all web-based, and then they can inspect the data all by themselves. It takes that middleman out of the equation entirely and it gives people much more power to analyse data and much more flexibility.
The Hydra Platform
The main thing about the Hydra platform is its ability to store multiple different types of systems inside the same system: you can have a water system stored in the Hydra platform database, and an energy system which is entirely different, and another water system or food system – it doesn’t care what kind of system it is, so long as it’s capable of being represented as a series of nodes and links and it has data associated with those nodes and links, just like any other model. It’s based on a templating system, so when the data is stored inside the Hydra platform, it’s all relative to the definition that the user has produced, which means that you can more or less store any type of system inside there. The real benefit of the Hydra platform is that flexibility.
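To make the templating idea concrete, here is a small hedged sketch in Python. It assumes a template is simply a mapping from a type name to the attributes that type should carry. The template contents, the `validate` function and the example resources are all hypothetical and are not taken from the Hydra platform.

```python
# Hypothetical templates: each one lists the types a kind of system uses
# and the attributes each type is expected to carry.
WATER_TEMPLATE = {
    "reservoir": {"capacity", "initial_storage"},
    "city": {"demand"},
    "river": {"max_flow"},
}
ENERGY_TEMPLATE = {
    "power_station": {"max_output"},
    "transmission_line": {"capacity"},
}


def validate(template, resources):
    """Return a list of problems found when checking resources against a template."""
    problems = []
    for resource in resources:
        expected = template.get(resource["type"])
        if expected is None:
            problems.append(
                f"{resource['name']}: type '{resource['type']}' is not in the template"
            )
            continue
        supplied = set(resource["data"])
        for attribute in sorted(expected - supplied):
            problems.append(f"{resource['name']}: missing '{attribute}'")
        for attribute in sorted(supplied - expected):
            problems.append(f"{resource['name']}: unexpected '{attribute}'")
    return problems


# Invented example data, checked against the water template.
resources = [
    {"name": "Upland reservoir", "type": "reservoir",
     "data": {"capacity": 5000, "initial_storage": 3200}},
    {"name": "City", "type": "city", "data": {}},
    {"name": "Wind farm", "type": "power_station", "data": {"max_output": 40}},
]
water_problems = validate(WATER_TEMPLATE, resources)
```

The same storage and the same checking code serve both kinds of system. Only the template changes, which is the flexibility described above.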
Benefits of the data manager
The data management allows you to do a number of things. First of all, there is an auditing system, which means that if you change a piece of data, that change is recorded. That’s really good for security reasons and for going back and checking if you’ve made a mistake. There are loads of fringe benefits to that. The fact that we have added a web server on top of the data management itself means that you can just give anybody a login and, from anywhere in the world, they can just access their data. So that’s a huge benefit of this. And it’s all open source, so as the need for and the usage of the Hydra platform grows, so does the level of feedback we receive, so does the level of confidence people have in the system, and so does the level of features and bug fixes and stuff like that, because the community itself will start, well has already started, contributing code. So it starts organically growing, and it’s not just a system that’s entirely controlled by us – that’s a really valuable thing because then, if somebody has a different opinion about how something should work or how something should be represented, it can grow organically based on that. And it’s a very nice way of doing this particular thing, because this is very much a community idea.
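The idea that every change to a piece of data is recorded can be sketched as follows. This is only an illustration under assumed names (`AuditedStore`, `set`, `changes_to`) and does not show how the Hydra platform actually stores its history.

```python
from datetime import datetime, timezone


class AuditedStore:
    """A key-value store that records every change made to it (illustrative only)."""

    def __init__(self):
        self._values = {}
        self.history = []

    def set(self, key, value, user):
        # Record who changed what, and the value before and after the change.
        self.history.append({
            "key": key,
            "old": self._values.get(key),
            "new": value,
            "user": user,
            "at": datetime.now(timezone.utc),
        })
        self._values[key] = value

    def get(self, key):
        return self._values[key]

    def changes_to(self, key):
        # All recorded changes for one piece of data, oldest first.
        return [change for change in self.history if change["key"] == key]


store = AuditedStore()
store.set("reservoir.capacity", 5000, user="alice")
store.set("reservoir.capacity", 500, user="bob")      # a suspicious edit
last_change = store.changes_to("reservoir.capacity")[-1]
```

Here `last_change` shows that `bob` replaced 5000 with 500, which is the kind of information needed to go back and check whether a mistake was made.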
Database – Python API – web server
There are three levels. The first level is a standard database. And then connecting to the database is a piece of Python code. That’s a Python API (application programming interface) where you say something like update network, remove node, that kind of thing: all these commands that you can perform within the system. And then the third level on top of that is a web server, where you can run those commands, but through a web browser. And then kind of parallel to that there’s an actual web interface which connects to the web server, so that a user can log in and there’s an actual website where they can click buttons and visualise the network on a map.
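The three levels described above can be sketched in a few lines of Python. This is a toy under stated assumptions: the table layout, the command names (`add_node`, `remove_node`, `get_nodes`) and the JSON request shape are invented for illustration and are not the Hydra platform's real schema or API. The web layer is reduced to a function that takes a request body and returns a reply, rather than a running server.

```python
import json
import sqlite3

# Level 1: a standard database (in-memory SQLite, purely for illustration).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, network TEXT, name TEXT)")


# Level 2: a Python API of named commands that read and write the database.
def add_node(network, name):
    cursor = db.execute(
        "INSERT INTO node (network, name) VALUES (?, ?)", (network, name)
    )
    db.commit()
    return cursor.lastrowid


def remove_node(node_id):
    cursor = db.execute("DELETE FROM node WHERE id = ?", (node_id,))
    db.commit()
    return cursor.rowcount == 1


def get_nodes(network):
    rows = db.execute(
        "SELECT name FROM node WHERE network = ? ORDER BY id", (network,)
    )
    return [name for (name,) in rows]


COMMANDS = {"add_node": add_node, "remove_node": remove_node, "get_nodes": get_nodes}


# Level 3: what a web server would do with a request body: look the command
# up by name, call it with the supplied arguments and reply with JSON.
def handle_request(body):
    request = json.loads(body)
    command = COMMANDS.get(request["command"])
    if command is None:
        return json.dumps({"error": "unknown command"})
    return json.dumps({"result": command(**request["args"])})


reply = handle_request(
    '{"command": "add_node", "args": {"network": "demo", "name": "reservoir"}}'
)
```

Because the web layer only forwards named commands, the same operations stay available both to scripts calling the Python functions directly and to a browser-based interface sending requests.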
Examples of application
There are two main examples of where the data manager has been used. The first was a consultancy project we did for one of the water companies in the UK called Anglian Water. What they wanted to do was remove that PowerPoint presentation style of interaction with their results. They wanted to be able to input the data themselves, run the model themselves and inspect the results themselves, rather than the previous approach, which was: email the inputs to us, we’d run the model here locally, put the results in a PowerPoint presentation… ‘Oh, I don’t like those results… Can you do it again?’, two weeks later. So moving from that kind of dynamic to one where they upload the inputs into the website, run the model themselves and see the results. Something that used to take two or three weeks now takes two or three hours and they’ve got full autonomy.
The second use case is our own. So for the first use case we created a bespoke version of our website for energy and water that’s entirely separate from our public website. Our website is for public use: anyone can log in and start running models and making models if they want to, and this is what we use for training. This is also what we use for creating multiple different types of models, so we’ve got loads of different types of water models on there now. And then we perform training sessions in developing countries. So you could have a training session in Kenya and a training session in Ghana, and all of those people are logging into the same server, our website, and running entirely separate models.
So there are two kinds of cases. One is a very bespoke, consultancy-style kind of work, which provides very bespoke solutions, and then there’s a much more general solution where you have to be very careful about generality: you need someone to be able to log in, having never used the system before, and be able to start dragging and dropping things and for it to work. And if it doesn’t work, it needs to be very clear why it didn’t work. After a number of years of developing the software, now we’re entering a world of making it nice for users and non-expert users, which is an interesting transition.
Impacts
There are lots and lots of benefits to this. The first is the speed at which they can see results. The former two weeks, which is now two hours or two minutes, means that the results that are produced from these models, certainly for the water companies, are far more accurate and far more well researched by the water company, and that gives them much more confidence in those results. From the general user’s point of view, it’s an interesting one, because having something like this, particularly the web interface where people can actually see the network of their country and see how all of these things connect together, and it’s all free and it’s all available online (there are no licensing fees or anything like that), means that in developing countries, and even here, people actually engage with the whole decision-making process. And they are able to share results with one another, and they can have multiple people sitting at different desks looking at the same data and better understanding the consequences of change. Like, if I build a huge reservoir over here, it’s going to screw up the fish plantation over there or something like that. Before, either that piece of analysis wouldn’t have been done at all, or it would have taken months of planning to try and make that happen. And now you can do that really, really quickly. To summarise, one of the benefits is building confidence in results and allowing people to have the autonomy to run those models over and over again themselves. And the other one is giving non-experts exposure to this stuff and training. We’ve been working hard on training people on questions like: what does water management mean? What does modelling mean? What impact can it have? You know, because in a lot of countries, no modelling happens: you build a reservoir because you kind of go, ‘Oh, we need a reservoir… Let’s put it over there’, but now we’re opening people’s eyes to the benefits of really planning something. And this whole system has enabled us to do that.
The underlying abilities of the Hydra platform and its ability to store all these different types of networks and all of these different types of models, but also to be able to share those models with your colleagues – that’s what enables all of this to happen. When we go to a training session, we have a single version of the model, which we then share with 20 people, and then in the space of five seconds, everybody refreshes their page and they’ve got that model in front of them. With any other system in the world, that’s not possible. And that’s one of the huge benefits of it.
Image by Florante Valdez from Pixabay