Sunday, 4 August 2013

Introductory Product Information




Name 

The Scientific Workflow and Integration Software for Health (SWISH) and the Extreme Weather Events Database (EWEDB)

 

Primary users 

The primary users of the software are epidemiologists, health researchers, statisticians, and other scientists whose work involves handling multiple datasets from different sources. The primary users of the EWEDB are researchers who need to safely store and work with sensitive and restricted data, and researchers working with health, population, extreme weather events, meteorological, or climate data.


 

"Elevator pitch"

To assess the impacts of climate change on health you need to gather large amounts of data. The data needs to be clean, in a consistent spatial framework, and arranged in preparation for analysis. The SWISH project helps with data access and preparation. It reduces barriers such as the need for advanced knowledge of databases and programming skills, providing a drag-and-drop user interface instead.
Researchers can create executable workflows using the SWISH tools that capture documentation and processing steps. The system incorporates Stata, R, and Java technologies in a consistent platform and can be extended to incorporate new functionality developed by users. Workflows are saved as a single file that is easy to share with other researchers, supporting reproducible research and collaboration.




Links to product

The project website is
The ANDS project blog is
The Software installer can be downloaded from
The EWEDB address (for PostgreSQL clients) is
brawn.anu.edu.au:5432
Other important links, such as those to Kepler, R, and Java, can be found on the blog and project website.
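As a rough illustration of how a PostgreSQL client might be pointed at the EWEDB address above, the following sketch builds a standard `postgresql://` connection URI. The host and port come from this post; the user name and database name shown are placeholders only, since access requires an account arranged with NCEPH and the real names are not given here.

```python
from urllib.parse import quote


def build_postgres_uri(user, database, host="brawn.anu.edu.au", port=5432):
    """Build a libpq-style connection URI (no password is embedded)."""
    # quote() escapes characters that are not safe inside a URI component.
    return f"postgresql://{quote(user, safe='')}@{host}:{port}/{quote(database, safe='')}"


# "your_username" and "example_db" are hypothetical placeholders.
uri = build_postgres_uri("your_username", "example_db")
print(uri)
```

Leaving the password out of the URI means the client has to obtain it some other way, for example from a password file or an interactive prompt.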


 

User documentation

The software documentation is available from the Documentation section of the project website. The documentation is broken into three sections: Setting up your Environment, Assembling Scientific Workflows, and Developing with Workflow Software.
‘Setting up your Environment’ describes the tools and software required, where to get them and how to configure them. There are also test workflows to verify that the software is installed correctly and the database can be accessed.
‘Assembling Scientific Workflows’ contains information about using the SWISH Kepler actors and the EWEDB for sourcing and processing data. It contains a step-by-step tutorial, example workflows and useful common tasks.
‘Developing with Workflow Software’ explains how users can extend the system by adding their own custom functionality. It details the steps to use R code and encourages the project to continue to evolve with support and development from the user community.




Technical documentation

Technical information about the project can be found on the project blog and in the documentation section of the project website.

Installation and configuration of included technologies

·         Technologies-and-features
·         Tools

R

·         Set-up-r

EWEDB

·         Set-up-swish-computer
·         Hello-ewedb

SWISH Kepler actors

·         Installer
·         Installer-details

PostGIS

·         Postgis-utils

User testing reports:

·         a-swish-user-test-report

The source code is available at





Pictures

SWISH Kepler actors installer


Installed items


Kepler with workflow


PostgreSQL password editor


Display grid actor


Display time series actor

 

 

 Product (or Product Components) Re-usability Information

The project has been developed with the motivation of supporting research in epidemiology: the study of the distribution and determinants of disease. Most of the implemented functionality, however, is not specific to the epidemiology discipline. The SWISH Kepler actors implement many table, time series, and grid data operations that would assist other research based on spatial or temporal data. The EWEDB makes weather data and other data available, which will continue to be accessed and used by epidemiologists and other researchers.
The construction of the product also contributes to its reusability. The Kepler actors that make up a large portion of the software product are by definition reusable components for creating workflows. The R portion of the project has been packaged into R packages that the Kepler actors use. These packages can also be used outside of Kepler, directly from R.


Contextual Product Information

Licensing

The software is licensed under the Creative Commons Australia Attribution 3.0 Licence. Access to the EWEDB is available but requires NCEPH collaborations to be set up. Interested users are invited to contact the NCEPH Data Manager for more information.



Sustainability

The project has concluded its development phase.
The website, blog, software and source code will remain open and available into the future. The project website and the project blog will remain hosted by github.com and blogger.com. The SWISH software and code will remain freely available from the website. The EWEDB will be hosted by the ANU library for at least the next year and possibly beyond.
No future updates are planned for the SWISH Kepler Actors software by the staff at the ANU. Future development of the Stata functionality is open to the Kepler community, as we have communicated with the developers of Kepler about our work.
The EWEDB will be periodically updated with new data as it becomes available. Ivan plans to continue to use and develop the R code developed during the project but rebadge it as a slightly different product.


Wednesday, 3 July 2013

SWISH Kepler actors user testing report

We invited Peter Manger from the Fenner School at the ANU to be one of our test users. Peter has a background in simulation, modelling and software. He is a good example of someone proficient with data processing and analysis but with no previous exposure to SWISH tools.

How well did the user use SWISH?

Peter made his way through the tutorial more quickly than I expected for someone who had never used the software before.

During the testing two mistakes occurred:

  1. An input link to an actor was missing. When run, the workflow reported an error; Peter quickly identified the omission himself and corrected it.
  2. The identifier "Date" was used with a capital letter instead of "date". When run, Kepler reported an error message; however, this problem was too subtle. Even after I pointed it out, Peter was unclear as to why the error was occurring.
The images in the tutorial provided the necessary details to continue through the tutorial without encouraging Peter to notice and understand the accompanying text. The tutorial explains that all the links need to be connected for the workflow to run and that the identifiers are the names of columns in the input data.
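The "Date" versus "date" mistake suggests that a check of identifiers against the input column names, done before any processing starts, could produce a far clearer message. The sketch below is only an illustration of that idea in Python; it is not how the SWISH actors are implemented, and the function name and example column names are hypothetical.

```python
def check_identifiers(identifiers, columns):
    """Return a list of readable problems; empty if every identifier is a column."""
    # Map lower-cased column names to their real spelling to spot case mistakes.
    by_lower = {}
    for column in columns:
        by_lower.setdefault(column.lower(), column)

    problems = []
    for name in identifiers:
        if name in columns:
            continue
        suggestion = by_lower.get(name.lower())
        if suggestion is not None:
            problems.append(
                f'Identifier "{name}" not found; column names are case-sensitive. '
                f'Did you mean "{suggestion}"?'
            )
        else:
            problems.append(
                f'Identifier "{name}" not found. Available columns: {", ".join(columns)}'
            )
    return problems


# Hypothetical input columns and identifiers, mirroring the mistake in the test.
for problem in check_identifiers(["Date", "tmax"], ["date", "tmax", "station"]):
    print(problem)
```

A message of this kind points at the column name itself, which is the link the tutorial text makes but the images alone do not.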

Did the user complete the analysis they intended to do?

Yes! Peter successfully assembled an operational workflow that ran and produced the correct result by following the tutorial.
Peter was then curious and started to 'play' with values in the heat index calculation. He lowered the maximum temperature limit and increased the min - max threshold to generate a greater number of 'heat waves'.

What features did they find useful and what features did they have difficulty with or wish to see (and are subject to future improvement if there are available resources)?

Peter was able to quickly use the installer and update the software to the latest version of the SWISH actors. Peter found the drag, drop and link nature of Kepler intuitive and usable. He was able to easily find all the actors needed by the tutorial. The actors used were clearly named and labelled, and Peter found them simple to use.
The biggest problem is the handling of errors. The SWISH actors that use Stata report their own error messages. The SWISH actors that use R do not report any error messages other than the fact that an error has occurred. General Kepler actors report the Java stack trace from where the error occurred in the Kepler source code.
None of these are useful to the user, and all could be improved.
The errors reported by the Stata-based actors indicate the error that Stata encountered, but it is always a consequence of a problem located somewhere in the lead-up to running Stata. Decoding the error messages requires an intimate understanding of how SWISH operates.
The R errors are completely opaque and provide no useful information.
General Kepler errors are also cryptic and not helpful.
Error handling for general Kepler actors is the responsibility of the developers of the actors and of Kepler. Although the SWISH Stata actors perform their own error checking and reporting, they would improve the user experience dramatically by reporting more meaningful messages and suggested solutions to the user. At present the SWISH R actors are completely lacking in any form of error handling and need to implement it within the R code itself.
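One general pattern for more meaningful messages is to wrap each processing step so that any failure is re-raised with the step name and a suggested fix attached. The sketch below shows the pattern in Python rather than in R or Stata, purely as an illustration; `WorkflowStepError`, `run_step` and the example step are hypothetical names, not part of SWISH or Kepler.

```python
class WorkflowStepError(Exception):
    """An error carrying enough context for a workflow user to act on it."""


def run_step(step_name, func, *args, hint=""):
    """Run one step; on failure, re-raise with the step name and a hint."""
    try:
        return func(*args)
    except Exception as exc:
        message = f'Step "{step_name}" failed: {exc}'
        if hint:
            message += f" Suggested fix: {hint}"
        # "from exc" keeps the original traceback available for developers.
        raise WorkflowStepError(message) from exc


def parse_temperature(text):
    # Hypothetical step: convert a text field to a number.
    return float(text)


try:
    run_step(
        "Read maximum temperature",
        parse_temperature,
        "n/a",
        hint="check that the temperature column contains only numbers.",
    )
except WorkflowStepError as err:
    print(err)
```

The same idea carries over to R, where a step can be wrapped in `tryCatch` and the condition re-signalled with a clearer message.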

Observations

Peter had trouble linking some actors together as the links 'snap' to other nearby links or ports. Peter worked around this when necessary by dragging links the long way round to avoid connecting to the wrong place.
Peter did not run the workflow until the very end. This made finding errors more difficult.
Peter did not realise that by creating and using the workflow in the tutorial he was using the statistical software Stata.
The understanding Peter gained from the tutorial was mainly operational details of how to use Kepler.

Peter's feedback

Peter knew that Kepler was existing software but was unclear what part of the exercise corresponded to the SWISH project. His suggestion was to package all the actors available in a SWISH subgroup of some kind.
Peter liked the images, and found that the progression from individual small steps at the start of the tutorial to broader instructions at the end made sense.

Thursday, 11 April 2013

The Extreme Weather Events Database has its own website.


The SWISH project has a second website devoted to the database/analytics server (the EWEDB):

http://swish-climate-impact-assessment.github.com

From those pages users can explore the scope of the database, find examples of data analyses and also apply for access to the data.

Sunday, 14 October 2012

Project Outputs


The project's main output will be Kepler "actors", with documentation.
These will be hosted on GitHub, with dissemination principally through peer-reviewed publications.
However, our chief contribution will be promoting the conceptual approach of using a scientific workflow system of any sort.







Technologies and Features

Development

Kepler 2.3
Stata 9 / 12
Rundo for Stata
RStudio
R
Microsoft Visual C# Express
Emacs
Java JDK (6u25)
Ant
Eclipse

Version control

GitHub
TortoiseGit
Git

Operating systems and tools

PuTTY
WinSCP
Windows
Linux Ubuntu
pgAdmin

Admin

Dropbox
Skype
blogger.com

Key Factors


Scientists will use our systems and tools if they support data acquisition, data management and data analysis that is faster, more reliable, more transparent and better documented than their current methods.
Our goal is that researchers other than ourselves will publish peer-reviewed papers that acknowledge our tools, so that they become a recognised and accepted contribution to work in the relevant disciplines.

Target Customers

Our software tools will be used by epidemiologists, health researchers, statisticians, and other scientists whose work involves handling multiple datasets from different sources. The exemplar is research on the health impacts of extreme weather events, which requires merging meteorological, health and demographic data to permit statistical analysis of causal associations.
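To make the merging step concrete, here is a minimal sketch of joining meteorological and health records on shared date and region fields, using only the Python standard library. It illustrates the general operation, not the SWISH actors themselves; the function name, field names and values are invented for the example.

```python
def merge_on_keys(left, right, keys):
    """Inner-join two lists of dict records on the given key fields."""
    # Index the right-hand records by their key values for fast lookup.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    merged = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            merged.append({**row, **match})
    return merged


# Invented illustrative records, not real data.
weather = [
    {"date": "2013-01-01", "region": "A", "tmax": 38.5},
    {"date": "2013-01-02", "region": "A", "tmax": 41.0},
]
health = [
    {"date": "2013-01-02", "region": "A", "admissions": 12},
]

merged = merge_on_keys(weather, health, ["date", "region"])
print(merged)
```

Only records present in both inputs survive an inner join like this, so the choice of join type is itself an analytical decision worth documenting in a workflow.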

Contributors