Celia Russell, Stephen Pickles and Mike Jones
Combining Data Workshop, ESRC Research Methods Programme
Manchester, December 18, 2002

SAMD: Seamless Access to Multiple Datasets
An ESRC/DTI e-Science demonstrator project
http://www.sve.man.ac.uk/Research/AtoZ/SAMD
Supercomputing, Visualization & eScience

SAMD: Seamless Access to Multiple Datasets
- A project to demonstrate the benefits of applying e-Science grid technologies to an ordinary social science query
- We solve a genuine problem from the UK academic social science community: a multivariate analysis using a complex mathematical algorithm
- Based on a major social science databank, the Office for National Statistics Time Series Data, hosted at MIMAS

The problem
- Published as Sensier, M., Osborn, D.R. and Öcal, N. (2002), "Asymmetric Interest Rate Effects for the UK Real Economy", Oxford Bulletin of Economics and Statistics, Volume 64, No. 4, September 2002
- The research query looks at the effect that interest rate changes had on Gross Domestic Product in the UK over the period 1960 to 2000

Interest Rates in the UK
[slide graphic not preserved]

UK GDP quarterly changes
[slide graphic not preserved]

The Model
[equation not preserved]
- Where y is the quarterly change in GDP and z is the quarterly change in interest rates

Before SAMD
[slide graphic not preserved]

e-Science Grid
[slide graphic not preserved]

SAMD Methodology
We built a mini demonstrator grid for SAMD by:
- Grid-enabling the ONS Time Series Databank
- Parallelising the code to represent the HPC facilities
- Using Grid protocols for data transfer
- Creating a graphical user interface that included a single sign-on
It all worked, and cut the data collection and analysis time down to around 8 minutes.

Extending SAMD
The approach and methods of SAMD are applicable to more general social science applications involving data collection and analysis:
- More efficient handling of datasets: data is moved to where it is needed, not just to a web browser
- The single sign-on for all databanks means users can cross-search datasets and perform cross-analyses of multiple datasets from different providers
- Grants access to high performance computing facilities on the grid without the user having to learn how to use them
- Can automate routine enquiries
- Cuts the time taken to run computing-intensive problems by a factor of around 100

Scaling up with the Grid
E-Science Grids allow social scientists to scale up their quantitative research by:
- Including many more data points in their analysis
- Developing more complex models incorporating more variables
- Dropping assumptions
- Visualising data
- Creating new communities and collaborations
- Exploring new types of analyses

SAMD Architecture

Motivation
Web-based access to socio-economic datasets such as the Office for National Statistics Time Series Data has led to greatly increased use, but:
- No standard authentication or authorisation
  - too many usernames and passwords to remember
- To automate search and retrieval, can only emulate navigation through "screen scraping"
  - breaks whenever the interface is "improved"
  - discourages third party developments and periodic re-analysis
- Data must be downloaded and saved to local disk
  - not necessarily the system on which subsequent analysis is to be performed
  - inefficient, especially for large datasets

The SAMD solution
- Use Grid Security Infrastructure for "single sign-on" authentication everywhere
  - Modified standard Apache web server to accept proxy credentials
  - Permits re-use of existing CGI code
- Use third party file transfers (GridFTP) to move data directly to where it is needed
- Use standard Globus mechanisms to
  - Locate HPC facility for analysis
  - Stage analysis binary from local repository and run analysis job on HPC facility
  - Retrieve results

Architecture
[slide graphic not preserved]

What's new?
- Web interfaces to datasets? We show that there are more flexible ways of delivering access to data over the internet than through static web pages alone
- Single sign-on? We show that the domain of single sign-on can be much broader than that provided by Athens
- Graphical User Interfaces? We show that it is possible for a third party to develop new tools independently of data providers
  - A short script can encapsulate all the essential functionality of the SAMD GUI
- Integration, Interoperability!

What's needed?
- Culture of Standards
- If key datasets are Grid-enabled in a commonly understood, well-documented way, we create an environment in which third parties can develop tools and services that add real value by bringing together independent datasets
- SAMD shows that such an environment is technically possible, but does not by itself establish any standard. Look to Web services, Grid services, OGSA-DAI

SAMD User Interfaces

GUI: Single Sign-on
- Panel located at the top left
- Uses X.509 proxy certificates
  - grid-proxy-init: creates your proxy credential
  - grid-proxy-destroy: removes your proxy credential

GUI: Data Acquisition
- The interface to the SAMD-ONS web server, steps 1 to 8

Data Search
- Search by keyword
  - 1: request and mutual authentication using a proxy credential
  - 2, 3: authorisation
  - 4: query data store

Data Request
- Data moved to GridFTP server
  - 1: send references to data
  - 1, 2, 3: authentication and authorisation
  - 4: ask datastore to move data (5)
  - 6, 7: datastore returns XML ticket

Data Transfer
- Data moved to HPC engine
  - 8: third party file transfer from MIMAS to HPC engine, ready for analysis

Finding an HPC Resource (Data Analysis panel)
- GIIS MDS server, e.g. ginfo.grid-support.ac.uk
- Search for:
  - OS type, e.g. IRIX64
  - Minimum number of processors
  - Jobmanager
- or manually enter your favourite

Using the HPC Resource (Data Analysis panel)
- Select an executable on the local machine
- Stage job using Globus
- Check status using Globus
- Retrieve results using Globus
- Clean up using Globus
- Even delete job using Globus

Command line automation
- Not everyone has the expertise or time to write a special-purpose GUI.
- Given a GSI-enabled web server and a documented protocol to communicate with it, a few lines of shell script can do all the essential steps:
  - Use grid-proxy-init to sign on
  - Use curl to talk https to the web server
  - Use GridFTP to move data to the HPC engine
  - Use Globus commands to (stage and) run the executable, retrieve results and clean up

Acknowledgments
- Funded by the [funder logo not preserved] and the [funder logo not preserved]
- Keith Cole, Celia Russell, Marianne Sensier
- Geoff Lane, Tim Hateley
- Mark Riding, Kevin Roy
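
The four steps listed under "Command line automation" might look roughly like the script below. This is only a sketch under stated assumptions, not the SAMD project's own script: the host names, the search URL and its `keyword` parameter, the file paths and the `./analysis` binary are all hypothetical, and the slides do not document the actual web server protocol. The Globus Toolkit commands (`grid-proxy-init`, `globus-url-copy`, `globus-job-run`, `grid-proxy-destroy`) and the curl options are standard, but whether curl can present a proxy credential this way depends on the server being GSI-enabled as described. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/usr/bin/env bash
# Sketch of the command-line steps; every host, URL and path below is hypothetical.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"                                      # 1 = print commands only
DATA_SERVER="${DATA_SERVER:-https://samd-ons.example.org}"   # hypothetical GSI-enabled web server
GRIDFTP_SRC="${GRIDFTP_SRC:-gsiftp://data.example.org/staging/series.dat}"  # hypothetical
HPC_HOST="${HPC_HOST:-hpc.example.org}"                      # hypothetical HPC resource
HPC_DEST="${HPC_DEST:-gsiftp://$HPC_HOST/tmp/series.dat}"    # hypothetical
ANALYSIS_BIN="${ANALYSIS_BIN:-./analysis}"                   # hypothetical local executable
PROXY="${X509_USER_PROXY:-/tmp/x509up_u$(id -u)}"            # usual default proxy location
CA_DIR="${CA_DIR:-/etc/grid-security/certificates}"          # usual trusted CA directory

run() {
    # Print the command; execute it only when DRY_RUN=0.
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

samd_pipeline() {
    # 1. Sign on: create a proxy credential.
    run grid-proxy-init
    # 2. Talk https to the web server, presenting the proxy credential.
    #    The path and query string are assumptions, not the SAMD protocol.
    run curl --cert "$PROXY" --key "$PROXY" --capath "$CA_DIR" \
        "$DATA_SERVER/search?keyword=GDP"
    # 3. Third party GridFTP transfer from the data server to the HPC engine.
    run globus-url-copy "$GRIDFTP_SRC" "$HPC_DEST"
    # 4. Stage the local executable (-s), run it, and read results from stdout.
    run globus-job-run "$HPC_HOST" -s "$ANALYSIS_BIN" /tmp/series.dat
    # 5. Clean up: remove the proxy credential.
    run grid-proxy-destroy
}

samd_pipeline
```

Keeping each step behind the `run` wrapper means the sequence can be reviewed in dry-run mode before any credential is created or any data is moved.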