TY - GEN
T1 - Source selection in large scale data contexts
T2 - 21st International Conference on Database and Expert Systems Applications, DEXA 2010
AU - Pomares, Alexandra
AU - Roncancio, Claudia
AU - Cung, Van Dat
AU - Abásolo, José
AU - Villamil, María Del Pilar
N1 - Funding Information:
This research was supported by the project Ecos-Colciencias C06M02.
PY - 2010
Y1 - 2010
AB - This paper presents OptiSource, a novel approach to source selection that reduces the number of data sources accessed during query evaluation in large-scale distributed data contexts. These contexts are typical of large-scale Virtual Organizations (VOs), where autonomous organizations share data about a group of domain concepts (e.g., patient, gene). The instances of such concepts are constructed from non-disjoint fragments provided by several local data sources. These sources overlap in an uncontrolled way, making data location uncertain. This uncertainty, together with the absence of reliable statistics on source contents and the large number of sources, makes current proposals unsuitable in terms of response quality and/or response time. OptiSource optimizes source selection by exploiting organizational aspects of VOs to predict the benefit of using a source. It uses an optimization model to identify the sets of sources that maximize benefits and minimize the number of sources to contact while satisfying resource constraints. Tests performed with the OptiSource prototype show that the precision and recall of source selection are greatly improved.
KW - Combinatorial Optimization
KW - Large Scale Data Mediation
KW - Source Selection
UR - http://www.scopus.com/inward/record.url?scp=78049358028&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-15364-8_4
DO - 10.1007/978-3-642-15364-8_4
M3 - Conference contribution
AN - SCOPUS:78049358028
SN - 3642153631
SN - 9783642153631
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 46
EP - 61
BT - Database and Expert Systems Applications - 21st International Conference, DEXA 2010, Proceedings
Y2 - 30 August 2010 through 3 September 2010
ER -