RC1. Develop a generalized framework for the analysis of complex data, producing more detailed/precise results from aggregated data than traditional methods.
RC2. Develop common data representation models to allow for integrated approaches in the analysis of aggregate, complex data, and for the use of different software programs for the data analysis.
RC3. Address the issues of data privacy, particularly on the internet and in official statistics, by proposing criteria for data aggregation and subsequent analysis, and provide the resulting methodology to relevant stakeholders, in particular National Statistical Offices (NSOs) and industrial partners.
RC4. Develop visual analytics tools for aggregated complex data, including symbolic data, compositional data, and functional data.
RC5. Develop appropriate methodology for the aggregation of large surveys and their analysis at the macro level, as well as for the combination of independent surveys, which is only possible at the aggregate level, and provide the resulting methodology to relevant stakeholders, e.g. NSOs.
RC6. Develop appropriate methodology, relying on different combined approaches, for the aggregation and analysis of sensor data, internet flow data, complex network data, and for the spatial and/or temporal analysis of complex data, appearing in the form of data streams, symbolic data, and compositional data.
RC7. Exploit open big data sources (The GDELT Project, Bike Share Data Systems, Bureau of Transportation Statistics, the European Social Survey (ESS), etc.) and provide tools to transform them into rich data frames or networks.
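As an illustration of the last point, the sketch below shows one way raw trip records from an open bike-share feed could be aggregated into a "rich data frame", with one row per station pair and an interval-valued duration cell. The station names, record layout, and field names are invented for the example; they are not part of any actual feed specification.

```python
from collections import defaultdict

# Hypothetical raw trip records (origin, destination, duration in minutes),
# as they might come from an open bike-share feed.
trips = [
    ("S1", "S2", 12.5), ("S1", "S2", 9.0), ("S1", "S2", 15.2),
    ("S2", "S3", 30.1), ("S2", "S3", 27.4),
]

# Aggregate into a rich data frame: one row per station pair, holding the
# trip count and the duration interval [min, max] as a complex-valued cell.
by_pair = defaultdict(list)
for origin, dest, duration in trips:
    by_pair[(origin, dest)].append(duration)

rich = {pair: {"n": len(ds), "duration": (min(ds), max(ds))}
        for pair, ds in by_pair.items()}

print(rich[("S1", "S2")])  # {'n': 3, 'duration': (9.0, 15.2)}
```

The same grouping step generalises directly from intervals to histograms or distributions per cell, which is where the complex data types discussed below come in.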
The objectives are the elaboration of the concept of complex data and a theoretical discussion of the role and properties of the aggregation process as a way of obtaining complex data. Tasks: T1 - Identification of data summarization/aggregation models and study of their properties. T2 - Definition of criteria for data aggregation. T3 - Extension of the collection of complex data types and foundations of complex data analysis.
ABCData aims at defining criteria and guidelines concerning methodology for the analysis of complex data, addressing the questions of which approaches should be considered for a particular problem at hand and how to combine different methods. Moreover, focus will be put on the process of data aggregation, in order to define criteria both on the form of aggregation and on its granularity. This may be approached in two ways. One is to capture as much of the information in the original dataset as possible while restricting the aggregate (complex) dataset to a fixed number of elements; this is a matter of minimising within-symbol variability and maximising between-symbol variability. The other is to aggregate with a specific model in mind, aiming to achieve certain characteristics of the model parameters (unbiasedness, highest precision, etc.) or of its predictive qualities. Existing approaches are certainly sub-optimal in this respect, and in this Action effort will be put into defining appropriate criteria and designing more efficient aggregation techniques.
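The within-/between-symbol criterion can be sketched as follows, assuming the simplest case where each group of microdata is aggregated into an interval-valued symbol. All data and group names here are invented; real criteria would of course weigh these two quantities against each other rather than just report them.

```python
from statistics import mean, pvariance

# Hypothetical microdata: observations grouped by the unit they belong to.
micro = {
    "A": [1.0, 2.0, 3.0],
    "B": [10.0, 11.0, 12.0],
    "C": [20.0, 22.0, 24.0],
}

# One possible aggregation: each group becomes an interval-valued symbol [min, max].
symbols = {g: (min(v), max(v)) for g, v in micro.items()}

# Within-symbol variability: average variance of the values inside each symbol.
within = mean(pvariance(v) for v in micro.values())

# Between-symbol variability: variance of the symbol centres (interval midpoints).
centres = [(lo + hi) / 2 for lo, hi in symbols.values()]
between = pvariance(centres)

print(symbols)  # {'A': (1.0, 3.0), ...}
print(within, between)
```

A "good" aggregation in the first sense described above is one where, for a fixed number of symbols, `within` is small relative to `between`, i.e. the symbols are internally homogeneous but mutually well separated.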
For the storage and exchange of such data we will develop a special JSON-based format, RDF (rich data format). We will establish a repository of rich data sets as a test bed for the developed rich-data analysis methods. The newly developed methods will be implemented in R, Python, or Julia. The RDF format will also enable the interoperability of the developed software.
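To make the idea concrete, a record in such a JSON-based format might look like the sketch below. The field names and structure are purely illustrative assumptions (the RDF specification itself is yet to be defined); the point is only that complex-valued cells such as intervals, distributions, and histograms serialise naturally to JSON.

```python
import json

# Hypothetical rich-data record; all field names are illustrative only.
record = {
    "unit": "PT11",  # e.g. a NUTS region code
    "variables": {
        "age": {"type": "interval", "value": [18, 95]},
        "education": {
            "type": "distribution",
            "value": {"basic": 0.35, "secondary": 0.45, "tertiary": 0.20},
        },
        "income": {
            "type": "histogram",
            "breaks": [0, 10000, 30000, 60000],
            "counts": [120, 310, 85],
        },
    },
}

text = json.dumps(record)
assert json.loads(text) == record  # serialisation round-trips losslessly
```

Because any JSON library in R, Python, or Julia can parse such records, a format along these lines would give the interoperability mentioned above without tying the methods to a single implementation language.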
An overview of the main ideas and results in this field.
T1 - Overview of complex data types and aggregation models; proposal of a complex data framework (a theoretical basis for RDF); overview of the literature and study of the data aggregation process; a workshop on this topic after the first 6 months; deliverable: a report. T2 - On the basis of T1, recommendations for data aggregation will be prepared; deliverable: a document with recommendations. T3 - Research on complex data types and approaches to complex data analysis.
The temporal and spatial dimensions may also be included, for example in an analysis of traffic patterns in a network with aggregated information about the bike rides between stations in a bike-sharing system (such as Santander Cycles in London, Citi Bike in New York, or Capital Bikeshare in Washington DC). Another example is a regionalization problem: clustering of complex data (for example, describing population pyramids for NUTS regions or world countries) such that the obtained clusters form contiguous regions with respect to the units' neighbouring relation; the network determines a relational constraint. This is only the tip of the iceberg of possibilities. How to present and visualize complex data together with relations (network links), and how to explore such data, are the first questions to be tackled. Summaries of networks, such as density or centrality measures, are currently used to compare and evaluate different networks, and must be adapted to networks based on complex data.
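The kind of network summaries mentioned above can be sketched on an aggregated bike-share network, where each link already carries an aggregated trip count rather than individual rides. Station names and counts are invented; the density and weighted-degree formulas used here are the standard ones, before any adaptation to complex-valued links.

```python
from collections import defaultdict

# Hypothetical aggregated flows: (origin, destination, number of trips).
flows = [("S1", "S2", 40), ("S2", "S1", 35), ("S1", "S3", 12), ("S3", "S2", 5)]

stations = {s for o, d, _ in flows for s in (o, d)}
n = len(stations)

# Density of the directed flow network: observed links / possible links.
density = len({(o, d) for o, d, _ in flows}) / (n * (n - 1))

# A simple weighted degree centrality: total flow through each station.
strength = defaultdict(int)
for o, d, c in flows:
    strength[o] += c
    strength[d] += c

print(density)         # 4 links out of 6 possible
print(dict(strength))
```

Adapting such summaries to complex data would mean, for instance, replacing the scalar count on each link with an interval or a distribution of trip durations, at which point even "total flow through a station" needs a new definition.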