Abstract
The Lifecycle of Geotagged Social Media Data tutorial covers the four stages that are part of the lifecycle of geotagged social media data in research, namely representing, processing, analyzing, and visualizing. The tutorial aims to arm participants with both theoretical and practical knowledge about how to make sense of geospatial data for use in applications that range from computational social science and social media analysis to behavioral studies on digital platforms. We provide the basics on how to obtain, represent and combine different spatial data sources, with an emphasis on how to efficiently store, index and query a location-based dataset. We further discuss the main techniques for deriving insights from spatial data, how to avoid common pitfalls and how to exploit social media (e.g., user interests, user movements) to gain a deeper understanding of the phenomenon under study. The tutorial will end with an overview of the main libraries and paradigms for building interactive and dynamic visualizations of geographical data on a map.
Modules
Represent
We present common ways of representing geographic data, from simple coordinates to polygons with holes, and using different data formats. We discuss several platforms that offer geotagged social media and show how their APIs can be used to obtain this data. We further illustrate how geotagged data can be used to acquire auxiliary information, such as historic weather conditions, that complements the original data.
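As a small illustration of these representations, the sketch below encodes a polygon with a hole as GeoJSON and converts it to WKT using only the Python standard library. The square coordinates and the helper name `polygon_to_wkt` are made up for this example and are not part of the tutorial material; the ring orientation follows the GeoJSON convention (exterior ring counterclockwise, holes clockwise).

```python
import json

# Hypothetical square region with a square hole.
# GeoJSON positions are ordered [longitude, latitude].
polygon_with_hole = {
    "type": "Polygon",
    "coordinates": [
        # Exterior ring (counterclockwise); first and last positions match.
        [[0.0, 0.0], [4.0, 0.0], [4.0, 4.0], [0.0, 4.0], [0.0, 0.0]],
        # Interior ring (clockwise) describing the hole.
        [[1.0, 1.0], [1.0, 3.0], [3.0, 3.0], [3.0, 1.0], [1.0, 1.0]],
    ],
}


def polygon_to_wkt(geometry):
    """Convert a GeoJSON Polygon dict to a WKT string (illustrative helper)."""
    if geometry["type"] != "Polygon":
        raise ValueError("only Polygon geometries are handled in this sketch")
    rings = []
    for ring in geometry["coordinates"]:
        rings.append("(" + ", ".join(f"{x:g} {y:g}" for x, y in ring) + ")")
    return "POLYGON (" + ", ".join(rings) + ")"


# Round-trip through a GeoJSON string, then emit WKT.
geojson_text = json.dumps(polygon_with_hole)
wkt_text = polygon_to_wkt(json.loads(geojson_text))
print(wkt_text)
```

In both formats the first ring is the outer boundary and every further ring is a hole, which is why the order of the rings matters.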
Analyze
We cover a variety of techniques you can use to derive actionable insights from geotagged social media data. We present methods such as clustering, prediction and recommendation. We show how geotagged data differs from traditional data and why it often requires special considerations in order to obtain reliable output.
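One example of such a special consideration is computing the centre of a group of points, as a clustering method does when it updates a centroid. The sketch below is only an illustration with made-up coordinates and hypothetical function names: it compares a naive average of latitude and longitude with an average taken over 3D unit vectors on a sphere, for two points on either side of the 180° meridian.

```python
import math


def naive_mean(points):
    """Average latitudes and longitudes as if they were plain numbers."""
    n = len(points)
    return (sum(lat for lat, _ in points) / n,
            sum(lon for _, lon in points) / n)


def spherical_mean(points):
    """Average points as unit vectors on a sphere, then convert back."""
    x = y = z = 0.0
    for lat, lon in points:
        phi, lmb = math.radians(lat), math.radians(lon)
        x += math.cos(phi) * math.cos(lmb)
        y += math.cos(phi) * math.sin(lmb)
        z += math.sin(phi)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    lon = math.degrees(math.atan2(y, x))
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    return lat, lon


# Two hypothetical photos taken about 2 degrees apart, across the dateline.
photos = [(10.0, 179.0), (10.0, -179.0)]
print(naive_mean(photos))      # longitude lands on the opposite side of the globe
print(spherical_mean(photos))  # longitude lands on the 180 degree meridian
```

The naive average places the centre near longitude 0, roughly half a world away from both points, while the spherical average stays between them.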
Process
We cover techniques for storing spatial data and for working with spatial primitives, such as computing the distance between geographic coordinates and the areas of and overlaps between polygons, and we show how the data representation influences which techniques should (or should not) be used. We discuss different strategies, such as indexes, for efficiently storing, reading and querying geotagged data, as well as for efficiently processing it.
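As an illustration of one such primitive, the sketch below computes the distance between two geographic coordinates with the haversine formula. It assumes a spherical Earth with a mean radius of 6371.0088 km, so it is an approximation; the function name and the sample coordinates are chosen for this example only.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius (spherical model)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to guard against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Two points on the equator, 0.2 degrees apart across the dateline.
across_dateline = haversine_km(0.0, 179.9, 0.0, -179.9)
# Two antipodal points on the equator: half the sphere's circumference.
half_way_round = haversine_km(0.0, 0.0, 0.0, 180.0)
print(across_dateline, half_way_round)
```

Treating the same coordinates as planar values would put the first pair 359.8 degrees apart, which shows how the chosen representation determines which distance computation is appropriate.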
Visualize
The world is not flat, so visualizing geographic data is not straightforward. We present several tools that can help you better understand your data. A hands-on session will show you how to use the right tool at the right time to maximize the knowledge you can extract from the data.
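To make the flattening step concrete, the sketch below uses the Web Mercator projection that common tiled web maps rely on. It computes which map tile contains a coordinate, following the widely used OpenStreetMap-style tile numbering, and how much the projection stretches distances at a given latitude. The function names are hypothetical and the code is a minimal illustration, not the method of any specific tool covered in the tutorial.

```python
import math

MAX_MERCATOR_LAT = 85.0511  # approximate latitude limit of Web Mercator tiles


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) tile indices containing a point at a zoom level."""
    n = 2 ** zoom  # number of tiles along each axis
    lat = max(-MAX_MERCATOR_LAT, min(MAX_MERCATOR_LAT, lat))
    lat_rad = math.radians(lat)
    x = (lon + 180.0) / 360.0 * n
    y = (1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n
    # Clamp so that lon = 180 or the latitude limits stay inside the grid.
    return min(n - 1, max(0, int(x))), min(n - 1, max(0, int(y)))


def mercator_scale(lat):
    """Factor by which Mercator stretches distances at a latitude."""
    return 1.0 / math.cos(math.radians(lat))


print(lonlat_to_tile(0.0, 0.0, 1))
print(mercator_scale(60.0))
```

The scale factor grows with latitude, which is why regions far from the equator look larger on such maps than they are.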
Syllabus
Represent
- Geometries
- Coordinates
- Boundaries
- Single polygons
- Multiple polygons
- Holes
- Data formats
- Shapefiles
- GeoJSON
- OpenStreetMap
- WKT
- Applications
- Projections
- Examples
- Distortion
- World
- Plane
- Equirectangular projection
- Shortest distance
- Euclidean
- Sphere
- Geodesics
- Shortest distance
- Great circle
- Spherical law of cosines
- Haversine
- Ellipsoid
- Geoid
- Reference datum
- Geodetic datums
- Global datums
- Local datums
- Conversions
- Shortest distance
- Vincenty
- Karney
- Plane
- Pitfalls
- Dateline crossing
- Timezones
- GPS errors and accuracy
- Obtaining data
- Flickr
- Weather
- Exercises
Analyze
- Introduction to analyzing geotagged social media data
- What is so special about geographic data?
- Clustering
- Hard clustering
- Soft clustering
- Density modeling
- Mixture Models
- Kernel density estimation
- Points and Polygons
- Point-in-polygon
- Region boundaries
- Thresholding
- Voronoi
- Region connection calculus
- Equivalence
- Adjacency
- Proximity
- Overlap
- Containment
- Space and Time
- Incompatibility of dimensions
- Spatial Modeling
- Attributes
- Features
- Topics
- Languages
- Temporal modeling
- Trajectories
- Evolution
- Applications
- Similarity
- Search
- Ranking
- Recommendation
- Prediction
- Exercises
Process
- Introduction to spatial databases
- PostGIS
- Creating a spatial database
- Loading external spatial data
- Batch insert of textual data (WKT)
- Load ESRI shapefiles
- Load OpenStreetMap data
- Querying spatial databases
- Geometries
- Geography
- Spatial functions
- Constructors
- Outputs
- Accessors
- Measurement
- Decomposition
- Composition
- Simplification
- Spatial Relationships
- Intersections and differences
- Intersect relationship types
- Equality
- Nearest-Neighbour Searching
- Spatial Indexing
- Spatial Joins
- Projecting Data
- Python for Spatial Computation
- Connect a PostGIS database to a Python script
- Create and manipulate geometries with Shapely
- Load external data
- Load ESRI shapefiles with Fiona
- Load textual data in WKT
- Spatial functions
- Indexing and prepared geometries
- Using PostGIS in a desktop environment (introduction)
- Spatial data on large-scale computational frameworks (introduction)
- Exercises
Visualize
- Build interactive web maps
- Tile-based vs. vector maps
- Client-side JavaScript libraries: Leaflet, OpenLayers 3
- Create a simple map
- Load GeoJSON data in Leaflet
- Show Points
- Customize markers
- Clustering
- Heatmaps
- Show Linestrings
- Show Polygons
- Choropleth Map
- Deal with a large number of polygons
- Show Points
- Compose different geometries in layers
- Create data-driven custom style
- Add interactivity
- Link d3.js to spatial visualizations
- Desktop tools to visualize spatial data
- Introduction to QGIS
- The little Java brother: OpenJUMP
- Build static spatial visualizations programmatically
- Visualize spatial data in Python
- Exercises
Material
The tutorial material will be provided as a virtual machine in which we have set up the environment and the exercises that we will present during the practical sessions.
The virtual machine has been created with the general-purpose full virtualizer VirtualBox.
To be ready for the tutorial, follow these steps:
1. Download the VirtualBox installation package for your platform at this link.
2. Install VirtualBox (follow these instructions for more details).
3. Download the GeoCycle virtual machine at this link.
4. Import the virtual machine by launching VirtualBox and choosing "File -> Import Appliance..." from the main menu.
5. Start the virtual machine called geocycle.
Slides:
1. Represent
2. Process
3. Analyze
4. Visualize
Additional material:
Exercises