• GIRARD Paul (10)
  • OOGHE Benjamin (5)
  • JACOMY Mathieu (5)
  • GUIDO Daniele (3)
Document Type
  • Conference contribution (5)
  • Web site (3)
  • Conference proceedings (2)
  • Working paper (1)
Unfolding the Multi-layered Structure of the French Mediascape


France started to compile statistics about its trade in 1716. The "Bureau de la Balance du Commerce" (Balance of Trade's Office) centralized local reports of imports/exports by commodity produced by French tax regions. Many of the statistical manuscript volumes produced by this process have been preserved in French archives. This communication will relate how and why we used network technologies to create a research instrument based on the transcriptions of those archives in the TOFLIT18 research project. Our corpus is composed of more than 500k yearly trade transactions, each recording one commodity traded between a French local tax region and a foreign country between 1718 and 1838. We used a graph database to model it as a trade network where trade flows are edges between trade partners. We will explain why we had to design a classification system to reduce the heterogeneity of the commodity names and how such a system introduces the need for hyperedges. Since our research instrument aims to provide researchers with means of exploratory data analysis, we will present the web application we've built on top of the Neo4j database using JavaScript technologies (Decypher, Express, React, Baobab, SigmaJS). We will finally show how the graph model was not only a convenient way to store and query our data but also a powerful visual object to explore trade geographical structures and trade products' specialization patterns. Project funded by the French Agence Nationale de la Recherche (TOFLIT18).
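As an illustration of the model described in this abstract, here is a minimal sketch in plain Python (not the project's Neo4j schema). It treats each flow as an edge between two partners and uses a classification to merge heterogeneous commodity names. All names, partners, values and the `Flow`, `CLASSIFICATION` and `aggregate` identifiers are hypothetical and invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One yearly trade flow: an edge between two trade partners."""
    year: int
    source: str    # exporting partner (hypothetical label)
    target: str    # importing partner (hypothetical label)
    product: str   # commodity name as written in the source
    value: float   # hypothetical amount, unit left unspecified


# Hypothetical flows, invented for illustration only.
FLOWS = [
    Flow(1750, "Bordeaux", "Hollande", "vin rouge", 120.0),
    Flow(1750, "Bordeaux", "Hollande", "vins de Bordeaux", 80.0),
    Flow(1750, "Nantes", "Angleterre", "sucre brut", 50.0),
]

# A classification groups many heterogeneous names under one label.
# Each entry relates one group to several product names at once, the kind
# of many-to-one relation that a single binary edge cannot express.
CLASSIFICATION = {
    "wine": {"vin rouge", "vins de Bordeaux"},
    "sugar": {"sucre brut"},
}


def invert(classification):
    """Map each raw product name to the group that contains it."""
    return {name: group
            for group, names in classification.items()
            for name in names}


def aggregate(flows, classification):
    """Sum flow values per (source, target, group) edge."""
    name_to_group = invert(classification)
    totals = defaultdict(float)
    for flow in flows:
        group = name_to_group.get(flow.product, "unclassified")
        totals[(flow.source, flow.target, group)] += flow.value
    return dict(totals)
```

Calling `aggregate(FLOWS, CLASSIFICATION)` merges the two differently named wine flows into one edge between the same pair of partners, which is the kind of reduction in heterogeneity the classification system is meant to provide.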

Publication date 2018-10 Conference name WS.2 2018 International conference on Web Studies, Paris, France, October 03 - 05, 2018

The emergence and success of web platforms gave rise to a catchphrase in social studies: “Hyperlink is dead!” Capturing web users in mobile applications and private web platforms, in order to offer them a specific user experience (and a business model), has indeed created new silos in the open World Wide Web space. The simplified availability of user behavioural data through these platforms' APIs reinforced this idea in academic communities by providing scholars with a rich and easy way to collect user-centric data for their research. After discussing the methodological and ethical aspects of the web divide between platforms and classical websites, we will argue in this communication that hyperlinks, although more complex to collect, manipulate and apprehend, remain an invaluable material for using the web as a research field. We will illustrate this using Hyphe, a dedicated web corpus creation tool we developed to mine hypertexts.

Publication date 2018-04 Conference name Nantes JS

For 9 years, the Sciences Po médialab has been creating research instruments for the Humanities and Social Sciences. Its ambition is to design instruments that make complex processing methods accessible to research teams or citizens through human-machine interfaces. In this presentation, we will retrace how we used several generations of web technologies to meet this challenge.

Hyphe, a web crawler for social scientists developed by the SciencesPo médialab, introduced the novel concept of web entities to provide a flexible and evolving way of grouping web pages in situations where the notion of website is not relevant enough (either too large, for instance with Twitter accounts, newspaper articles or Wikipedia pages, or too constrained to group together multiple domains or TLDs...). This comes with technical challenges, since indexing a graph of linked web entities as a dynamic layer based on a large number of URLs is not as straightforward as it may seem. We aim to provide the graph community with some feedback about the design of an on-file index, part Graph and part Trie, named the "Traph", built to solve this particular use case. Additionally, we propose to retrace the path we followed, from an old Lucene index, to our experiments with Neo4j, and lastly to our conclusion that we needed to develop our own data structure in order to be able to scale up.
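To make the "part Graph, part Trie" idea more concrete, the sketch below shows one possible in-memory reading of it. This is not Hyphe's actual Traph, which is an on-file structure. The tokenization in `stems`, the `PrefixTrie` class and the example URLs are all assumptions made for illustration. Web entities are declared as URL prefixes in a trie, a page belongs to the entity with the longest matching prefix, and entity-level links are derived from page-level links at query time.

```python
from urllib.parse import urlsplit


def stems(url):
    """Split a URL into ordered stems: reversed host labels, then path segments.

    This tokenization is an assumption made for the sketch.
    """
    parts = urlsplit(url)
    host = [label for label in reversed((parts.hostname or "").split(".")) if label]
    path = [segment for segment in parts.path.split("/") if segment]
    return host + path


class PrefixTrie:
    """Trie of URL stems in which some nodes are flagged as web entity prefixes."""

    def __init__(self):
        self.root = {"children": {}, "entity": None}

    def declare_entity(self, prefix_url, name):
        """Flag the node reached by prefix_url as the root of a web entity."""
        node = self.root
        for stem in stems(prefix_url):
            node = node["children"].setdefault(
                stem, {"children": {}, "entity": None})
        node["entity"] = name

    def entity_of(self, url):
        """Return the entity with the longest declared prefix of url, or None."""
        node, best = self.root, None
        for stem in stems(url):
            node = node["children"].get(stem)
            if node is None:
                break
            if node["entity"] is not None:
                best = node["entity"]
        return best


def entity_links(trie, page_links):
    """Lift page-to-page links to entity-to-entity links (the graph layer)."""
    edges = set()
    for source_url, target_url in page_links:
        source, target = trie.entity_of(source_url), trie.entity_of(target_url)
        if source is not None and target is not None and source != target:
            edges.add((source, target))
    return edges


# Hypothetical usage: one entity for a whole site, a finer one for an account.
trie = PrefixTrie()
trie.declare_entity("https://twitter.com", "Twitter")
trie.declare_entity("https://twitter.com/medialab", "Twitter: medialab")
trie.declare_entity("https://example.org", "Example")
```

Because entity membership is resolved when a link is queried, declaring a new, more specific prefix changes the entity graph without touching the stored page links, which is one way to read the "dynamic layer" mentioned in the abstract.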

TOFLIT18 is a project dedicated to French trade statistics from 1716 to 1821. It combines a historical trade database that covers French external trade comprising more than 500,000 flows at the level of partners and individual products with a range of tools that allow the exploration of the material world of the Early Modern period. TOFLIT18 is the result of the collaboration of data scientists, economists and historians. It started as a project funded by the Agence Nationale de la Recherche in 2014.

Publication date 2016-07-16 Conference name Digital Humanities Collection Digital Humanities 2016: Conference Abstracts

We present the RICardo data visualization application, designed to explore a 19th-century international trade statistics database. The tool offers 3 levels of exploration: a World trade level, a Country level detailing the commercial partners of a chosen country, and a Bilateral level revealing the differences in mirrored trade flows. We discuss the design choices made to provide an exploratory data analysis tool which respects and represents the data uncertainty and heterogeneity of our historical database. We briefly present the opportunities offered by our tool for researchers and students, and explain our own transdisciplinary research method combining skills from History, Economics, Information Technology and Information Design.

Bruno Latour wrote a book about philosophy (An Inquiry into Modes of Existence). He decided that the paper book was no place for the numerous footnotes, documentation or glossary, and instead gave access to all the information surrounding the book through a web application presented as a reading companion. He also invited the community of readers to contribute to his inquiry by writing new documents to be added to the platform. The first version of our web application was built on PHP Yii and MySQL on the server side. This soon proved to be a nightmare to maintain because of the ultra-relational nature of our data. We refactored it completely to use Node.js and Neo4j. We went from a tree system with internal links modeled inside a relational database to a graph of paragraphs included in documents, subchapters, etc., all sharing links between them. On the way, we learned Neo4j thoroughly, from graph data modeling to Cypher tricks, and developed our custom graphical monitor for Cypher queries using sigma.js in order to check the consistency of our data trans-modeling. During this journey, we stumbled upon data model questions: ordered links, the need to group sub-items, data output constraints from Neo4j, and finally the limitations of the Neo4j community edition. In the end, we feel much more comfortable as developers in our new system. Reasoning about our data has become much easier and, moreover, our users are also happier since the platform's performance has never been better.
Our intention is, therefore, to share our experience with the community:
- our application's data needs
- our shift from a MySQL data model to a Neo4j graph model
- our feedback on using a graph database, and more precisely Neo4j, including our custom admin tool Agent Smith
- a very quick description of the admin tools we built to let the researchers write or modify contents (a markdown web editor)

The research has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013) / ERC Grant ‘IDEAS’ 2010 n° 269567.

Authors:

Guillaume Plique
A graduate student from Sciences-Po Lille and Waseda University, Guillaume Plique now offers the médialab his backend development skills as well as his profile in social sciences. He has been working since June 2013 on several projects such as IPCC mapping and AIME, and develops scrapers aimed at social sciences researchers.

Paul Girard
Paul Girard is an Information Technology engineer specialized in driving collaborations between technology and non-technical domains. He graduated from the cultural industry engineering specialisation at Université de Technologie de Compiègne in 2004, where he studied the relationships between digital technologies and society and the mechanisms of collaboration. He worked in the research laboratories federation CITU (Paris 1 and Paris 8 universities) from 2005 to 2009, where he participated in research and creation projects and in collaborations between artists and engineers working with interactivity, digital pictures, and virtual and augmented reality. He joined the médialab laboratory at Sciences Po at its foundation during the spring of 2009, as the digital manager of this digital research laboratory dedicated to fostering the use of digital methods and tools in Social Sciences. Since then he has overseen the technical direction of its many research projects, run as collaborations between social sciences, knowledge engineering and information design. His present research fields are digital methods for social sciences, exploratory data analysis and enhanced publication through digital storytelling.

Daniele Guido
Daniele Guido is a visual interaction designer interested in data mining applications, text analysis and network tools. He collaborates with researchers in History and Social Science, designers and engineers to conceive and develop digital tools for the humanities. He recently joined the Digital Humanities lab at CVCE in Luxembourg after several years working at the Sciences-Po Medialab team in Paris, where he was engaged in the FORCCAST project and in the AIME project.

More and more people work with graphs nowadays, but it is not always easy to publish and share the interpretation of a graph on the web. Manylines is a web tool built at Sciences Po médialab to solve this issue. Some researchers and students use network visualizations to explore their data, but networks are not as clear as maps, and sharing one’s interpretation is difficult. Manylines' main innovation is to allow users to explain and share a narrative about their network: an interactive story where each “slide” is a particular zoom, pan and filtering of the network, completed by a title and description, with fluid transitions as in Prezi. Published as an open source prototype with the source code available on GitHub, Manylines is currently built around three screens:
- The first screen allows applying the ForceAtlas2 layout to the network, in order to settle the “basemap”: the definitive positions of nodes and edges used to support further interpretations.
- The second screen allows zooming and filtering the network to explore the data and “take snapshots” representing different insights.
- The third screen allows composing narratives by building a series of snapshots, adding a title and short description for each step.

The result is an interactive slideshow widget where the user’s exploration of the network is guided step by step, revealing the key interpretation points one by one. Manylines is a single-page web app built for HTML5 browsers. It makes extensive use of the sigma.js library to deal with networks within the browser, and it implements different features inspired by Gephi, the reference desktop graph visualization platform. The WebGL visualization (with Canvas fallback) implemented by sigma.js allows great performance for networks up to 1000 nodes on an average computer. To reach this level of performance, we optimized the JavaScript version of the ForceAtlas2 algorithm used by sigma.js. We ported it to use web workers and we optimized the Barnes-Hut quadtree approximation in this context by implementing it as an iterative rather than recursive process. We made extensive use of sigma.js’ custom renderers and cameras to build dynamic graph thumbnails, snapshots and widgets. The server side stores the networks, snapshots and narrative data in a Couchbase database (which we discovered at FOSDEM 2014) accessed through a Node.js Express REST API.
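The iterative Barnes-Hut idea mentioned above can be sketched as follows. This is a simplified Python illustration, not the sigma.js code: every body has unit mass, the force is a generic inverse-distance repulsion rather than ForceAtlas2's exact formula, and the names `Node`, `insert`, `repulsion` and the `theta` threshold are assumptions of the sketch. Both the tree construction and the traversal use loops and an explicit stack instead of recursion.

```python
import math


class Node:
    """Square quadtree cell centred on (cx, cy) with half-width `half`."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.mass = 0.0
        self.mx = self.my = 0.0   # centre of mass of the bodies in the cell
        self.children = None      # four sub-cells once the cell is split
        self.point = None         # position of the body held by a leaf


def child_for(node, x, y):
    """Return the sub-cell containing (x, y), creating the four children if needed."""
    if node.children is None:
        h = node.half / 2
        node.children = [Node(node.cx + dx * h, node.cy + dy * h, h)
                         for dy in (-1, 1) for dx in (-1, 1)]
    index = (1 if x >= node.cx else 0) + (2 if y >= node.cy else 0)
    return node.children[index]


def insert(root, x, y):
    """Insert a unit-mass body with a loop instead of recursive calls."""
    node = root
    while True:
        was_empty = node.children is None and node.point is None
        total = node.mass + 1.0
        node.mx = (node.mx * node.mass + x) / total
        node.my = (node.my * node.mass + y) / total
        node.mass = total
        if was_empty:
            node.point = (x, y)
            return
        if node.point is not None:
            if node.point == (x, y):
                return            # coincident bodies share one leaf
            # Occupied leaf: push the existing body one level down.
            px, py = node.point
            node.point = None
            child = child_for(node, px, py)
            child.point = (px, py)
            child.mass = total - 1.0
            child.mx, child.my = px, py
        node = child_for(node, x, y)


def repulsion(root, x, y, theta=0.5):
    """Approximate the repulsive force on (x, y) using an explicit stack."""
    fx = fy = 0.0
    stack = [root]
    while stack:
        node = stack.pop()
        if node.mass == 0.0:
            continue
        dx, dy = x - node.mx, y - node.my
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            # Either the body itself or a degenerate cell: look inside if possible.
            if node.children is not None:
                stack.extend(node.children)
            continue
        if node.children is None or (2 * node.half) / dist < theta:
            # Leaf, or cell far enough away: treat it as a single body.
            fx += node.mass * dx / (dist * dist)
            fy += node.mass * dy / (dist * dist)
        else:
            stack.extend(node.children)
    return fx, fy
```

An explicit stack keeps the traversal free of nested function calls, which is a common reason to prefer an iterative formulation in a tight layout loop; whether this matches the motivation in the original JavaScript port is not stated in the abstract.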

Publication date 2014-10
GUIDO Daniele
ROGERS Richard
MUNK Anders Kristian

This website presents the results of the EU research project EMAPS, as well as its process: an experiment to use computation and visualization to harness the increasing availability of digital data and mobilize it for public debate. To do so, EMAPS gathered a team of social and data scientists, climate experts and information designers. It also reached out beyond the walls of Academia and engaged with the actors of the climate debate.