This study provides a large-scale mapping of the French media space, using digital methods to estimate political polarization and to study information circuits. We collect data about the production and circulation of online news stories in France over the course of one year, adopting a multi-layer perspective on the media ecosystem and sourcing our data from websites, Twitter and Facebook. We also identify several important structural features. A stochastic block model of the hyperlink structure shows the systematic relegation of the counter-informational press to a separate cluster which receives hardly any attention from the mainstream media. Counter-informational sub-spaces are also peripheral on the consumption side. We measure their respective audiences on Twitter and Facebook and do not observe a large discrepancy between the two social networks, with the counter-informational space and far-right and far-left media gathering limited audiences. Finally, we measure the ideological distribution of news stories using Twitter data, which also suggests that the French media landscape is quite balanced. We therefore conclude that the French media ecosystem does not suffer from the same level of polarization as its US counterpart. The comparison with the American situation also allows us to consolidate a result from studies on disinformation: polarization of the journalistic space and the circulation of fake news only become widespread when dominant and influential actors in the political or journalistic space take up topics and dubious content originally circulating on the fringes of the information space.

Information retrieval and record linkage have always relied on crafty and heuristic routines aimed at implementing what is often called fuzzy matching. Indeed, even if fuzzy logic feels natural to humans, one needs various strategies to coerce computers into acknowledging that strings, for instance, are not always strictly delimited. But if some of those techniques, such as the Soundex phonetic algorithm invented at the beginning of the 20th century, are still well known and used, a lot of them were unfortunately lost to time. As such, the Talisman JavaScript library aims at being an archive of a wide variety of techniques that have been used throughout the history of computer science to perform fuzzy comparisons between words, names, sentences, etc. Thus, even if Talisman obviously provides state-of-the-art functions that are still being used in an industrial context, it also aims at being a safe harbor for lesser-known or clunkier techniques, for historical and archival purposes.
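Talisman itself ships implementations of many such algorithms; purely as an illustration of the kind of technique being archived, here is a minimal sketch of the classic American Soundex encoding (the function name and details below are this sketch's own, not Talisman's API):

```javascript
// Minimal sketch of American Soundex: keep the first letter, encode the
// remaining consonants as digits by phonetic group, drop vowels, collapse
// adjacent duplicate codes (h and w do not break a duplicate run), and
// pad the result to 4 characters with zeros.
function soundex(name) {
  const codes = {
    b: '1', f: '1', p: '1', v: '1',
    c: '2', g: '2', j: '2', k: '2', q: '2', s: '2', x: '2', z: '2',
    d: '3', t: '3', l: '4', m: '5', n: '5', r: '6'
  };
  const s = name.toLowerCase().replace(/[^a-z]/g, '');
  if (!s) return '';
  let result = s[0].toUpperCase();
  let prev = codes[s[0]] || '';
  for (let i = 1; i < s.length && result.length < 4; i++) {
    const ch = s[i];
    if (ch === 'h' || ch === 'w') continue; // h/w are transparent to duplicates
    const code = codes[ch] || '';           // vowels yield '' and reset `prev`
    if (code && code !== prev) result += code;
    prev = code;
  }
  return result.padEnd(4, '0');
}
```

"Robert" and "Rupert" collapse to the same code, which is precisely the point of phonetic fuzzy matching: strings that sound alike compare as equal.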

Unfolding the Multi-layered Structure of the French Mediascape

France started to compile statistics about its trade in 1716. The "Bureau de la Balance du Commerce" (Balance of Trade Office) centralized local reports of imports and exports by commodity, produced by French tax regions. Many statistical manuscript volumes produced by this process have been preserved in French archives. This communication relates how and why we used network technologies to create a research instrument based on transcriptions of those archives in the TOFLIT18 research project. Our corpus is composed of more than 500k yearly trade transactions, each recording the exchange of one commodity between a French local tax region and a foreign country between 1718 and 1838. We used a graph database to model it as a trade network where trade flows are edges between trade partners. We explain why we had to design a classification system to reduce the heterogeneity of the commodity names, and how such a system introduces the need for hyperedges. Since our research instrument aims at providing researchers with exploratory data analysis means, we present the web application we built on top of the Neo4j database using JavaScript technologies (Decypher, Express, React, Baobab, SigmaJS). We finally show how the graph model was not only a convenient way to store and query our data but also a powerful visual object for exploring geographical trade structures and patterns of specialization in trade products. Project funded by the French Agence Nationale de la Recherche (TOFLIT18).
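The modeling idea can be sketched with toy data (illustrative values only, not the project's actual schema or figures): each trade flow is an edge between two partners, and a classification layer maps heterogeneous source commodity names onto normalized classes before aggregation.

```javascript
// Illustrative sketch: trade flows as edges between partners, plus a
// classification collapsing many source spellings into one class. In the
// graph, this many-names-to-one-class mapping is what produces the
// hyperedge-like groupings mentioned above.
const flows = [
  { year: 1750, from: 'Bordeaux', to: 'Angleterre', commodity: 'Vin rouge', value: 1200 },
  { year: 1750, from: 'Bordeaux', to: 'Angleterre', commodity: 'Vins', value: 800 },
  { year: 1750, from: 'Marseille', to: 'Levant', commodity: 'Draps', value: 500 }
];

const classification = { 'Vin rouge': 'Wine', 'Vins': 'Wine', 'Draps': 'Textiles' };

// Aggregate flow values per (partner pair, normalized class) edge.
function aggregateByClass(flows, classification) {
  const totals = new Map();
  for (const f of flows) {
    const cls = classification[f.commodity] || 'Unclassified';
    const key = `${f.from} -> ${f.to} [${cls}]`;
    totals.set(key, (totals.get(key) || 0) + f.value);
  }
  return totals;
}
```

With the toy data above, the two wine flows from Bordeaux to Angleterre merge into a single aggregated edge.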

Publication date: 2018-10. Conference: WS.2 2018 International Conference on Web Studies, Paris, France, October 03-05, 2018
The emergence and success of web platforms popularized a catchphrase in social studies: "The hyperlink is dead!" Capturing web users into mobile applications and private web platforms, in order to offer them a specific user experience (and a business model), has indeed created new silos in the open World Wide Web. The easy availability of user behavioural data through these platforms' APIs reinforced this idea in academic communities by providing scholars with a rich and convenient way to collect user-centric data for their research. After discussing the methodological and ethical aspects of the web divide between platforms and classical websites, we will argue in this communication that hyperlinks, although more complex to collect, manipulate and apprehend, remain an invaluable material for using the web as a research field. We will illustrate this with Hyphe, a dedicated web corpus creation tool we developed to mine hypertexts.

Publication date: 2018-04. Conference: Nantes JS
For the past nine years, the Sciences Po médialab has been building research instruments for the humanities and social sciences. Its ambition is to design instruments that make complex processing capabilities accessible to research teams or to citizens through human-machine interfaces. In this presentation, we will retrace how we used several generations of web technologies to meet this challenge.

Hyphe, a web crawler for social scientists developed by the Sciences Po médialab, introduced the novel concept of web entities to provide a flexible and evolving way of grouping web pages in situations where the notion of website is not relevant enough (either too large, for instance with Twitter accounts, newspaper articles or Wikipedia pages, or too constrained to group together multiple domains or TLDs). This comes with technical challenges, since indexing a graph of linked web entities as a dynamic layer on top of a large number of URLs is not as straightforward as it may seem. We aim at providing the graph community with some feedback about the design of an on-file index, part graph, part trie, named the "Traph", built to solve this peculiar use case. Additionally, we retrace the path we followed, from an old Lucene index, through our experiments with Neo4j, to our conclusion that we needed to develop our own data structure in order to scale up.
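The actual Traph is an on-file structure; the following in-memory sketch (all names hypothetical, not Hyphe's API) only illustrates the prefix idea behind web entities: URLs are split into stems, stored in a trie, and some trie nodes are flagged as entity boundaries, so a page belongs to the entity of the deepest flagged prefix on its path.

```javascript
// Split a URL into ordered stems: scheme, reversed host parts, path segments.
// Reversing the host groups subdomains under their parent domain in the trie.
function urlToStems(url) {
  const u = new URL(url);
  const host = u.hostname.split('.').reverse();
  const path = u.pathname.split('/').filter(Boolean);
  return [u.protocol.replace(':', ''), ...host, ...path];
}

class Traph {
  constructor() {
    this.root = { children: new Map(), entity: null };
  }

  // Flag a URL prefix as the boundary of a named web entity.
  declareEntity(prefixUrl, name) {
    let node = this.root;
    for (const stem of urlToStems(prefixUrl)) {
      if (!node.children.has(stem))
        node.children.set(stem, { children: new Map(), entity: null });
      node = node.children.get(stem);
    }
    node.entity = name;
  }

  // Walk the trie along the page's stems; the deepest flagged prefix wins,
  // so an account-level entity overrides a site-level one.
  resolve(pageUrl) {
    let node = this.root;
    let entity = null;
    for (const stem of urlToStems(pageUrl)) {
      node = node.children.get(stem);
      if (!node) break;
      if (node.entity) entity = node.entity;
    }
    return entity;
  }
}
```

This is how a single Twitter account can be its own web entity while the rest of twitter.com remains another: both prefixes coexist in the trie, and longest-prefix resolution disambiguates each page.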

TOFLIT18 is a project dedicated to French trade statistics from 1716 to 1821. It combines a historical trade database covering French external trade (more than 500,000 flows at the level of partners and individual products) with a range of tools that allow exploration of the material world of the Early Modern period. TOFLIT18 is the result of a collaboration of data scientists, economists and historians. It started as a project funded by the Agence Nationale de la Recherche in 2014. http://toflit18.hypotheses.org

Publication date: 2016-07-16. Conference: Digital Humanities. Collection: Digital Humanities 2016: Conference Abstracts
We present the RICardo data visualization application (http://ricardo.medialab.sciences-po.fr), designed to explore a database of 19th-century international trade statistics. The tool offers three levels of exploration: a World level showing global trade, a Country level detailing the commercial partners of a chosen country, and a Bilateral level revealing the differences in mirrored trade flows. We discuss the design choices made to provide an exploratory data analysis tool which respects and represents the uncertainty and heterogeneity of our historical database. We briefly present the opportunities our tool offers researchers and students, and explain our own transdisciplinary research method combining skills from history, economics, information technology and information design.

Bruno Latour wrote a book of philosophy (An Inquiry into Modes of Existence). He decided that the paper book was no place for the numerous footnotes, documentation and glossary, instead giving access to all this information surrounding the book through a web application presented as a reading companion. He also invited the community of readers to submit contributions to his inquiry by writing new documents to be added to the platform. The first version of our web application was built on PHP (Yii) and MySQL on the server side. This soon proved to be a nightmare to maintain because of the ultra-relational nature of our data. We refactored it completely to use Node.js and Neo4j. We went from a tree system with internal links modeled inside a relational database to a graph of paragraphs included in documents, subchapters, etc., all sharing links between them. Along the way, we learned Neo4j thoroughly, from graph data modeling to Cypher tricks, and developed a custom Cypher query graphical monitor using sigma.js in order to check the consistency of our data trans-modeling. During this journey, we stumbled upon data model questions: ordered links, the necessity of grouping sub-items, data output constraints from Neo4j, and finally the limitations of the Neo4j Community Edition. In the end, we feel much more comfortable as developers in our new system. Reasoning about our data has become much easier and, moreover, our users are also happier, since the platform's performance has never been better.
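The "ordered links" question can be illustrated with a small sketch (hypothetical names, not the AIME schema): a graph database has no native notion of sibling order, so one common option is to chain siblings with next pointers, linked-list style, instead of storing an index on each containment edge that must be renumbered on every insertion.

```javascript
// Hypothetical sketch: paragraphs of a document chained with `next`
// references, so document order is the chain order.
function makeDoc(paragraphTexts) {
  const paragraphs = paragraphTexts.map(text => ({ text, next: null }));
  for (let i = 0; i < paragraphs.length - 1; i++)
    paragraphs[i].next = paragraphs[i + 1];
  return { first: paragraphs[0] || null };
}

// Reading a document back means following the chain from `first`.
function readDoc(doc) {
  const out = [];
  for (let p = doc.first; p; p = p.next) out.push(p.text);
  return out;
}

// Inserting after a given paragraph touches only two links; no sibling
// needs to be renumbered, unlike with an index stored on each edge.
function insertAfter(paragraph, text) {
  paragraph.next = { text, next: paragraph.next };
}
```

The trade-off is that reading a document requires traversing the chain rather than sorting children by an index property, which is why this remains a genuine modeling question rather than an obvious choice.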
Our intention is, therefore, to share our experience with the community:
- our application's data needs
- our shift from a MySQL data model to a Neo4j graph model
- our feedback on using a graph database, and more precisely Neo4j, including our custom admin tool [Agent Smith](https://github.com/Yomguithereal/agent-smith)
- a very quick description of the admin tools we built to let the researchers write or modify content (a markdown web editor)

The research has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC Grant 'IDEAS' 2010 n° 269567.

Authors:

Guillaume Plique. A graduate of Sciences Po Lille and Waseda University, Guillaume Plique now offers the médialab his backend development skills as well as his background in social sciences. He has been working since June 2013 on several projects, such as IPCC mapping and AIME, and develops scrapers aimed at social sciences researchers. https://github.com/Yomguithereal

Paul Girard. Paul Girard is an information technology engineer specialized in driving collaborations between technology and non-technical domains. He graduated from the cultural industry engineering specialization at the Université de Technologie de Compiègne in 2004, where he studied the relationships between digital technologies and society and the mechanisms of collaboration. He worked in the research laboratories federation CITU (Paris 1 and Paris 8 universities) from 2005 to 2009, where he participated in research and creation projects: collaborations between artists and engineers working with interactivity, digital pictures, and virtual and augmented reality. He joined the médialab at Sciences Po at its foundation in the spring of 2009 as the digital manager of this research laboratory dedicated to fostering the use of digital methods and tools in the social sciences. Since then he has overseen the technical direction of many research projects as collaborations between social sciences, knowledge engineering and information design. His present research fields are digital methods for social sciences, exploratory data analysis and enhanced publication through digital storytelling. https://github.com/paulgirard

Daniele Guido. Daniele Guido is a visual interaction designer interested in data mining applications, text analysis and network tools. He collaborates with researchers in history and social science, designers and engineers to conceive and develop digital tools for the humanities. He recently joined the Digital Humanities Lab at the CVCE in Luxembourg after several years working at the Sciences Po médialab in Paris, where he was engaged in the FORCCAST project (forccast.hypotheses.org) and the AIME project (modesofexistence.org). https://github.com/danieleguido
