Project Objective
This project aims to use machine-based text classification to build a searchable repository of papers, websites, training materials, and other knowledge resources useful for sustainable development.
The resulting search engine will index content from across the Internet that originates from credible organizations in academia, government, and the private sector.
Unlike the leading search engines on the market, which seek to aggregate as much content as possible, this one aims to provide a machine-curated set of results drawn from reputable sources.
Useful Resources
- Common Crawl, an open dataset of web crawl data that can be accessed and analyzed by anyone.
- Common Search, a nonprofit search engine for the Web.
- Spyglass, a simple search-results front-end for Apache Solr, built with EmberJS.
Project Team
Data Acquisition
Ms. Sara Crouse - Common Crawl Foundation
Director, Common Crawl Foundation
Sebastian Nagel
Crawl Engineer & Data Scientist
Data Analytics
W. “RP” Raghupathi, PhD - Fordham University
Professor of Information Systems
Director, MS in Business Analytics Program
Director, Center for Digital Transformation
Gabelli School of Business
Prof. Yilu Zhou, PhD - Fordham University
Associate Professor
Information Systems
Gabelli School of Business
Front-End Web Development
Roberto González Ibáñez, PhD
Assistant Professor
Department of Informatics Engineering (Departamento de Ingeniería Informática)
Universidad de Santiago de Chile
Carolina Vásquez
Computer Engineering Student