Skip to content

Architecture & Pipeline

PERLA's architecture is designed for scalable, automated extraction and validation of perovskite solar cell data from scientific literature. The system operates as a continuous pipeline that monitors publications, extracts data, validates findings, and updates the living database.

System Overview

The pipeline consists of several interconnected components that work together to maintain a continuously updated, high-quality data archive.

Core Components

1. Literature Monitoring (perla-extract)

  • Monitors major publishers
  • Filters publications based on perovskite-related keywords
  • Handles full-text access and PDF retrieval

2. Data Extraction Engine (perla-extract)

  • Extracts text from PDFs from papers
  • Uses LLMs for text analysis
  • Maintains publication metadata and provenance
  • Extracts device parameters, experimental conditions, and performance metrics
  • Validates extracted parameters against known physical constraints

3. Database Architecture (NOMAD plugin)(nomad-perovskite-solar-cells-database)

  • Built on the NOMAD platform
  • Follows NOMAD schema standards for perovskite solar cells
  • Uses existing tooling to, for example, process ions
  • Provides integration with broader materials science ecosystem

4. Web Portal and API Access Layer (PERLA in NOMAD)

  • RESTful API for data queries and retrieval
  • Web-based search and visualization tools
  • Interactive data exploration capabilities