Lunar components

Components are distinguishable units of work that, when combined, create workflows. Components encapsulate workflow logic into observable units that can be reused across workflows and sub-workflows.

Components can encapsulate any discrete task in a process, from textual input or file reading operations, to field selection, database and API querying, to more complex AI/ML models.

The following is a list of pre-defined Lunar components, together with their descriptions.

Component name | Component description
Arxiv Extractor | Extracts titles, authors, and LaTeX code from arXiv papers.
Audio Player | Plays an audio file in base64 format.
Audio2Base64 | Converts an audio file to a base64 string.
Azure Open AI Prompt | Connects to Azure OpenAI's API, runs a natural language prompt, and outputs the generated text.
Azure Open AI Vectorizer | Encodes input texts as numerical vectors (embeddings) using Azure OpenAI models. Outputs embeddings for each input text.
Bar chart | Creates a bar chart visualization based on numerical data, useful for representing categorical data comparisons.
Bing Search | Searches data using the Bing Search API. Returns relevant search results.
CAUC | Converts CAUC data to SQL format.
CIViC | Searches the CIViC database for clinical variant data and gene information relevant to cancer.
Capag2sql | Converts CAPAG data to SQL format.
Causal Discovery Algorithms with a LLM | Runs causal discovery algorithms with the help of an LLM to apply different methods. Outputs include SEM object results and agent output.
Causal Graph Discovery with LLM | Runs causal graph discovery using an LLM and Wikipedia to identify causal relations between variables. Outputs include the causal graph and a log of the process.
Causal Inference with a LLM | Uses an LLM to run causal inference methods (DoWhy and CausalPy), generating results and a log of the step-by-step reasoning.
Causal Structural Equation Model Refinement with LLM | Uses SemoPy to refine and interpret an initial SEM with the help of an LLM.
CausalGraphViewer | Displays a JSON-serializable graph (node-link format).
Csv Upload | Reads a CSV file with a header and extracts the content for further analysis or use in other components.
Csv Viewer | Displays the contents of a CSV file for easy viewing, providing a structured and organized layout.
Csv2sql | Converts CSV data to SQL format.
Cytoscape Visualizer | Receives a Cytoscape-formatted JSON and creates a graph visualization.
Elasticsearch client | Queries Elasticsearch for data, supporting advanced search and filtering operations.
Elasticsearch store | Stores structured data in an Elasticsearch instance for future retrieval and indexing, enabling search functionalities.
Emails Sender | Sends emails.
Excel2Text | Converts the content of an Excel file into plain text, enabling extraction and further manipulation of data.
File Upload | Uploads local files to the server.
Finance2sql | Converts finance data to SQL format.
Finance_api2sql | Converts finance data from an API to SQL format.
Gemini AI prompt | Connects to Gemini’s API to run natural language prompts and retrieve AI-generated responses.
Gene Set Upload | Reads a CSV file containing gene data and outputs a list of gene names for downstream analysis.
GraphQL Query | Fetches data from a GraphQL endpoint. Outputs the response for the query in JSON format.
HTML Reports Builder | Generates custom HTML reports based on Jinja2 templates, allowing dynamic content rendering for web reports or dashboards.
Htmls2Texts | Converts HTML content into plain text, stripping away tags and retaining the meaningful text content.
HuggingFace vectorizer | Encodes texts into embeddings using HuggingFace models. Outputs the original text and corresponding embeddings.
Indra Network Assembler | Retrieves scientific literature relevant to a given set of genes, assisting in building gene interaction networks.
Inep2sql | Converts INEP data to SQL format.
JSON Input | Allows the input of a JSON text (potentially with template variables) that can then be used in other downstream components. It can also be used as an output if useful.
Kitai download | Downloads converted sound files using Kitai.
Kitai models | Retrieves available voice models from Kitai.
Kitai splitter | Splits sound files using Kitai.
Kitai splitter_download | Downloads split sound files using Kitai.
Kitai | Voice AI component for sound files and voice models.
LLM prompt | Connects to a language model API to process natural language prompts and returns text-based responses.
Latex Cleaner | Cleans up LaTeX code by removing comments and expanding restatables.
Latex Statements Extractor | Extracts definitions, axioms, lemmas, theorems, and corollaries from LaTeX.
Latex2HTML | Converts LaTeX code to HTML with MathJax.
Line chart | Plots a line chart from numerical data. Outputs the chart as an encoded image and the original input data.
List Index Getter | Extracts elements at given indices from a list.
LlamaIndex Indexing | Indexes documents using Azure OpenAI models within LlamaIndex, allowing efficient document retrieval and query-based interaction.
LlamaIndex Querying | Queries data from an index built with LlamaIndex, supporting custom retrieval configurations and response formatting.
Lyrics Generator | Generates song lyrics from an input theme using Azure OpenAI's API.
Milvus Retriever | Retrieves embeddings from a Milvus server for similarity-based searches.
Milvus Vectorstore | Stores embeddings on a Milvus server. Outputs the number of stored embeddings.
NCI Thesaurus | Retrieves biomedical terminology and data from the NCI Thesaurus using SPARQL queries.
Natural language to SQL Query | Converts a natural language query into an SQL query based on a given data definition schema. Outputs the generated SQL query.
NeXtProt | Retrieves data from neXtProt, a comprehensive resource focused on human proteins, including their functions, localization, expression, interactions, and disease relevance.
Online Spreadsheet IO | Reads from and writes data to an online spreadsheet, supporting collaborative data management.
Online Spreadsheet | Downloads and retrieves the content from an online spreadsheet for further processing or analysis.
OpenAI prompt | Connects to OpenAI's API, runs natural language prompts, and outputs the result as text.
OpenAI vectorizer | Encodes text into vectors using OpenAI models for applications like search, clustering, and text analysis.
PDFExtractor | Extracts title, sections, references, tables, and text from PDF files.
PROGENy | Provides data on pathway-target gene interactions with weighted significance for each interaction.
Paper Database Builder | Builds a JSON with data on scientific papers.
Picture Extractor | Extracts text, including mathematical formulas, from images. Useful for digitizing content from photos or scanned documents.
Property Getter | Extracts the value of a specified key or attribute from an object or data structure, enabling easy access to nested properties.
Property Selector | Retrieves the values of specific properties (keys) from a dictionary.
Pubmed Searcher | Searches for biomedical literature on PubMed using keyword queries and filters for year and page length.
Python coder | Executes Python code and returns the result, allowing the use of custom Python scripts.
R coder | Executes R code and returns the result, enabling integration of R scripts into workflows.
Range | Generates a sequence of numbers that starts from 0 by default, increments by 1 by default, and stops before a specified number.
ReACT Agent | Implements ReACT logic, which combines reasoning and acting for enhanced decision-making.
Reaper Controller | Controls the Reaper DAW through natural language commands.
Remf2sql | Converts REMF data to SQL format.
Report | Creates customizable reports from input data, allowing users to format and edit reports.
SBGN Visualizer | Visualizes biological pathways using the Systems Biology Graphical Notation (SBGN) standard.
SPARQL Query | Executes queries against SPARQL endpoints to retrieve structured data, typically from knowledge bases like Wikidata or the NCI Thesaurus.
SQL Query | Executes SQL queries against a database and retrieves the result, useful for extracting information from relational databases.
SQL Schema Extractor | Connects to a SQL database and retrieves its schema (Data Definition Language). Outputs a JSON describing the database schema.
Sidra2sql | Converts SIDRA data to SQL format.
Sleep | Delays execution for a set time.
Spacy NER | Performs Named Entity Recognition (NER) on text, identifying entities like persons, organizations, dates, and more using Spacy’s NLP framework.
Spleeter_deezer | Splits audio tracks into components using Spleeter.
Subworkflow | Selects and runs another workflow.
Suno Music Downloader | Downloads songs from Suno.
Suno Music Generator | Generates music using Suno.
Table2Text | Converts a CSV-formatted table into text by generating sentences for each row, facilitating more natural readability of tabular data.
Text Input | Allows for user-provided text input, including template-based inputs for other components to use.
URLs Scraper | Scrapes a list of provided URLs for data. Returns content or errors for each URL in a structured dictionary.
UniProt | Fetches comprehensive protein sequence and functional data from the UniProt database.
WikiPathways | Fetches data from WikiPathways, an open-source platform for community-contributed biological pathways.
Wikidata client | Retrieves data from the Wikidata API, returning knowledge/metadata for a given search term. Outputs relevant results in a structured format.
Wikipedia client | Retrieves data from the Wikipedia API.
WolframAlpha client | Connects to the WolframAlpha API and retrieves computational or factual information based on the input query.
Yahoo Finance API | Connects to Yahoo's public API using yfinance and retrieves financial data about companies and their stocks.
Zip file extractor | Extracts files from a ZIP archive, returning the paths to the extracted files on the server.

Running components

Every component includes a run() function that defines its running behavior. This behavior can be triggered by calling run() programmatically on a component instance (components are defined as Python objects) or by using the run button in the interface, as seen in the image below. At runtime, the component inputs are either provided by the user as text inputs or data inputs (e.g., a file upload) or received from upstream components via in-edges, while the output is printed in the interface, as seen below.
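
As a rough illustration of the programmatic route, the sketch below defines a toy component whose behavior lives in run() and then triggers it on an instance. The class name `UppercaseText`, its constructor, and the keyword-argument calling convention are hypothetical and are not taken from the Lunar codebase. Only the idea that a component is a Python object exposing run() comes from the description above.

```python
class UppercaseText:
    """Hypothetical component, not part of the Lunar component library."""

    def __init__(self, suffix: str = ""):
        # Configuration is fixed when the component instance is created.
        self.suffix = suffix

    def run(self, text: str) -> str:
        # Inputs arrive at run time, from the user or from another component.
        return text.upper() + self.suffix


# Trigger the behavior programmatically on a component instance.
component = UppercaseText(suffix="!")
result = component.run(text="hello lunar")
print(result)  # HELLO LUNAR!
```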

[Image: Lunar workflow]


Inputs received from upstream components and outputs sent to downstream components require data type compatibility. Generally, a component A with an output of type T can only link to a component B that expects an input of the same type T. The only exception to this requirement is the case where A outputs a list of multiple instances of T. In that case, component B automatically runs in a loop, once for every instance received from A.
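
The linking rule and its list exception can be sketched in a few lines of Python. The helper `run_downstream` and the `word_count` stand-in for component B are hypothetical names introduced for illustration. Lunar's actual workflow engine is not shown in this document and may implement the check differently.

```python
from typing import Any, Callable


def run_downstream(run: Callable[[Any], Any], expected: type, value: Any) -> Any:
    """Feed an upstream output into a downstream run() callable (hypothetical helper)."""
    if isinstance(value, expected):
        # Types match: A's output is passed straight to B.
        return run(value)
    if isinstance(value, list) and all(isinstance(v, expected) for v in value):
        # A produced a list of T while B expects T: B runs once per item.
        return [run(v) for v in value]
    raise TypeError(f"expected {expected.__name__}, got {type(value).__name__}")


def word_count(text: str) -> int:
    # Stand-in for component B's run(); expects a single string.
    return len(text.split())


single = run_downstream(word_count, str, "one two three")
looped = run_downstream(word_count, str, ["one two", "three"])
print(single, looped)  # 3 [2, 1]
```

Passing a value that is neither a T nor a list of T (for example an integer here) raises a TypeError, which mirrors a link the interface would refuse to create.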

Data types

Lunar provides a set of data types that validate data passed between components in a workflow and ensure the correct visual representation of the data on the interface (Lunarflow).

Name | Primitive Type | Description
FILE | File | Represents a file
TEXT | str | Represents text
CSV | str | Represents CSV formatted text
INT | int | Represents an integer
FLOAT | float | Represents a floating-point number
CODE | str | Represents Python code
R_CODE | str | Represents R code
EMBEDDINGS | list | Represents embeddings as a list of floats
JSON | dict | Represents a JSON object
IMAGE | str | The base64 string representation of an image
REPORT | str | Represents a report. Allows the creation of an editable rich text editor
TEMPLATE | str | Represents a template with replaceable variables
LIST | list | Represents a list
AGGREGATED | dict | Only assignable to component inputs. Allows the input to receive multiple outputs as a dictionary
PROPERTY_SELECTOR | str | Displays a property selector component on the interface
PROPERTY_GETTER | str | Displays a property getter component on the interface
WORKFLOW | dict | Represents a workflow. Used to run workflows recursively
SQL | str | Represents an SQL query
GRAPHQL | str | Represents a GraphQL query
SPARQL | str | Represents a SPARQL query
PASSWORD | str | Represents a secret
ANY | any | Any type
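
To make the relationship between a data type name and its primitive type concrete, the sketch below encodes a subset of the table as a plain dictionary and checks values against it. The `PRIMITIVE_TYPES` mapping and the `is_valid` function are hypothetical and are not Lunar's actual validation code. The check covers only the primitive type, not content (for instance, it does not verify that a CSV string is well-formed CSV).

```python
# Subset of the data types table: name -> primitive Python type (illustrative only).
PRIMITIVE_TYPES = {
    "TEXT": str,
    "CSV": str,
    "INT": int,
    "FLOAT": float,
    "EMBEDDINGS": list,
    "JSON": dict,
    "LIST": list,
    "SQL": str,
}


def is_valid(data_type: str, value: object) -> bool:
    """Return True if value has the primitive type listed for data_type."""
    if data_type == "ANY":
        return True
    expected = PRIMITIVE_TYPES[data_type]
    # bool is a subclass of int in Python, so reject it explicitly for INT.
    if expected is int and isinstance(value, bool):
        return False
    return isinstance(value, expected)


print(is_valid("INT", 3))            # True
print(is_valid("FLOAT", 3))          # False: an int is not a float
print(is_valid("JSON", {"a": 1}))    # True
print(is_valid("ANY", object()))     # True
```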

Creating components

There are four main ways of creating new components and extending the Lunar Component Library:

  1. Programmatically - this can serve well when Lunar is deployed locally. Creating a new component programmatically assumes familiarity with Python and Object Oriented Programming (OOP) concepts. Details can be consulted here. Components defined programmatically are included in the component library and will be available to all users of the local system.
  2. Web-programmatically - this is a version of the above that does not require local deployment. Components can be defined programmatically in the web interface provided by Lunar. At a minimum, the interface requires the definition of the input/output types and of the main function (i.e., run()) that defines the component's functionality. Components created web-programmatically will be available only to the defining user.
  3. Via inheritance from other components - this only allows for specifying the configuration of an existing component and saving it for future re-use; the component functionality remains unchanged. [...] and the latter specified by the user before running the component. The new component will only be available to the creating user.
  4. Via *Coder components - these special component types allow the user to write the code defining the component's functionality. This is similar to 2. above, except for the input/output definitions - *Coder components will allow arbitrary input types that are compatible with the underlying programming language. *Coder components can be used in the current workflow or saved for future re-use by the same user, similar to variants 2. and 3. above. Currently *Coder components can be defined using Python and R.
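
Option 3 above can be pictured as ordinary Python subclassing in which only the configuration is pinned and run() is left untouched. The classes below are hypothetical stand-ins: `Range` loosely mirrors the Range component described in the component list, and `EvenNumbers` is an invented derived component. How Lunar actually stores such saved configurations is not covered in this document.

```python
class Range:
    """Hypothetical stand-in for an existing component with configurable start and step."""

    def __init__(self, start: int = 0, step: int = 1):
        # Configuration values, set once per component.
        self.start = start
        self.step = step

    def run(self, stop: int) -> list:
        # The input (stop) is supplied each time the component runs.
        return list(range(self.start, stop, self.step))


class EvenNumbers(Range):
    """Derived component: fixes the configuration and inherits run() unchanged."""

    def __init__(self):
        super().__init__(start=0, step=2)


evens = EvenNumbers().run(stop=10)
print(evens)  # [0, 2, 4, 6, 8]
```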