mirror of https://github.com/searx/searx synced 2024-12-30 18:38:02 +01:00

Table of Contents

Searx online IDE
Searx engine database

Database model
Database storage

Scenario

Scenario - A user adds a new engine
Scenario - A user edits an existing engine
Scenario - A user adds an XPath engine
Scenario - A user moderates a change

The engines will keep failing. More checking at runtime can help to detect the errors, but it won't solve the issues. We lack time and hands to review all the changes.

Searx online IDE

Currently, to update an engine you have to: git clone searx, install the dependencies, edit some files, make run, git commit, create a pull request. The same goes for the review: get the code, check the different options, approve / comment.

What if we could make it simpler for new contributors? Like editing a Jupyter notebook?

More precisely, the idea is to have the "edit this file" button of GitHub on steroids:

it starts an online IDE with autocompletion.
lxml and the other dependencies are available.
there is a "run" button which allows you to try the code (with all the parameters).
a save button creates a PR in the git repository.
like GitHub comments, the code is automatically saved in the browser.

The json and xpath engines won't even need an online IDE: just some text inputs and a "try" button. Later, a tool similar to wuzz could help, for example.
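As a rough illustration of what the "try" button could do for an xpath engine, here is a minimal sketch. It is not searx code: `try_xpath_engine`, `results_xpath` and `SAMPLE_PAGE` are hypothetical names, the page is a hard-coded well-formed snippet instead of a real HTTP response, and the standard library `xml.etree.ElementTree` (which only supports a subset of XPath) stands in for lxml to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a page fetched from search_url.
SAMPLE_PAGE = """<html><body>
<div class="result"><a href="https://example.org/a">First</a><p>Snippet A</p></div>
<div class="result"><a href="https://example.org/b">Second</a><p>Snippet B</p></div>
</body></html>"""


def _text(node):
    """Return the stripped text of a node, or None if the node is missing."""
    return "".join(node.itertext()).strip() if node is not None else None


def try_xpath_engine(page, results_xpath, url_xpath, title_xpath, content_xpath):
    """Apply the text inputs of the form to a page and return the results."""
    root = ET.fromstring(page)
    results = []
    for node in root.findall(results_xpath):
        link = node.find(url_xpath)
        results.append({
            # Assumption: url_xpath selects an element carrying an href.
            "url": link.get("href") if link is not None else None,
            "title": _text(node.find(title_xpath)),
            "content": _text(node.find(content_xpath)),
        })
    return results


preview = try_xpath_engine(
    SAMPLE_PAGE,
    results_xpath=".//div[@class='result']",
    url_xpath="a",
    title_xpath="a",
    content_xpath="p",
)
for row in preview:
    print(row)
```

A real implementation would fetch the page inside the sandbox and would need an HTML-tolerant parser, since most result pages are not well-formed XML.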

Technically:

Python can run inside a browser using WASM, but it may not work as expected. So Python running inside a container would do the job:
- here we have to be very careful about security: https://jupyter-notebook.readthedocs.io/en/stable/security.html
- the IP addresses will be blocked by some engines.
- there is a running cost.
The idea here is to provide an online sandbox to try searx engines. To avoid abuse, GitHub OAuth authentication would be required (so there is no need for an additional account).
The IDE: a fork of Jupyter and/or Monaco with a language server?
The result can be saved as a PR in the git repository.
Some engines reuse code from other engines. Example: the google engine, duckduckgo_images.

Searx engine database

Looking at the issue "What is the data lifecycle ?" (#2052), there is another way to deal with that.

Why not make something similar to wikidata but only for searx: one entry per engine (and entries for currencies, DOI). The code could even be one field on an entry.

From the engine developer's point of view:

a "page" per engine (similar to https://www.wikidata.org/wiki/Q12805 for example, but only with the data relevant to Searx).
there are different fields, like engine name, bang shortcut name, categories, etc.
the code is one of the fields. An online IDE allows you to edit the code and try it online.
save

Someone reviews the change request, and approves or denies it.

Authentication can use GitHub OAuth.

The purpose here is to simplify the maintenance/edit process as much as possible.

Disclaimers. I'm aware that:

Even if a framework like django can help a lot, it is still a lot of work.
The 10 "write/update engine" tasks may turn into more than 10 "review" tasks.

Database model

Class Engine:

official URL
bang URL
autocomplete URL
Category / Subcategory
Short name
Engine name
favicon.ico
engine definition

Class Engine definition

version
default timeout
languages (equivalent of searx/data/engines_languages.json)

Class Engine XPath(Engine definition)

search_url
url_xpath
title_xpath
content_xpath

Class Engine JSON(Engine definition)

search_url
url_query
content_query
title_query
paging
suggestion_query
results_query

Class Engine Python(Engine definition)

code

Class Engine DSL(Engine definition)

code
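The classes above could be sketched with Python dataclasses as follows. This is only one possible reading of the outline: the field types, the default values and the identifier names (for example `official_url` for "official URL") are assumptions, and the JSON, Python and DSL variants are shortened to keep the sketch small.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class EngineDefinition:
    """Common part of every engine definition."""
    version: int = 1
    default_timeout: float = 3.0  # assumed to be in seconds
    languages: List[str] = field(default_factory=list)


@dataclass
class EngineXPath(EngineDefinition):
    search_url: str = ""
    url_xpath: str = ""
    title_xpath: str = ""
    content_xpath: str = ""


@dataclass
class EngineJSON(EngineDefinition):
    search_url: str = ""
    results_query: str = ""
    url_query: str = ""
    title_query: str = ""
    content_query: str = ""
    suggestion_query: str = ""
    paging: bool = False


@dataclass
class EnginePython(EngineDefinition):
    code: str = ""


@dataclass
class Engine:
    """One entry of the engine database."""
    engine_name: str
    short_name: str
    official_url: str
    bang_url: Optional[str] = None
    autocomplete_url: Optional[str] = None
    category: Optional[str] = None
    subcategory: Optional[str] = None
    favicon: Optional[str] = None
    definition: Optional[EngineDefinition] = None


example = Engine(
    engine_name="Example",
    short_name="ex",
    official_url="https://example.org",
    category="general",
    definition=EngineXPath(
        search_url="https://example.org/search?q={query}",
        url_xpath="//a[@class='result']/@href",
        title_xpath="//a[@class='result']",
        content_xpath="//p[@class='snippet']",
    ),
)
print(asdict(example)["definition"]["search_url"])
```

Using inheritance for the definition variants mirrors the "Class Engine XPath(Engine definition)" notation; a single class with a `type` field would be an equally valid choice.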

Database storage

The storage can be a public git repository, so the backup is free.
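A minimal sketch of what git-backed storage could look like, assuming one JSON file per engine under an `engines/` directory (the layout, the file format and the helper names `save_entry` / `load_entry` are all hypothetical). Keys are sorted so that two saves of the same entry produce the same bytes, which keeps git diffs small and reviewable.

```python
import json
import tempfile
from pathlib import Path


def save_entry(repo_dir, entry):
    """Write one engine entry to <repo_dir>/engines/<short_name>.json."""
    path = Path(repo_dir) / "engines" / f"{entry['short_name']}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Stable formatting: sorted keys, fixed indentation, trailing newline.
    text = json.dumps(entry, indent=2, sort_keys=True, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")
    return path


def load_entry(repo_dir, short_name):
    """Read an engine entry back from the repository checkout."""
    path = Path(repo_dir) / "engines" / f"{short_name}.json"
    return json.loads(path.read_text(encoding="utf-8"))


# A temporary directory stands in for the checkout of the public repository.
with tempfile.TemporaryDirectory() as repo_dir:
    entry = {
        "short_name": "ex",
        "engine_name": "Example",
        "official_url": "https://example.org",
    }
    saved = save_entry(repo_dir, entry)
    restored = load_entry(repo_dir, "ex")
    print(saved.name, restored == entry)
```

Committing and pushing the file (the step that actually provides the backup) is left out; it could be done with the git command line or a git library.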

Scenario

Scenario - A user adds a new engine

A user adds a search engine:

URL (mandatory)
Category (and Subcategory, as in DuckDuckGo) (optional)
Searx name & shortcut (optional)

A backend looks for:

opensearch.xml or sitelinks search box --> free external bang update/addition
favicon.ico
web site name

Add to engine definition:

search URL
autocomplete URL
favicon
name
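The lookup in opensearch.xml could be sketched as below. The example only parses an OpenSearch 1.1 description document that is already in memory; fetching it from the site, and the sitelinks search box fallback, are not shown. `parse_opensearch` and `SAMPLE_DESCRIPTION` are hypothetical names, and the sketch assumes that the autocomplete URL is the `Url` element with the type `application/x-suggestions+json`.

```python
import xml.etree.ElementTree as ET

OPENSEARCH_NS = {"os": "http://a9.com/-/spec/opensearch/1.1/"}

# Hypothetical description document, as a site could serve it.
SAMPLE_DESCRIPTION = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example</ShortName>
  <Url type="text/html" template="https://example.org/search?q={searchTerms}"/>
  <Url type="application/x-suggestions+json" template="https://example.org/ac?q={searchTerms}"/>
  <Image width="16" height="16">https://example.org/favicon.ico</Image>
</OpenSearchDescription>"""


def parse_opensearch(xml_text):
    """Extract the fields that would be added to the engine definition."""
    root = ET.fromstring(xml_text)
    found = {"name": None, "search_url": None,
             "autocomplete_url": None, "favicon": None}

    name = root.find("os:ShortName", OPENSEARCH_NS)
    if name is not None and name.text:
        found["name"] = name.text.strip()

    for url in root.findall("os:Url", OPENSEARCH_NS):
        url_type = url.get("type")
        template = url.get("template")
        # Keep the first template of each kind.
        if url_type == "text/html" and found["search_url"] is None:
            found["search_url"] = template
        elif (url_type == "application/x-suggestions+json"
              and found["autocomplete_url"] is None):
            found["autocomplete_url"] = template

    image = root.find("os:Image", OPENSEARCH_NS)
    if image is not None and image.text:
        found["favicon"] = image.text.strip()
    return found


info = parse_opensearch(SAMPLE_DESCRIPTION)
for key, value in info.items():
    print(key, value)
```

The `{searchTerms}` placeholder would still have to be mapped to whatever placeholder the engine definition uses for the query.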
