Add documentation about offline engines

This commit is contained in:
Noémi Ványi 2022-09-29 22:24:00 +02:00
parent 31eef5b9db
commit 539e1a873e
7 changed files with 493 additions and 0 deletions

View File

@ -0,0 +1,129 @@
=====================================
Run shell commands from your instance
=====================================
Command line engines are custom engines that run commands in the shell of the
host. In this article you can learn how to create a command engine and how to
customize the result display.
The command
===========
When specifyng commands, you must make sure the commands are available on the
searx host. Searx will not install anything for you. Also, make sure that the
``searx`` user on your host is allowed to run the selected command and has
access to the required files.
Access control
==============
Be careful when creating command engines if you are running a public
instance. Do not expose any sensitive information. You can restrict access by
configuring a list of access tokens under tokens in your ``settings.yml``.
Available settings
==================
* ``command``: A comma separated list of the elements of the command. A special
token ``{{QUERY}}`` tells searx where to put the search terms of the
user. Example: ``['ls', '-l', '-h', '{{QUERY}}']``
* ``query_type``: The expected type of user search terms. Possible values:
``path`` and ``enum``. ``path`` checks if the uesr provided path is inside the
working directory. If not the query is not executed. ``enum`` is a list of
allowed search terms. If the user submits something which is not included in
the list, the query returns an error.
* ``delimiter``: A dict containing a delimiter char and the "titles" of each
element in keys.
* ``parse_regex``: A dict containing the regular expressions for each result
key.
* ``query_enum``: A list containing allowed search terms if ``query_type`` is
set to ``enum``.
* ``working_dir``: The directory where the command has to be executed. Default:
``.``
* ``result_separator``: The character that separates results. Default: ``\n``
Customize the result template
=============================
There is a default result template for displaying key-value pairs coming from
command engines. If you want something more tailored to your result types, you
can design your own template.
Searx relies on `Jinja2 <https://jinja.palletsprojects.com/>`_ for
templating. If you are familiar with Jinja, you will not have any issues
creating templates. You can access the result attributes with ``{{
result.attribute_name }}``.
In the example below the result has two attributes: ``header`` and ``content``.
To customize their diplay, you need the following template (you must define
these classes yourself):
.. code:: html
<div class="result">
<div class="result-header">
{{ result.header }}
</div>
<div class="result-content">
{{ result.content }}
</div>
</div>
Then put your template under ``searx/templates/{theme-name}/result_templates``
named ``your-template-name.html``. You can select your custom template with the
option ``result_template``.
.. code:: yaml
- name: your engine name
engine: command
result_template: your-template-name.html
Examples
========
Find files by name
------------------
The first example is to find files on your searx host. It uses the command
`find` available on most Linux distributions. It expects a path type query. The
path in the search request must be inside the ``working_dir``.
The results are displayed with the default `key-value.html` template. A result
is displayed in a single row table with the key "line".
.. code:: yaml
- name : find
engine : command
command : ['find', '.', '-name', '{{QUERY}}']
query_type : path
shortcut : fnd
tokens : []
disabled : True
delimiter :
chars : ' '
keys : ['line']
Find files by contents
-----------------------
In the second example, we define an engine that searches in the contents of the
files under the ``working_dir``. The search type is not defined, so the user can
input any string they want. To restrict the input, you can set the ``query_type``
to ``enum`` and only allow a set of search terms to protect
yourself. Alternatively, make the engine private, so no one malevolent accesses
the engine.
.. code:: yaml
- name : regex search in files
engine : command
command : ['grep', '{{QUERY}}']
shortcut : gr
tokens : []
disabled : True
delimiter :
chars : ' '
keys : ['line']

View File

@ -86,3 +86,60 @@ Show errors **DE**
{% endfor %}
.. flat-table:: Additional engines (commented out in settings.yml)
:header-rows: 1
:stub-columns: 2
* - Name
- Base URL
- Host
- Port
- Paging
* - elasticsearch
- localhost:9200
-
-
- False
* - meilicsearch
- localhost:7700
-
-
- True
* - mongodb
-
- 127.0.0.1
- 21017
- True
* - mysql_server
-
- 127.0.0.1
- 3306
- True
* - postgresql
-
- 127.0.0.1
- 5432
- True
* - redis_server
-
- 127.0.0.1
- 6379
- False
* - solr
- localhost:8983
-
-
- True
* - sqlite
-
-
-
- True

View File

@ -19,5 +19,9 @@ Administrator documentation
filtron
morty
engines
private-engines
command-engine
indexer-engines
no-sql-engines
plugins
buildhosts

View File

@ -0,0 +1,89 @@
==================
Search in indexers
==================
Searx supports three popular indexer search engines:
* Elasticsearch
* Meilisearch
* Solr
Elasticsearch
=============
Make sure that the Elasticsearch user has access to the index you are querying.
If you are not using TLS during your connection, set ``enable_http`` to ``True``.
.. code:: yaml
- name : elasticsearch
shortcut : es
engine : elasticsearch
base_url : http://localhost:9200
username : elastic
password : changeme
index : my-index
query_type : match
enable_http : True
Available settings
------------------
* ``base_url``: URL of Elasticsearch instance. By default it is set to ``http://localhost:9200``.
* ``index``: Name of the index to query. Required.
* ``query_type``: Elasticsearch query method to use. Available: ``match``,
``simple_query_string``, ``term``, ``terms``, ``custom``.
* ``custom_query_json``: If you selected ``custom`` for ``query_type``, you must
provide the JSON payload in this option.
* ``username``: Username in Elasticsearch
* ``password``: Password for the Elasticsearch user
Meilisearch
===========
If you are not using TLS during connection, set ``enable_http`` to ``True``.
.. code:: yaml
- name : meilisearch
engine : meilisearch
shortcut: mes
base_url : http://localhost:7700
index : my-index
enable_http: True
Available settings
------------------
* ``base_url``: URL of the Meilisearch instance. By default it is set to http://localhost:7700
* ``index``: Name of the index to query. Required.
* ``auth_key``: Key required for authentication.
* ``facet_filters``: List of facets to search in.
Solr
====
If you are not using TLS during connection, set ``enable_http`` to ``True``.
.. code:: yaml
- name : solr
engine : solr
shortcut : slr
base_url : http://localhost:8983
collection : my-collection
sort : asc
enable_http : True
Available settings
------------------
* ``base_url``: URL of the Meilisearch instance. By default it is set to http://localhost:8983
* ``collection``: Name of the collection to query. Required.
* ``sort``: Sorting of the results. Available: ``asc``, ``desc``.
* ``rows``: Maximum number of results from a query. Default value: 10.
* ``field_list``: List of fields returned from the query.
* ``default_fields``: Default fields to query.
* ``query_fields``: List of fields with a boost factor. The bigger the boost
factor of a field, the more important the field is in the query. Example:
``qf="field1^2.3 field2"``

View File

@ -0,0 +1,170 @@
===========================
Query SQL and NoSQL servers
===========================
SQL
===
SQL servers are traditional databases with predefined data schema. Furthermore,
modern versions also support BLOB data.
You can search in the following servers:
* `PostgreSQL`_
* `MySQL`_
* `SQLite`_
The configuration of the new database engines are similar. You must put a valid
SELECT SQL query in ``query_str``. At the moment you can only bind at most
one parameter in your query.
Do not include LIMIT or OFFSET in your SQL query as the engines
rely on these keywords during paging.
PostgreSQL
----------
Required PyPi package: ``psychopg2``
You can find an example configuration below:
.. code:: yaml
- name : postgresql
engine : postgresql
database : my_database
username : searx
password : password
query_str : 'SELECT * from my_table WHERE my_column = %(query)s'
shortcut : psql
Available options
~~~~~~~~~~~~~~~~~
* ``host``: IP address of the host running PostgreSQL. By default it is ``127.0.0.1``.
* ``port``: Port number PostgreSQL is listening on. By default it is ``5432``.
* ``database``: Name of the database you are connecting to.
* ``username``: Name of the user connecting to the database.
* ``password``: Password of the database user.
* ``query_str``: Query string to run. Keywords like ``LIMIT`` and ``OFFSET`` are not allowed. Required.
* ``limit``: Number of returned results per page. By default it is 10.
MySQL
-----
Required PyPi package: ``mysql-connector-python``
This is an example configuration for quering a MySQL server:
.. code:: yaml
- name : mysql
engine : mysql_server
database : my_database
username : searx
password : password
limit : 5
query_str : 'SELECT * from my_table WHERE my_column=%(query)s'
shortcut : mysql
Available options
~~~~~~~~~~~~~~~~~
* ``host``: IP address of the host running MySQL. By default it is ``127.0.0.1``.
* ``port``: Port number MySQL is listening on. By default it is ``3306``.
* ``database``: Name of the database you are connecting to.
* ``auth_plugin``: Authentication plugin to use. By default it is ``caching_sha2_password``.
* ``username``: Name of the user connecting to the database.
* ``password``: Password of the database user.
* ``query_str``: Query string to run. Keywords like ``LIMIT`` and ``OFFSET`` are not allowed. Required.
* ``limit``: Number of returned results per page. By default it is 10.
SQLite
------
You can read from your database ``my_database`` using this example configuration:
.. code:: yaml
- name : sqlite
engine : sqlite
shortcut: sq
database : my_database
query_str : 'SELECT * FROM my_table WHERE my_column=:query'
Available options
~~~~~~~~~~~~~~~~~
* ``database``: Name of the database you are connecting to.
* ``query_str``: Query string to run. Keywords like ``LIMIT`` and ``OFFSET`` are not allowed. Required.
* ``limit``: Number of returned results per page. By default it is 10.
NoSQL
=====
NoSQL data stores are used for storing arbitrary data without first defining their
structure. To query the supported servers, you must install their drivers using PyPi.
You can search in the following servers:
* `Redis`_
* `MongoDB`_
Redis
-----
Reqired PyPi package: ``redis``
Example configuration:
.. code:: yaml
- name : mystore
engine : redis_server
exact_match_only : True
host : 127.0.0.1
port : 6379
password : secret-password
db : 0
shortcut : rds
enable_http : True
Available options
~~~~~~~~~~~~~~~~~
* ``host``: IP address of the host running Redis. By default it is ``127.0.0.1``.
* ``port``: Port number Redis is listening on. By default it is ``6379``.
* ``password``: Password if required by Redis.
* ``db``: Number of the database you are connecting to.
* ``exact_match_only``: Enable if you need exact matching. By default it is ``True``.
MongoDB
-------
Required PyPi package: ``pymongo``
Below is an example configuration for using a MongoDB collection:
.. code:: yaml
- name : mymongo
engine : mongodb
shortcut : icm
host : '127.0.0.1'
port : 27017
database : personal
collection : income
key : month
enable_http: True
Available options
~~~~~~~~~~~~~~~~~
* ``host``: IP address of the host running MongoDB. By default it is ``127.0.0.1``.
* ``port``: Port number MongoDB is listening on. By default it is ``27017``.
* ``password``: Password if required by Redis.
* ``database``: Name of the database you are connecting to.
* ``collection``: Name of the collection you want to search in.
* ``exact_match_only``: Enable if you need exact matching. By default it is ``True``.

Binary file not shown.

After

Width:  |  Height:  |  Size: 74 KiB

View File

@ -0,0 +1,44 @@
=============================
How to create private engines
=============================
If you are running your public searx instance, you might want to restrict access
to some engines. Maybe you are afraid of bots might abusing the engine. Or the
engine might return private results you do not want to share with strangers.
Server side configuration
=========================
You can make any engine private by setting a list of tokens in your settings.yml
file. In the following example, we set two different tokens that provide access
to the engine.
.. code:: yaml
- name: my-private-google
engine: google
shortcut: pgo
tokens: ['my-secret-token-1', 'my-secret-token-2']
To access the private engine, you must distribute the tokens to your searx
users. It is up to you how you let them know what the access token is you
created.
Client side configuration
=========================
As a searx instance user, you can add any number of access tokens on the
Preferences page. You have to set a comma separated lists of strings in "Engine
tokens" input, then save your new preferences.
.. image:: prefernces-private.png
:width: 600px
:align: center
:alt: location of token textarea
Once the Preferences page is loaded again, you can see the information of the
private engines you got access to. If you cannot see the expected engines in the
engines list, double check your token. If there is no issue with the token,
contact your instance administrator.