Add documentation about offline engines

2025-03-27 08:50:20 +01:00 · 2022-09-29 22:24:00 +02:00 · 2022-09-29 22:24:00 +02:00 · 539e1a873e
commit 539e1a873e
parent 31eef5b9db
7 changed files with 493 additions and 0 deletions
--- a/docs/admin/command-engine.rst
+++ b/docs/admin/command-engine.rst
@@ -0,0 +1,129 @@
+=====================================
+Run shell commands from your instance
+=====================================
+
+Command line engines are custom engines that run commands in the shell of the
+host. In this article you can learn how to create a command engine and how to
+customize the result display.
+
+The command
+===========
+
+When specifying commands, you must make sure the commands are available on the
+searx host. Searx will not install anything for you. Also, make sure that the
+``searx`` user on your host is allowed to run the selected command and has
+access to the required files.
+
+Access control
+==============
+
+Be careful when creating command engines if you are running a public
+instance. Do not expose any sensitive information. You can restrict access by
+configuring a list of access tokens under ``tokens`` in your ``settings.yml``.
+
+Available settings
+==================
+
+* ``command``: A comma separated list of the elements of the command. A special
+  token ``{{QUERY}}`` tells searx where to put the search terms of the
+  user. Example: ``['ls', '-l', '-h', '{{QUERY}}']``
+* ``query_type``: The expected type of user search terms. Possible values:
+  ``path`` and ``enum``. ``path`` checks if the user-provided path is inside the
+  working directory. If not, the query is not executed. ``enum`` only accepts
+  the search terms listed in ``query_enum``. If the user submits something that
+  is not included in the list, the query returns an error.
+* ``delimiter``: A dict containing the delimiter character(s) in ``chars`` and
+  the "titles" of each output element in ``keys``.
+* ``parse_regex``: A dict containing the regular expressions for each result
+  key.
+* ``query_enum``: A list containing allowed search terms if ``query_type`` is
+  set to ``enum``.
+* ``working_dir``: The directory where the command has to be executed. Default:
+  ``.``
+* ``result_separator``: The character that separates results. Default: ``\n``
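+
+As an illustration, the following sketch combines ``delimiter`` and
+``working_dir``. It assumes the ``du`` command is installed on the host and
+that each output line is split at the delimiter into the listed keys. The
+engine name, shortcut and directory are hypothetical placeholders:
+
+.. code:: yaml
+
+  - name : file sizes
+    engine : command
+    # du prints "<size><TAB><path>" for every file and directory
+    command : ['du', '-a', '-h', '{{QUERY}}']
+    query_type : path
+    # hypothetical directory; queried paths must stay inside it
+    working_dir : /srv/shared
+    shortcut : du
+    tokens : []
+    disabled : True
+    delimiter :
+        # double quotes make YAML turn \t into a tab character
+        chars : "\t"
+        keys : ['size', 'file']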
+
+Customize the result template
+=============================
+
+There is a default result template for displaying key-value pairs coming from
+command engines. If you want something more tailored to your result types, you
+can design your own template.
+
+Searx relies on `Jinja2 <https://jinja.palletsprojects.com/>`_ for
+templating. If you are familiar with Jinja, you will not have any issues
+creating templates. You can access the result attributes with ``{{
+result.attribute_name }}``.
+
+In the example below the result has two attributes: ``header`` and ``content``.
+To customize their display, you need the following template (you must define
+these classes yourself):
+
+.. code:: html
+
+    <div class="result">
+        <div class="result-header">
+            {{ result.header }}
+        </div>
+        <div class="result-content">
+            {{ result.content }}
+        </div>
+    </div>
+
+Then put your template under ``searx/templates/{theme-name}/result_templates``
+named ``your-template-name.html``. You can select your custom template with the
+option ``result_template``.
+
+.. code:: yaml
+
+  - name: your engine name
+    engine: command
+    result_template: your-template-name.html
+
+Examples
+========
+
+Find files by name
+------------------
+
+The first example finds files on your searx host. It uses the ``find``
+command, which is available on most Linux distributions. It expects a
+path-type query. The path in the search request must be inside the
+``working_dir``.
+
+The results are displayed with the default ``key-value.html`` template. A
+result is displayed in a single-row table with the key "line".
+
+.. code:: yaml
+
+  - name : find
+    engine : command
+    command : ['find', '.', '-name', '{{QUERY}}']
+    query_type : path
+    shortcut : fnd
+    tokens : []
+    disabled : True
+    delimiter :
+        chars : ' '
+        keys : ['line']
+
+
+Find files by contents
+-----------------------
+
+In the second example, we define an engine that searches in the contents of the
+files under the ``working_dir``. The query type is not defined, so the user can
+input any string they want. To restrict the input, you can set ``query_type``
+to ``enum`` and only allow a fixed set of search terms. Alternatively, make the
+engine private so that only users with a valid access token can use it.
+
+.. code:: yaml
+
+  - name : regex search in files
+    engine : command
+    command : ['grep', '-r', '{{QUERY}}']
+    shortcut : gr
+    tokens : []
+    disabled : True
+    delimiter :
+        chars : ' '
+        keys : ['line']
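+
+Following the ``enum`` approach, a sketch of a restricted variant could look
+like this. The allowed terms, directory, token and engine name are
+hypothetical. GNU ``grep -r`` without a file operand searches the directory it
+runs in, and each matching line is assumed to be split at the first ``:`` into
+the file name and the matched line:
+
+.. code:: yaml
+
+  - name : search markers in files
+    engine : command
+    command : ['grep', '-r', '{{QUERY}}']
+    # only these search terms are accepted
+    query_type : enum
+    query_enum : ['TODO', 'FIXME']
+    # hypothetical directory to search in
+    working_dir : /srv/project
+    shortcut : mrk
+    tokens : ['my-secret-token']
+    disabled : True
+    delimiter :
+        chars : ':'
+        keys : ['file', 'line']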
--- a/docs/admin/engines.rst
+++ b/docs/admin/engines.rst
@@ -86,3 +86,60 @@ Show errors   **DE**

     {% endfor %}

+   .. flat-table:: Additional engines (commented out in settings.yml)
+      :header-rows: 1
+      :stub-columns: 2
+
+      * - Name
+        - Base URL
+        - Host
+        - Port
+        - Paging
+
+      * - elasticsearch
+        - localhost:9200
+        - 
+        - 
+        - False
+
+      * - meilisearch
+        - localhost:7700
+        - 
+        - 
+        - True
+
+      * - mongodb
+        - 
+        - 127.0.0.1
+        - 27017
+        - True
+
+      * - mysql_server
+        - 
+        - 127.0.0.1
+        - 3306
+        - True
+
+      * - postgresql
+        - 
+        - 127.0.0.1
+        - 5432
+        - True
+
+      * - redis_server
+        - 
+        - 127.0.0.1
+        - 6379
+        - False
+
+      * - solr
+        - localhost:8983
+        - 
+        - 
+        - True
+
+      * - sqlite
+        - 
+        - 
+        - 
+        - True
--- a/docs/admin/index.rst
+++ b/docs/admin/index.rst
@@ -19,5 +19,9 @@ Administrator documentation
   filtron
   morty
   engines
+   private-engines
+   command-engine
+   indexer-engines
+   no-sql-engines
   plugins
   buildhosts
--- a/docs/admin/indexer-engines.rst
+++ b/docs/admin/indexer-engines.rst
@@ -0,0 +1,89 @@
+==================
+Search in indexers
+==================
+
+Searx supports three popular indexer search engines:
+
+* Elasticsearch
+* Meilisearch
+* Solr
+
+Elasticsearch
+=============
+
+Make sure that the Elasticsearch user has access to the index you are querying.
+If you are not using TLS during your connection, set ``enable_http`` to ``True``.
+
+.. code:: yaml
+
+  - name : elasticsearch
+    shortcut : es
+    engine : elasticsearch
+    base_url : http://localhost:9200
+    username : elastic
+    password : changeme
+    index : my-index
+    query_type : match
+    enable_http : True
+
+Available settings
+------------------
+
+* ``base_url``: URL of the Elasticsearch instance. By default it is set to ``http://localhost:9200``.
+* ``index``: Name of the index to query. Required.
+* ``query_type``: Elasticsearch query method to use. Available: ``match``,
+  ``simple_query_string``, ``term``, ``terms``, ``custom``.
+* ``custom_query_json``: If you selected ``custom`` for ``query_type``, you must
+  provide the JSON payload in this option.
+* ``username``: Name of the Elasticsearch user.
+* ``password``: Password of the Elasticsearch user.
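+
+For example, a sketch for an instance reachable over TLS that uses the
+``simple_query_string`` query method might look like this. The host name and
+credentials are placeholders. Because the URL uses ``https``, ``enable_http``
+is left out:
+
+.. code:: yaml
+
+  - name : elasticsearch
+    shortcut : es
+    engine : elasticsearch
+    # hypothetical host name
+    base_url : https://es.example.org:9200
+    username : elastic
+    password : changeme
+    index : my-index
+    query_type : simple_query_string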
+
+Meilisearch
+===========
+
+If you are not using TLS for the connection, set ``enable_http`` to ``True``.
+
+.. code:: yaml
+
+  - name : meilisearch
+    engine : meilisearch
+    shortcut: mes
+    base_url : http://localhost:7700
+    index : my-index
+    enable_http: True
+
+Available settings
+------------------
+
+* ``base_url``: URL of the Meilisearch instance. By default it is set to ``http://localhost:7700``.
+* ``index``: Name of the index to query. Required.
+* ``auth_key``: Key required for authentication.
+* ``facet_filters``: List of facets to search in.
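+
+A sketch for an instance that requires authentication and is reached over TLS
+could look like this. The host name and key are hypothetical placeholders:
+
+.. code:: yaml
+
+  - name : meilisearch
+    engine : meilisearch
+    shortcut : mes
+    # hypothetical host name
+    base_url : https://meilisearch.example.org
+    index : my-index
+    # hypothetical authentication key
+    auth_key : my-secret-key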
+
+Solr
+====
+
+If you are not using TLS for the connection, set ``enable_http`` to ``True``.
+
+.. code:: yaml
+
+  - name : solr
+    engine : solr
+    shortcut : slr
+    base_url : http://localhost:8983
+    collection : my-collection
+    sort : asc
+    enable_http : True
+
+Available settings
+------------------
+
+* ``base_url``: URL of the Solr instance. By default it is set to ``http://localhost:8983``.
+* ``collection``: Name of the collection to query. Required.
+* ``sort``: Sorting of the results. Available: ``asc``, ``desc``.
+* ``rows``: Maximum number of results from a query. Default value: 10.
+* ``field_list``: List of fields returned from the query.
+* ``default_fields``: Default fields to query.
+* ``query_fields``: List of fields with a boost factor. The bigger the boost
+  factor of a field, the more important the field is in the query. Example:
+  ``qf="field1^2.3 field2"``
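+
+As a sketch, the options above could be combined as follows. It assumes the
+field options are passed on to Solr as plain strings. The field names are
+hypothetical and depend on the schema of your collection:
+
+.. code:: yaml
+
+  - name : solr
+    engine : solr
+    shortcut : slr
+    base_url : http://localhost:8983
+    collection : my-collection
+    # return at most 20 results per query
+    rows : 20
+    # hypothetical field names
+    field_list : 'title content'
+    default_fields : 'content'
+    # matches in "title" are boosted by a factor of 2.3
+    query_fields : 'title^2.3 content'
+    enable_http : True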
--- a/docs/admin/no-sql-engines.rst
+++ b/docs/admin/no-sql-engines.rst
@@ -0,0 +1,170 @@
+===========================
+Query SQL and NoSQL servers
+===========================
+
+SQL
+===
+
+SQL servers are traditional databases with a predefined data schema. Modern
+versions also support BLOB data.
+
+You can search in the following servers:
+
+* `PostgreSQL`_
+* `MySQL`_
+* `SQLite`_
+
+The configuration of the new database engines is similar. You must put a valid
+SELECT SQL query in ``query_str``. At the moment you can bind at most one
+parameter in your query.
+
+Do not include ``LIMIT`` or ``OFFSET`` in your SQL query, as the engines
+rely on these keywords for paging.
+
+PostgreSQL
+----------
+
+Required PyPI package: ``psycopg2``
+
+You can find an example configuration below:
+
+.. code:: yaml
+
+  - name : postgresql
+    engine : postgresql
+    database : my_database
+    username : searx
+    password : password
+    query_str : 'SELECT * from my_table WHERE my_column = %(query)s'
+    shortcut : psql
+
+
+Available options
+~~~~~~~~~~~~~~~~~
+* ``host``: IP address of the host running PostgreSQL. By default it is ``127.0.0.1``.
+* ``port``: Port number PostgreSQL is listening on. By default it is ``5432``.
+* ``database``: Name of the database you are connecting to.
+* ``username``: Name of the user connecting to the database.
+* ``password``: Password of the database user.
+* ``query_str``: Query string to run. Keywords like ``LIMIT`` and ``OFFSET`` are not allowed. Required.
+* ``limit``: Number of returned results per page. By default it is 10.
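+
+For example, a sketch for a PostgreSQL server running on another host that
+returns 20 results per page could look like this. The IP address and
+credentials are hypothetical:
+
+.. code:: yaml
+
+  - name : postgresql
+    engine : postgresql
+    # hypothetical address of the database host
+    host : 192.0.2.10
+    port : 5432
+    database : my_database
+    username : searx
+    password : password
+    query_str : 'SELECT * from my_table WHERE my_column = %(query)s'
+    limit : 20
+    shortcut : psql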
+
+MySQL
+-----
+
+Required PyPI package: ``mysql-connector-python``
+
+This is an example configuration for querying a MySQL server:
+
+.. code:: yaml
+
+  - name : mysql
+    engine : mysql_server
+    database : my_database
+    username : searx
+    password : password
+    limit : 5
+    query_str : 'SELECT * from my_table WHERE my_column=%(query)s'
+    shortcut : mysql
+
+
+Available options
+~~~~~~~~~~~~~~~~~
+* ``host``: IP address of the host running MySQL. By default it is ``127.0.0.1``.
+* ``port``: Port number MySQL is listening on. By default it is ``3306``.
+* ``database``: Name of the database you are connecting to.
+* ``auth_plugin``: Authentication plugin to use. By default it is ``caching_sha2_password``.
+* ``username``: Name of the user connecting to the database.
+* ``password``: Password of the database user.
+* ``query_str``: Query string to run. Keywords like ``LIMIT`` and ``OFFSET`` are not allowed. Required.
+* ``limit``: Number of returned results per page. By default it is 10.
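+
+If your MySQL server authenticates users with the older
+``mysql_native_password`` plugin, a sketch of the configuration could look like
+this. The database, table and credentials are placeholders:
+
+.. code:: yaml
+
+  - name : mysql
+    engine : mysql_server
+    database : my_database
+    username : searx
+    password : password
+    # overrides the default caching_sha2_password plugin
+    auth_plugin : mysql_native_password
+    query_str : 'SELECT * from my_table WHERE my_column=%(query)s'
+    shortcut : mysql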
+
+SQLite
+------
+
+You can read from your database ``my_database`` using this example configuration:
+
+.. code:: yaml
+
+  - name : sqlite
+    engine : sqlite
+    shortcut: sq
+    database : my_database
+    query_str : 'SELECT * FROM my_table WHERE my_column=:query'
+
+
+Available options
+~~~~~~~~~~~~~~~~~
+* ``database``: Name of the database you are connecting to.
+* ``query_str``: Query string to run. Keywords like ``LIMIT`` and ``OFFSET`` are not allowed. Required.
+* ``limit``: Number of returned results per page. By default it is 10.
+
+NoSQL
+=====
+
+NoSQL data stores are used for storing arbitrary data without first defining their
+structure. To query the supported servers, you must install their drivers from PyPI.
+
+You can search in the following servers:
+
+* `Redis`_
+* `MongoDB`_
+
+Redis
+-----
+
+Required PyPI package: ``redis``
+
+Example configuration:
+
+.. code:: yaml
+
+  - name : mystore
+    engine : redis_server
+    exact_match_only : True
+    host : 127.0.0.1
+    port : 6379
+    password : secret-password
+    db : 0
+    shortcut : rds
+    enable_http : True
+
+Available options
+~~~~~~~~~~~~~~~~~
+
+* ``host``: IP address of the host running Redis. By default it is ``127.0.0.1``.
+* ``port``: Port number Redis is listening on. By default it is ``6379``.
+* ``password``: Password if required by Redis.
+* ``db``: Number of the database you are connecting to.
+* ``exact_match_only``: Enable if you need exact matching. By default it is ``True``.
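+
+As a sketch, assuming that disabling ``exact_match_only`` makes the engine
+return partial matches as well, the example above could be changed like this.
+The engine name and password are placeholders:
+
+.. code:: yaml
+
+  - name : mystore
+    engine : redis_server
+    # also return keys that only partially match the query
+    exact_match_only : False
+    host : 127.0.0.1
+    port : 6379
+    password : secret-password
+    db : 0
+    shortcut : rds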
+
+
+MongoDB
+-------
+
+Required PyPI package: ``pymongo``
+
+Below is an example configuration for using a MongoDB collection:
+
+.. code:: yaml
+
+  - name : mymongo
+    engine : mongodb
+    shortcut : icm
+    host : '127.0.0.1'
+    port : 27017
+    database : personal
+    collection : income
+    key : month
+    enable_http: True
+
+
+Available options
+~~~~~~~~~~~~~~~~~
+
+* ``host``: IP address of the host running MongoDB. By default it is ``127.0.0.1``.
+* ``port``: Port number MongoDB is listening on. By default it is ``27017``.
+* ``password``: Password if required by MongoDB.
+* ``database``: Name of the database you are connecting to.
+* ``collection``: Name of the collection you want to search in.
+* ``exact_match_only``: Enable if you need exact matching. By default it is ``True``.
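+
+As a sketch, assuming that disabling ``exact_match_only`` lets the engine also
+return partial matches, a MongoDB server on another host could be queried like
+this. The IP address, database, collection and key are hypothetical:
+
+.. code:: yaml
+
+  - name : mymongo
+    engine : mongodb
+    shortcut : icm
+    # hypothetical address of the MongoDB host
+    host : '192.0.2.20'
+    port : 27017
+    database : personal
+    collection : income
+    key : month
+    exact_match_only : False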
--- a/docs/admin/prefernces-private.png
+++ b/docs/admin/prefernces-private.png
--- a/docs/admin/private-engines.rst
+++ b/docs/admin/private-engines.rst
@@ -0,0 +1,44 @@
+=============================
+How to create private engines
+=============================
+
+If you are running a public searx instance, you might want to restrict access
+to some engines. Maybe you are worried that bots might abuse the engine, or the
+engine might return private results you do not want to share with strangers.
+
+Server side configuration
+=========================
+
+You can make any engine private by setting a list of tokens in your
+``settings.yml`` file. In the following example, we set two different tokens
+that provide access to the engine.
+
+.. code:: yaml
+
+    - name: my-private-google
+      engine: google
+      shortcut: pgo
+      tokens: ['my-secret-token-1', 'my-secret-token-2']
+
+
+To access the private engine, you must distribute the tokens to your searx
+users. It is up to you how you share the tokens you created with them.
+
+Client side configuration
+=========================
+
+As a searx instance user, you can add any number of access tokens on the
+Preferences page. Enter a comma-separated list of tokens in the "Engine
+tokens" input, then save your new preferences.
+
+.. image:: prefernces-private.png
+    :width: 600px
+    :align: center
+    :alt: location of token textarea
+
+Once the Preferences page reloads, it lists the private engines you have
+access to. If you cannot see the expected engines in the engines list,
+double-check your token. If there is no issue with the token, contact your
+instance administrator.
+