<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Preparation for offline engines &#8212; searx 0.12.0 documentation</title>
<link rel="stylesheet" href="../_static/style.css" type="text/css" />
<link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: '../',
VERSION: '0.12.0',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true,
SOURCELINK_SUFFIX: '.txt'
};
</script>
<script type="text/javascript" src="../_static/jquery.js"></script>
<script type="text/javascript" src="../_static/underscore.js"></script>
<script type="text/javascript" src="../_static/doctools.js"></script>
<link rel="index" title="Index" href="../genindex.html" />
<link rel="search" title="Search" href="../search.html" />
<link media="only screen and (max-device-width: 480px)" href="../_static/small_flask.css" type= "text/css" rel="stylesheet" />
<meta name="viewport" content="width=device-width, initial-scale=0.9, maximum-scale=0.9">
</head>
<body>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body" role="main">
<div class="section" id="preparation-for-offline-engines">
<h1>Preparation for offline engines<a class="headerlink" href="#preparation-for-offline-engines" title="Permalink to this headline"></a></h1>
<div class="section" id="offline-engines">
<h2>Offline engines<a class="headerlink" href="#offline-engines" title="Permalink to this headline"></a></h2>
<p>To extend the functionality of searx, offline engines are going to be introduced. An offline engine is an engine that does not need an Internet connection to perform a search and does not use HTTP to communicate.</p>
<p>Offline engines are configured the same way as online engines: by adding them to the <cite>engines</cite> list of <cite>settings.yml</cite>. This way, searx finds the engine file and imports it.</p>
<p>Example skeleton for the new engines:</p>
<div class="code python highlight-default"><div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">subprocess</span> <span class="k">import</span> <span class="n">PIPE</span><span class="p">,</span> <span class="n">Popen</span>

<span class="n">categories</span> <span class="o">=</span> <span class="p">[</span><span class="s1">&#39;general&#39;</span><span class="p">]</span>
<span class="n">offline</span> <span class="o">=</span> <span class="kc">True</span>

<span class="k">def</span> <span class="nf">init</span><span class="p">(</span><span class="n">settings</span><span class="p">):</span>
    <span class="c1"># optional one-time setup; nothing to do in this skeleton</span>
    <span class="k">pass</span>

<span class="k">def</span> <span class="nf">parse_line</span><span class="p">(</span><span class="n">line</span><span class="p">):</span>
    <span class="c1"># hypothetical helper: turn one line of ls output into a key-value result</span>
    <span class="k">return</span> <span class="p">{</span><span class="s1">&#39;filename&#39;</span><span class="p">:</span> <span class="n">line</span><span class="o">.</span><span class="n">decode</span><span class="p">(</span><span class="s1">&#39;utf-8&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">rstrip</span><span class="p">()}</span>

<span class="k">def</span> <span class="nf">search</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">params</span><span class="p">):</span>
    <span class="n">process</span> <span class="o">=</span> <span class="n">Popen</span><span class="p">([</span><span class="s1">&#39;ls&#39;</span><span class="p">,</span> <span class="n">query</span><span class="p">],</span> <span class="n">stdout</span><span class="o">=</span><span class="n">PIPE</span><span class="p">)</span>
    <span class="n">results</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="c1"># read all output before wait(), so a full pipe cannot block the child process</span>
    <span class="n">line</span> <span class="o">=</span> <span class="n">process</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">readline</span><span class="p">()</span>
    <span class="k">while</span> <span class="n">line</span><span class="p">:</span>
        <span class="n">result</span> <span class="o">=</span> <span class="n">parse_line</span><span class="p">(</span><span class="n">line</span><span class="p">)</span>
        <span class="n">results</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
        <span class="n">line</span> <span class="o">=</span> <span class="n">process</span><span class="o">.</span><span class="n">stdout</span><span class="o">.</span><span class="n">readline</span><span class="p">()</span>
    <span class="n">return_code</span> <span class="o">=</span> <span class="n">process</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span>
    <span class="k">if</span> <span class="n">return_code</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span>
        <span class="k">raise</span> <span class="ne">RuntimeError</span><span class="p">(</span><span class="s1">&#39;non-zero return code&#39;</span><span class="p">,</span> <span class="n">return_code</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">results</span>
</pre></div>
</div>
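<p>The paragraph on <cite>settings.yml</cite> above says that searx finds the engine file and imports it. A minimal sketch of such an import step, using only the standard <cite>importlib</cite> machinery, could look like this. The function <cite>load_engine</cite>, the one-file-per-engine directory layout and the defaulting of <cite>offline</cite> to <cite>False</cite> are assumptions for illustration; this is not searx&#8217;s own loader.</p>
<div class="code python highlight-default"><div class="highlight"><pre>import importlib.util
import os


def load_engine(engine_dir, engine_name):
    # hypothetical loader: import engine_name.py from engine_dir as a module
    path = os.path.join(engine_dir, engine_name + '.py')
    spec = importlib.util.spec_from_file_location(engine_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # engines that do not declare the flag are treated as online engines
    module.offline = getattr(module, 'offline', False)
    return module
</pre></div>
</div>
<p>Reading the flag from the module itself means the engine file, not the configuration, decides whether it is offline; this matches the skeleton above, which declares <cite>offline = True</cite> at module level.</p>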
</div>
<div class="section" id="development-progress">
<h2>Development progress<a class="headerlink" href="#development-progress" title="Permalink to this headline"></a></h2>
<p>First, a proposal was created as a GitHub issue. Then it was moved to the wiki as a design document, which you can read here: <a class="reference external" href="https://github.com/asciimoo/searx/wiki/Offline-engines">https://github.com/asciimoo/searx/wiki/Offline-engines</a></p>
<p>In this development step, the searx core was prepared to accept and perform offline searches. Offline search requests are scheduled together with regular online requests.</p>
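<p>The core changes themselves are not reproduced in this post. As an illustration only, the following sketch shows one way offline and online requests could share a single scheduler: every engine is submitted to the same thread pool, online engines go through an HTTP step, and offline engines have their <cite>search</cite> function called directly. The names <cite>run_search</cite>, <cite>run_engine</cite>, the engine dictionaries and the stub <cite>fake_http_get</cite> (which stands in for a real request so the sketch runs without network access) are all hypothetical, not searx&#8217;s actual code.</p>
<div class="code python highlight-default"><div class="highlight"><pre>from concurrent.futures import ThreadPoolExecutor


def fake_http_get(url):
    # hypothetical stand-in for a real HTTP request, so the sketch runs offline
    return ['result from ' + url]


def run_engine(engine, query):
    if engine['offline']:
        # offline engines search locally, no HTTP involved
        return engine['search'](query, {})
    # online engines build a request URL and fetch it over HTTP
    return fake_http_get(engine['url'] + query)


def run_search(engines, query, timeout=3.0):
    # hypothetical scheduler: online and offline engines share one thread pool
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {e['name']: pool.submit(run_engine, e, query) for e in engines}
        for name, future in futures.items():
            # raises TimeoutError if an engine is still running after the timeout
            results[name] = future.result(timeout=timeout)
    return results
</pre></div>
</div>
<p>With one offline engine and one stubbed online engine, <cite>run_search</cite> returns a dictionary that maps each engine name to its list of results, so the caller does not need to know which kind of engine produced them.</p>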
<p>As offline searches can return arbitrary results depending on the engine, the existing result templates were insufficient to present such results. Thus, a new template was introduced which is capable of presenting arbitrary key-value pairs as a table. See the pull request for more details: <a class="reference external" href="https://github.com/asciimoo/searx/pull/1700">https://github.com/asciimoo/searx/pull/1700</a></p>
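<p>The template from the pull request is an HTML template and is not reproduced here. As a rough, plain-Python illustration of the idea only, this sketch lays out a single result with arbitrary keys as a two-column table; <cite>render_key_value</cite> is a hypothetical name.</p>
<div class="code python highlight-default"><div class="highlight"><pre>def render_key_value(result):
    # width of the key column: the longest key (0 for an empty result)
    width = max((len(str(key)) for key in result), default=0)
    rows = []
    for key, value in result.items():
        # one row per key-value pair, whatever keys the engine returned
        rows.append(str(key).ljust(width) + ' | ' + str(value))
    return '\n'.join(rows)
</pre></div>
</div>
<p>For example, <cite>render_key_value({'filename': 'notes.txt', 'size': 1024})</cite> yields two rows, <cite>filename | notes.txt</cite> and <cite>size     | 1024</cite>. Because the layout only iterates over the keys it is given, no engine-specific template is needed.</p>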
</div>
<div class="section" id="next-steps">
<h2>Next steps<a class="headerlink" href="#next-steps" title="Permalink to this headline"></a></h2>
<p>Today, it is possible to create and run an offline engine. However, such an engine is publicly available to everyone who knows the address of the searx instance. So the next step is to introduce token-based access for engines, so that administrators can restrict access to private engines.</p>
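<p>Token-based access is only planned at this point. Assuming that each private engine lists the tokens it accepts and that each request carries the tokens the user supplied (both assumptions, as are the names <cite>is_allowed</cite> and <cite>visible_engines</cite>), the access check could look roughly like this:</p>
<div class="code python highlight-default"><div class="highlight"><pre>def is_allowed(engine_tokens, request_tokens):
    # engines without tokens stay public
    if not engine_tokens:
        return True
    # private engines require at least one matching token
    return not set(engine_tokens).isdisjoint(request_tokens)


def visible_engines(engines, request_tokens):
    # hypothetical filter applied before a search is scheduled
    return [e['name'] for e in engines if is_allowed(e.get('tokens'), request_tokens)]
</pre></div>
</div>
<p>Treating engines without tokens as public would keep existing configurations working unchanged; how tokens reach searx in the first place is left open in this sketch.</p>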
</div>
<div class="section" id="acknowledgement">
<h2>Acknowledgement<a class="headerlink" href="#acknowledgement" title="Permalink to this headline"></a></h2>
<p>This development was sponsored by the <a class="reference external" href="https://nlnet.nl/discovery">Search and Discovery Fund</a> of the <a class="reference external" href="https://nlnet.nl/">NLnet Foundation</a>.</p>
<div class="line-block">
<div class="line">Happy hacking.</div>
<div class="line">kvch // 2019.10.21 17:03</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="sphinxsidebar" role="navigation" aria-label="main navigation">
<div class="sphinxsidebarwrapper"><div class="sidebar_container body">
<h1>Searx</h1>
<ul>
<li><a href="../index.html">Home</a></li>
<li><a href="https://github.com/asciimoo/searx">Source</a></li>
<li><a href="blog.html">Blog</a></li>
<li><a href="https://github.com/asciimoo/searx/wiki">Wiki</a></li>
<li><a href="https://github.com/asciimoo/searx/wiki/Searx-instances">Public instances</a></li>
</ul>
<hr />
<ul>
<li><a href="https://twitter.com/Searx_engine">Twitter</a></li>
</ul>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer">
&copy; Copyright 2015-2019, Adam Tauber, Noémi Ványi.
</div>
</body>
</html>