* [fix] Rename ccengine engine to openverse
The CC engine was merged with WordPress and renamed to Openverse
Source: https://wordpress.org/news/2021/05/welcome-to-openverse/
* [fix] ccengine engine - avoid unwanted redirects
api.openverse.engineering is a little picky and wants to have a trailing slash
in the path:
/v1/images? -->/ v1/images/?
otherwise it redirects, here is the debug log:
DEBUG searx.network.openverse : HTTP Request: GET https://api.openverse.engineering/v1/images?&page=1&page_size=20&format=json&q=foo "HTTP/2 301 Moved Permanently" (text/html; charset=utf-8)
DEBUG searx.network.openverse : HTTP Request: GET https://api.openverse.engineering/v1/images/?&page=1&page_size=20&format=json&q=foo "HTTP/2 200 OK" (application/json)
WARNING searx.engines.openverse : ErrorContext('searx/search/processors/online.py', 105, 'count_error(', None, '1 redirects, maximum: 0', ('200', 'OK', 'api.openverse.engineering')) True
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [fix] FutureWarning from lxml
Just in case if content is None, the original code will skip extract_text(), and
just append the None value to 'content'. So just add allow_none=True, and this
will return None without raising a ValueError in extract_text().
* [enh] Add pagination to Brave
Also added ```&spellcheck=1``` because now it is disabled by default, not returning any ```suggestion_xpath```.
Co-authored-by: Léon Tiekötter <leon@tiekoetter.com>
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
Co-authored-by: capric98 <42015599+capric98@users.noreply.github.com>
Co-authored-by: Allen <64094914+allendema@users.noreply.github.com>
## What does this PR do?
This should fix#3164.
The problem is that `httpx` keeps making breaking changes to their library, so we just have to adjust the code a little bit to make it work with the new version of the library.
## Related issues
Closes #3164
* [mod] /image_proxy: don't decompress images
* [fix] image_proxy: always close the httpx respone
previously, when the content type was not an image and some other error,
the httpx response was not closed
* [mod] /image_proxy: use HTTP/1 instead of HTTP/2
httpx: HTTP/2 is slow when a lot data is downloaded.
https://github.com/dalf/pyhttp-benchmark
also, the usage of HTTP/1 decreases the load average
* [mod] searx.utils.dict_subset: rewrite with comprehension
Co-authored-by: Alexandre Flament <alex@al-f.net>
* [enh] Add Tineye reverse image search
Other optional parametesr:
"&sort=crawl_date" can be appended to search_string to sort results by date.
"&domain=example.org" can be implemented to search_string to get results from just one domain.
Public instances could get relatively fast timed-out for 3600s.
* [enh] Add TIneye to settings.yml
Check if that's the right shortcut.
* [mod] Fix checks
* [mod] Try to fix checks
* [mod] Use Four spaces for indentation
And set paging back to True
Co-authored-by: Noémi Ványi <kvch@users.noreply.github.com>
In case of CAPTCHA raise a SearxEngineCaptchaException and suspend for 7 days.
When get_sc_code() fails raise a SearxEngineResponseException and suspend for 7
days.
[1] https://github.com/searxng/searxng/pull/695
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Startpage has introduced new anti-scraping measures that make SearXNG instances
run into captchas:
1. some arguments has been removed and a new `sc` has been added.
2. search path changed from `do/search` to `sp/search`
3. POST request is no longer needed
Closes: https://github.com/searxng/searxng/issues/692
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
## What does this PR do?
Gives the user the possibility to search their own prowlarr instances.
Info: https://wiki.servarr.com/en/prowlarr
Github: https://github.com/Prowlarr/Prowlarr
## Why is this change important?
Prowlarr searchs multiple upstream search providers, thus allows to use that functionality through searx.
* [enh] Add autocompleter from Brave
Raw response example: https://search.brave.com/api/suggest?q=how%20to:%20with%20j
Headers are needed in order to get a 200 response, thus Searx user-agent is used.
Other URL param could be '&rich=false' or '&rich=true'.
Languages are supported by mapping the language to a domain. If domain is not
found in :py:obj:`lang2domain` URL ``<lang>.search.yahoo.com`` is used.
Closes#3020
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Add configurable setting to rank search results higher when part of the
domain (e.g. 'en' in 'en.wikipedia.org' or 'de' in 'beispiel.de')
matches the selected search language. Does not apply to e.g. 'be' in
'youtube.com'.
Closes#206
The patch introduced earlier broke the behaviour for instance
admins running searx from packages. This fix aims to provide
compatibility for everyone.
Closes#3061
The implementation uses the Qwant API (https://api.qwant.com/v3). The API is
undocumented but can be reverse engineered by reading the network log of
https://www.qwant.com/ queries.
This implementation is used by different qwant engines in the settings.yml::
- name: qwant
categories: general
...
- name: qwant news
categories: news
...
- name: qwant images
categories: images
...
- name: qwant videos
categories: videos
...
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
* [fix] Update about section of Invidious
Another website and new documentation
* [fix] Correct engine name in for Rumble (#11)
* [fix] Wording for Filtron error message (#12)
- Change Libgen provider and use https by default.
- Umcomment Urbandictionary but disable it by default, it is working.
- Uncomment Ebay as it is working correctly.
(For ebay in the future: base_url should be changed from settings.yml just like peertube or invidious)
searx/data/engines_languages.json stores language information for
several searchengines in a json endoded dict that maps engine-"types" to
their supported languages; for instance there is an entry "google",
mapping to the supported languages of the google engine.
However, the lookup code did not use the engine 'type' (as in: the
filename searx/engines/<enginetype>.py), but instead the manually
configured engine name from settings.yml when querying. This is
problematic as soon as users start to specify additional engine
instances with custom names in the config file, as for instance
suggested as a workaround for multilingual search in the manual[0]:
> engines:
> - name : google english
> engine : google
> language : english
Here, the engine name "google english" will be used for the lookup in
the json file, which does not exist. The empty supported_languages then
lead to a type error later in the processing callchain.
This patch changes the behaviour to use the engine's entry-"type"
("google" in the above example) for the lookup. This should fix bug #2928.
0: https://searx.github.io/searx/user/search_syntax.html#multilingual-search