settings.yml:
* outgoing.networks:
* can contains network definition
* propertiers: enable_http, verify, http2, max_connections, max_keepalive_connections,
keepalive_expiry, local_addresses, support_ipv4, support_ipv6, proxies, max_redirects, retries
* retries: 0 by default, number of times searx retries to send the HTTP request (using different IP & proxy each time)
* local_addresses can be "192.168.0.1/24" (it supports IPv6)
* support_ipv4 & support_ipv6: both True by default
see https://github.com/searx/searx/pull/1034
* each engine can define a "network" section:
* either a full network description
* either reference an existing network
* all HTTP requests of engine use the same HTTP configuration (it was not the case before, see proxy configuration in master)
Springer Nature is a global publisher dedicated to providing service to research
community [1] with official API [2].
To test this PR, first get your API key following this page:
https://dev.springernature.com/signup
In searx/engines/springer.py at line 24, add this API key. I left my own key,
commented out in the line aboce. Feel free to use it, if needed.
[1] https://www.springernature.com/
[2] https://dev.springernature.com/
In the EU there exists a "General Data Protection Regulation" [1] aka GDPR (BTW:
very user friendly!) which requires consent to tracking. To get the consent
from the user, youtube requests are redirected to confirm and get a CONSENT
Cookie from https://consent.youtube.com
This patch adds a CONSENT Cookie to the youtube request to avoid redirection.
[1] https://en.wikipedia.org/wiki/General_Data_Protection_Regulation
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
Reported-by: https://github.com/searx/searx/issues/2774
I also found some items missing a thumbnail and I used text_extract for content
and title, to remove unneeded whitespaces.
BTW: added bandcamp's favicon
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
fr.wikipedia.org (and it seems not other wikipedia websites),
adds HTML to api_result['displayTitle'].
(Search for '!wp :fr Braid' for example)
The commit uses api_result['title']
The get_cliend_id() function:
* fetches https://soundcloud.com
* then fetches each referenced javascript URL to get the client id.
This commit fetches the javascript URLs in the reverse order: the client id is in the last javascript URL.
Many things have been changed since last review of this engine. This patch fix
xpath selectors, implements suggestion and is a complete review / rewrite of the
engine.
Signed-off-by: Markus Heiser <markus@darmarit.de>
When initing engines a "SearxEngineResponseException" is logged very verbose,
including full traceback information:
ERROR:searx.engines:yggtorrent engine: Fail to initialize
Traceback (most recent call last):
File "share/searx/searx/engines/__init__.py", line 293, in engine_init
init_fn(get_engine_from_settings(engine_name))
File "share/searx/searx/engines/yggtorrent.py", line 42, in init
resp = http_get(url, allow_redirects=False)
File "share/searx/searx/poolrequests.py", line 197, in get
return request('get', url, **kwargs)
File "share/searx/searx/poolrequests.py", line 190, in request
raise_for_httperror(response)
File "share/searx/searx/raise_for_httperror.py", line 60, in raise_for_httperror
raise_for_captcha(resp)
File "share/searx/searx/raise_for_httperror.py", line 43, in raise_for_captcha
raise_for_cloudflare_captcha(resp)
File "share/searx/searx/raise_for_httperror.py", line 30, in raise_for_cloudflare_captcha
raise SearxEngineCaptchaException(message='Cloudflare CAPTCHA', suspended_time=3600 * 24 * 15)
searx.exceptions.SearxEngineCaptchaException: Cloudflare CAPTCHA, suspended_time=1296000
For SearxEngineResponseException this is not needed. Those types of exceptions
can be a normal use case. E.g. for CAPTCHA errors like shown in the example
above. It should be enough to log a warning for such issues:
WARNING:searx.engines:yggtorrent engine: Fail to initialize // Cloudflare CAPTCHA, suspended_time=1296000
closes: #2612
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
The old xpath configuration for google scholar did not work and is replaced by a
python implementation.
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>