Commit Graph

40 Commits

Author SHA1 Message Date
Kian-Meng Ang 629ebb426f
Fix typos (#3366)
Found via `codespell -S ./searx/translations,./searx/data,./searx/static -L ans,te,fo,doubleclick,tthe,dum`
2022-09-29 23:06:59 +02:00
Noémi Ványi 5b50d7455a Do not consent to tracking when using google 2022-08-02 19:22:37 +02:00
Noémi Ványi 05fe2ee093
pick engine fixes (#3306)
* [fix] google engine: results XPath

* [fix] google & youtube - set EU consent cookie

This change the previous bypass method for Google consent using
``ucbcb=1`` (6face215b8) to accept the consent using ``CONSENT=YES+``.

The youtube_noapi and google have a similar API, at least for the consent[1].

Get CONSENT cookie from google reguest::

    curl -i "https://www.google.com/search?q=time&tbm=isch" \
         -A "Mozilla/5.0 (X11; Linux i686; rv:102.0) Gecko/20100101 Firefox/102.0" \
         | grep -i consent
    ...
    location: https://consent.google.com/m?continue=https://www.google.com/search?q%3Dtime%26tbm%3Disch&gl=DE&m=0&pc=irp&uxe=eomtm&hl=en-US&src=1
    set-cookie: CONSENT=PENDING+936; expires=Wed, 24-Jul-2024 11:26:20 GMT; path=/; domain=.google.com; Secure
    ...

PENDING & YES [2]:

  Google change the way for consent about YouTube cookies agreement in EU
  countries. Instead of showing a popup in the website, YouTube redirects the
  user to a new webpage at consent.youtube.com domain ...  Fix for this is to
  put a cookie CONSENT with YES+ value for every YouTube request

[1] https://github.com/iv-org/invidious/pull/2207
[2] https://github.com/TeamNewPipe/NewPipeExtractor/issues/592

Closes: https://github.com/searxng/searxng/issues/1432

* [fix] sjp engine - convert enginename to a latin1 compliance name

The engine name is not only a *name* its also a identifier that is used in
logs, HTTP headers and more.  Unicode characters in the name of an engine could
cause various issues.

Closes: https://github.com/searxng/searxng/issues/1544
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

* [fix] engine tineye: handle 422 response of not supported img format

Closes: https://github.com/searxng/searxng/issues/1449
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

* bypass google consent with ucbcb=1

* [mod] Adds Lingva translate engine

Add the lingva engine (which grabs data from google translate).  Results from
Lingva are added to the infobox results.

* openstreetmap engine: return the localized named.

For example: display "Tokyo" instead of "東京都" when the language is English.

* [fix] engines/openstreetmap.py typo: user_langage --> user_language

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

* Wikidata engine: ignore dummy entities

* Wikidata engine: minor change of the SPARQL request

The engine can be slow especially when the query won't return any answer.
See https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI#Find_articles_in_Wikipedia_speaking_about_cheese_and_see_which_Wikibase_items_they_correspond_to

Co-authored-by: Léon Tiekötter <leon@tiekoetter.com>
Co-authored-by: Emilien Devos <contact@emiliendevos.be>
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
Co-authored-by: Emilien Devos <github@emiliendevos.be>
Co-authored-by: ta <alt3753.7@gmail.com>
Co-authored-by: Alexandre Flament <alex@al-f.net>
2022-07-30 21:45:07 +02:00
Markus Heiser cbc50a9bc5 [fix] google-news engine - KeyError: 'hl in request
Since we added

- 1c67b6aec [enh] google engine: supports "default language"

there is a KeyError: 'hl in request,error pattern::

    ERROR:searx.searx.search.processor.online:engine google news : exception : 'hl'
    Traceback (most recent call last):
      File "searx/search/processors/online.py", line 144, in search
        search_results = self._search_basic(query, params)
      File "searx/search/processors/online.py", line 118, in _search_basic
        self.engine.request(query, params)
      File "searx/engines/google_news.py", line 97, in request
        if lang_info['hl'] == 'en':
      KeyError: 'hl'

Closes: https://github.com/searxng/searxng/issues/154
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-07-03 16:53:31 +02:00
Alexandre Flament 3863f5a83f [enh] google engine: supports "default language"
Same behaviour behaviour than Whoogle [1].  Only the google engine with the
"Default language" choice "(all)"" is changed by this patch.

When searching for a locate place, the result are in the expect language,
without missing results [2]:

  > When a language is not specified, the language interpretation is left up to
  > Google to decide how the search results should be delivered.

The query parameters are copied from Whoogle.  With the ``all`` language:

- add parameter ``source=lnt``
- don't use parameter ``lr``
- don't add a ``Accept-Language`` HTTP header.

The new signature of function ``get_lang_info()`` is:

    lang_info = get_lang_info(params, lang_list, custom_aliases, supported_any_language)

Argument ``supported_any_language`` is True for google.py and False for the other
google engines.  With this patch the function now returns:

- query parameters: ``lang_info['params']``
- HTTP headers: ``lang_info['headers']``
- and as before this patch:
  - ``lang_info['subdomain']``
  - ``lang_info['country']``
  - ``lang_info['language']``

[1] https://github.com/benbusby/whoogle-search
[2] https://github.com/benbusby/whoogle-search/releases/tag/v0.5.4
2021-07-03 16:53:31 +02:00
Alexandre Flament ca93a01844 [mod] dynamically set language_support variable
The language_support variable is set to True by default,
and set to False in only 5 engines.

Except the documentation and the /config URL, this variable is not used.

This commit remove the variable definition in the engines, and
set value according to supported_languages length: False when the length is 0,
True otherwise.

Close #2485
2021-02-01 17:10:37 +01:00
Markus Heiser b1fefec40d [fix] normalize the language & region aspects of all google engines
BTW: make the engines ready for search.checker:

- replace eval_xpath by eval_xpath_getindex and eval_xpath_list
- google_images: remove outer try/except block

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:46 +01:00
Alexandre Flament 8c46b767d0 [fix] google_news: avoid one HTTP redirect except for the English results
also add
params['soft_max_redirects'] = 1
to avoid false error reporting in /stats/errors
2021-01-24 08:53:35 +01:00
Markus Heiser 5f92dfcdbe [fix] google-news: query uses locale without country tag
Wthout country-region tag google will redirect to correct the contry tag [1]:

    SEARX_DEBUG=1 searx-checker -v "google news"
    ...
    https://news.google.com:443 "GET /search?q=computer&hl=en...      HTTP/1.1" 302 0
    https://news.google.com:443 "GET /search?q=computer&hl=en-US&.... HTTP/1.1" 200 None
    ...

[1] https://github.com/searx/searx/pull/2483#issuecomment-765600849

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-23 11:37:14 +01:00
Markus Heiser baec54c492 [fix] revise of the google-news engine
This revise is based on the methods developed in the revise of the google engine
(see commit 410c2f9).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-22 18:49:45 +01:00
Alexandre Flament a4dcfa025c [enh] engines: add about variable
move meta information from comment to the about variable
so the preferences, the documentation can show these information
2021-01-14 20:57:17 +01:00
Alexandre Flament b00d108673 [mod] pylint: numerous minor code fixes 2020-12-01 15:21:19 +01:00
Alexandre Flament 3038052c79 [mod] remove unused import
use
from searx.engines.duckduckgo import _fetch_supported_languages, supported_languages_url  # NOQA
so it is possible to easily remove all unused import using autoflake:
autoflake --in-place --recursive --remove-all-unused-imports searx tests
2020-11-14 14:11:02 +01:00
Dalf 1022228d95 Drop Python 2 (1/n): remove unicode string and url_utils 2020-09-10 10:39:04 +02:00
Markus Heiser c89c05bceb bugfix: google-news and bing-news has changed the language parameter
closes: https://github.com/asciimoo/searx/issues/1838

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2020-02-25 18:44:28 +01:00
Noémi Ványi b63d645a52 Revert "remove 'all' option from search languages"
This reverts commit 4d1770398a.
2019-01-07 21:19:00 +01:00
Marc Abonce Seguin 5568f24d6c [fix] check language aliases when setting search language 2019-01-06 20:31:57 -06:00
Marc Abonce Seguin 835d1edd58 [fix] google news xpath 2018-04-08 20:56:05 -05:00
Marc Abonce Seguin 772c048d01 refactor engine's search language handling
Add match_language function in utils to match any user given
language code with a list of engine's supported languages.

Also add language_aliases dict on each engine to translate
standard language codes into the custom codes used by the engine.
2018-03-27 00:08:03 -06:00
marc 4d1770398a remove 'all' option from search languages 2017-12-06 01:20:15 -06:00
misnyo 3182ba7069 [fix] google news dom xpath fix 2017-08-31 17:48:07 +02:00
Adam Tauber 52e615dede [enh] py3 compatibility 2017-05-15 12:02:30 +02:00
Adam Tauber 108392f8da [fix] skip non-complete google news results 2017-01-10 11:03:05 +01:00
Adam Tauber 8bff42f049 Merge branch 'master' into languages 2016-12-28 20:00:53 +01:00
Adam Tauber 0171db5c3f [fix] handle missing images in google news 2016-12-23 12:59:52 +01:00
marc af35eee10b tests for _fetch_supported_languages in engines
and refactor method to make it testable without making requests
2016-12-15 00:40:21 -06:00
marc f62ce21f50 [mod] fetch supported languages for several engines
utils/fetch_languages.py gets languages supported by each engine and
generates engines_languages.json with each engine's supported language.
2016-12-13 19:58:10 -06:00
marc 149802c569 [enh] add supported_languages on engines and auto-generate languages.py 2016-12-13 19:32:00 -06:00
Noémi Ványi c59c76e6ee add year to time range to engines which support "Last year"
Engines:
 * Bing images
 * Flickr (noapi)
 * Google
 * Google Images
 * Google News
2016-12-11 16:58:31 +01:00
Adam Tauber a2c94895c1 [fix] google news engine change follow-up 2016-12-11 01:03:52 +01:00
Alexandre Flament 4689fe341c update versions.cfg to use the current up-to-date packages 2015-05-02 15:45:17 +02:00
Cqoicebordel b7dc1fb9d5 Google news' unit test 2015-01-31 16:38:03 +01:00
dalf 7c13d630e4 [fix] pep8 : engines (errors E121, E127, E128 and E501 still exist) 2014-12-07 16:37:56 +01:00
Thomas Pointhuber 144f89bf78 add comments to google-engines 2014-09-01 15:10:05 +02:00
Adam Tauber ce08abe223 [fix] remove unused imports ++ pep8 2014-03-18 19:24:01 +01:00
Thomas Pointhuber 337bd6d907 simplify datetime extraction 2014-03-18 13:19:50 +01:00
Adam Tauber b735093564 [fix] pep8 2014-03-15 20:20:41 +01:00
Thomas Pointhuber b88146d669 showing publishedDate for news 2014-03-14 09:55:11 +01:00
Adam Tauber 98b6313d5d [fix] pep8 2014-03-04 14:20:29 +01:00
Thomas Pointhuber 0549f8c7c8 Create google_news.py 2014-03-04 13:11:04 +01:00