Commit Graph

56 Commits

Author SHA1 Message Date
Noémi Ványi 05fe2ee093
pick engine fixes (#3306)
* [fix] google engine: results XPath

* [fix] google & youtube - set EU consent cookie

This changes the previous bypass method for Google consent using
``ucbcb=1`` (6face215b8) to accept the consent using ``CONSENT=YES+``.

The youtube_noapi and google have a similar API, at least for the consent[1].

Get the CONSENT cookie from a google request::

    curl -i "https://www.google.com/search?q=time&tbm=isch" \
         -A "Mozilla/5.0 (X11; Linux i686; rv:102.0) Gecko/20100101 Firefox/102.0" \
         | grep -i consent
    ...
    location: https://consent.google.com/m?continue=https://www.google.com/search?q%3Dtime%26tbm%3Disch&gl=DE&m=0&pc=irp&uxe=eomtm&hl=en-US&src=1
    set-cookie: CONSENT=PENDING+936; expires=Wed, 24-Jul-2024 11:26:20 GMT; path=/; domain=.google.com; Secure
    ...

PENDING & YES [2]:

  Google change the way for consent about YouTube cookies agreement in EU
  countries. Instead of showing a popup in the website, YouTube redirects the
  user to a new webpage at consent.youtube.com domain ...  Fix for this is to
  put a cookie CONSENT with YES+ value for every YouTube request

[1] https://github.com/iv-org/invidious/pull/2207
[2] https://github.com/TeamNewPipe/NewPipeExtractor/issues/592

Closes: https://github.com/searxng/searxng/issues/1432
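
A minimal sketch of the cookie approach described above, assuming a searx-style ``params`` dict with a ``cookies`` entry. The helper name ``accept_google_consent`` is hypothetical and not taken from the engine code:

```python
def accept_google_consent(params):
    """Return a copy of ``params`` with the CONSENT cookie set to ``YES+``.

    ``params`` is assumed to be a dict of request parameters that may
    already carry a ``cookies`` dict (hypothetical layout).
    """
    updated = dict(params)
    cookies = dict(updated.get("cookies", {}))
    cookies["CONSENT"] = "YES+"  # accept the consent instead of bypassing it
    updated["cookies"] = cookies
    return updated


request_params = accept_google_consent(
    {"url": "https://www.google.com/search?q=time&tbm=isch"}
)
print(request_params["cookies"])
```

The input dict is copied rather than mutated, so the same base parameters can be reused for engines that do not need the cookie.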

* [fix] sjp engine - convert engine name to a latin1-compliant name

The engine name is not only a *name*, it's also an identifier that is used in
logs, HTTP headers and more.  Unicode characters in the name of an engine could
cause various issues.
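
One way to check a name for this constraint is to try encoding it as latin-1. This is only a sketch: the function name ``is_latin1_name`` and the sample names are hypothetical, not taken from the engine:

```python
def is_latin1_name(name):
    """Return True if ``name`` can be encoded as latin-1 without loss."""
    try:
        name.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical engine names, for illustration only.
print(is_latin1_name("example engine"))
print(is_latin1_name("przykład"))
```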

Closes: https://github.com/searxng/searxng/issues/1544
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

* [fix] engine tineye: handle 422 response of not supported img format

Closes: https://github.com/searxng/searxng/issues/1449
Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

* bypass google consent with ucbcb=1

* [mod] Adds Lingva translate engine

Add the lingva engine (which grabs data from google translate).  Results from
Lingva are added to the infobox results.

* openstreetmap engine: return the localized name.

For example: display "Tokyo" instead of "東京都" when the language is English.
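
A rough sketch of such a lookup, assuming the result carries OpenStreetMap-style tags where ``name:<lang>`` holds the localized name and ``name`` the local one. The helper ``localized_name`` is hypothetical and not the engine's actual code:

```python
def localized_name(tags, language):
    """Prefer the ``name:<language>`` tag and fall back to ``name``."""
    return tags.get("name:" + language) or tags.get("name")


tokyo_tags = {"name": "東京都", "name:en": "Tokyo"}
print(localized_name(tokyo_tags, "en"))
print(localized_name(tokyo_tags, "fr"))
```

When no localized tag exists for the requested language, the local name is returned unchanged.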

* [fix] engines/openstreetmap.py typo: user_langage --> user_language

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>

* Wikidata engine: ignore dummy entities

* Wikidata engine: minor change of the SPARQL request

The engine can be slow especially when the query won't return any answer.
See https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI#Find_articles_in_Wikipedia_speaking_about_cheese_and_see_which_Wikibase_items_they_correspond_to

Co-authored-by: Léon Tiekötter <leon@tiekoetter.com>
Co-authored-by: Emilien Devos <contact@emiliendevos.be>
Co-authored-by: Markus Heiser <markus.heiser@darmarit.de>
Co-authored-by: Emilien Devos <github@emiliendevos.be>
Co-authored-by: ta <alt3753.7@gmail.com>
Co-authored-by: Alexandre Flament <alex@al-f.net>
2022-07-30 21:45:07 +02:00
dependabot[bot] e271d6d1e1 Bump pylint from 2.9.6 to 2.10.2
Bumps [pylint](https://github.com/PyCQA/pylint) from 2.9.6 to 2.10.2.
- [Release notes](https://github.com/PyCQA/pylint/releases)
- [Changelog](https://github.com/PyCQA/pylint/blob/main/ChangeLog)
- [Commits](https://github.com/PyCQA/pylint/compare/v2.9.6...v2.10.2)

---
updated-dependencies:
- dependency-name: pylint
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-08-22 20:53:12 +02:00
Émilien Devos ee443d9739 Fix google images
Proposed fix in https://github.com/searx/searx/pull/2115#issuecomment-876716010

Closes #2914
2021-08-02 20:14:54 +02:00
Alexandre Flament 3863f5a83f [enh] google engine: supports "default language"
Same behaviour as Whoogle [1].  Only the google engine with the
"Default language" choice "(all)" is changed by this patch.

When searching for a local place, the results are in the expected language,
without missing results [2]:

  > When a language is not specified, the language interpretation is left up to
  > Google to decide how the search results should be delivered.

The query parameters are copied from Whoogle.  With the ``all`` language:

- add parameter ``source=lnt``
- don't use parameter ``lr``
- don't add an ``Accept-Language`` HTTP header.

The new signature of function ``get_lang_info()`` is::

    lang_info = get_lang_info(params, lang_list, custom_aliases, supported_any_language)

Argument ``supported_any_language`` is True for google.py and False for the other
google engines.  With this patch the function now returns:

- query parameters: ``lang_info['params']``
- HTTP headers: ``lang_info['headers']``
- and as before this patch:
  - ``lang_info['subdomain']``
  - ``lang_info['country']``
  - ``lang_info['language']``
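
The rules for the ``all`` language can be sketched as follows. This is not the real ``get_lang_info()``: the function ``build_lang_request``, the ``lang_<code>`` value used for ``lr``, the plain language code used for ``Accept-Language`` and the ``en`` fallback are all assumptions made for illustration:

```python
def build_lang_request(language, supported_any_language, fallback_language="en"):
    """Return the query parameters and HTTP headers for a language choice."""
    params = {}
    headers = {}

    if language == "all":
        if supported_any_language:
            # Let Google decide: no ``lr``, no ``Accept-Language`` header.
            params["source"] = "lnt"
            return {"params": params, "headers": headers}
        # Assumption: engines without "any language" support fall back.
        language = fallback_language

    params["lr"] = "lang_" + language       # assumed value format
    headers["Accept-Language"] = language   # assumed header value
    return {"params": params, "headers": headers}


print(build_lang_request("all", supported_any_language=True))
print(build_lang_request("fr", supported_any_language=True))
```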

[1] https://github.com/benbusby/whoogle-search
[2] https://github.com/benbusby/whoogle-search/releases/tag/v0.5.4
2021-07-03 16:53:31 +02:00
Alexandre Flament ca93a01844 [mod] dynamically set language_support variable
The language_support variable is set to True by default,
and set to False in only 5 engines.

Except the documentation and the /config URL, this variable is not used.

This commit removes the variable definition in the engines, and
sets the value according to the supported_languages length: False when the length is 0,
True otherwise.
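
The rule amounts to a single length check. A minimal sketch, where the helper name ``derive_language_support`` is hypothetical:

```python
def derive_language_support(supported_languages):
    """False when no languages are listed, True otherwise."""
    return len(supported_languages) > 0


print(derive_language_support([]))
print(derive_language_support(["en", "fr"]))
```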

Close #2485
2021-02-01 17:10:37 +01:00
Markus Heiser b1fefec40d [fix] normalize the language & region aspects of all google engines
BTW: make the engines ready for search.checker:

- replace eval_xpath by eval_xpath_getindex and eval_xpath_list
- google_images: remove outer try/except block

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-28 10:08:46 +01:00
Markus Heiser baec54c492 [fix] revise of the google-news engine
This revision is based on the methods developed in the revision of the google engine
(see commit 410c2f9).

Signed-off-by: Markus Heiser <markus.heiser@darmarit.de>
2021-01-22 18:49:45 +01:00
Alexandre Flament a4dcfa025c [enh] engines: add about variable
move meta information from the comment to the about variable
so the preferences and the documentation can show this information
2021-01-14 20:57:17 +01:00
Alexandre Flament 64cccae99e [mod] various engines: use eval_xpath* functions and searx.exceptions.*
Engine list: ahmia, duckduckgo_images, elasticsearch, google, google_images, google_videos, youtube_api
2020-12-03 10:22:48 +01:00
Alexandre Flament b00d108673 [mod] pylint: numerous minor code fixes 2020-12-01 15:21:19 +01:00
Alexandre Flament 3038052c79 [mod] remove unused import
Use::

    from searx.engines.duckduckgo import _fetch_supported_languages, supported_languages_url  # NOQA

so that all unused imports can easily be removed with autoflake::

    autoflake --in-place --recursive --remove-all-unused-imports searx tests
2020-11-14 14:11:02 +01:00
Alexandre Flament 2006eb4680 [mod] move extract_text, extract_url to searx.utils 2020-10-02 18:13:56 +02:00
Dalf 1022228d95 Drop Python 2 (1/n): remove unicode string and url_utils 2020-09-10 10:39:04 +02:00
Vlad f678388dbc
Fix google images 'get image' button bug from issue #2103 (#2115)
Closes #2103
2020-08-08 19:35:22 +02:00
Adam Tauber 52eba0c721 [fix] pep8 2020-07-08 00:46:03 +02:00
Markus Heiser 16f8ec894a [fix] revise google images engine
this commit is picked from #1985
2020-07-07 21:59:15 +02:00
Marc Abonce Seguin bb4d223770 [fix] google images 2019-08-26 21:54:01 -07:00
Frank de Lange 4b7332286a Use string formatter to create source and img_format labels (#1566)
google_images :  use JSON embedded in HTML (engine expected pure JSON)
2019-05-28 12:33:31 +09:00
Nick Espig 1c6ab79b9f
Fix google image search
- Because there is no full image URL in the DOM, we replace "image_url" with the same URL as the "url" (URL of the source).
  See example HTML https://gist.github.com/Nachtalb/2dea8a4d2c723c49226ad9645838121f
- Remove unused import
- Fix google image search title
- Keep google image safe value up to date
2019-04-14 12:03:25 +02:00
Adam Tauber 57e7e9da98 [fix] use html result page in google images (previous endpoint stopped working) 2018-06-14 11:40:39 +02:00
Noémi Ványi c361811cb5 [fix] fix xpath of google images 2017-06-13 19:47:56 +02:00
Adam Tauber 52e615dede [enh] py3 compatibility 2017-05-15 12:02:30 +02:00
Noémi Ványi c59c76e6ee add year to time range to engines which support "Last year"
Engines:
 * Bing images
 * Flickr (noapi)
 * Google
 * Google Images
 * Google News
2016-12-11 16:58:31 +01:00
Adam Tauber eb57481450 [fix] google images paging - closes #571 2016-08-13 01:13:41 +02:00
Adam Tauber 350a84520d [fix] time range detection 2016-07-26 00:28:48 +02:00
Noemi Vanyi a7c8d5882c fix pep8 2016-07-25 23:28:14 +02:00
Noemi Vanyi e9a78f1434 add time range search for google images 2016-07-25 23:28:14 +02:00
Adam Tauber 9331fc28a8 [fix] broken google images parsing 2016-04-07 08:07:17 +02:00
Adam Tauber 029291eca1 [fix] remove debug message 2015-12-22 20:00:31 +01:00
Adam Tauber 8b155f78a5 [doc] correct google images docstring 2015-12-09 01:23:05 +01:00
Adam Tauber 439cf0559a [fix] replace the dead google images ajax api with a working one 2015-12-09 01:20:46 +01:00
Adam Tauber 93fd1e4c76 Merge pull request #308 from dalf/versions_upgrade
update versions.cfg to use the current up-to-date packages
2015-05-02 14:58:32 -04:00
Alexandre Flament 4689fe341c update versions.cfg to use the current up-to-date packages 2015-05-02 15:45:17 +02:00
Alexandre Flament 78edc16e66 [enh] reduce the number of http outgoing connections.
engines that still use http : gigablast, bing image for thumbnails, 1x and dbpedia autocompleter
2015-05-02 11:43:12 +02:00
Thomas Pointhuber 7ac6361b51 [enh] set google safesearch filter more restrictive 2015-02-08 22:29:26 +01:00
Thomas Pointhuber 10666fd7c0 [enh] add safesearch to google_images 2015-02-08 22:15:25 +01:00
Cqoicebordel d5b8005ee1 Google images' unit test 2015-01-31 16:16:30 +01:00
Cqoicebordel 2c15546518 Tiny forgots 2015-01-17 19:28:11 +01:00
Cqoicebordel cb4a3fe598 Add thumbnails in images results
- Modify engines to create/fetch an URL for the thumbnails
- Modify themes to show thumbnails instead of full images.

In Courgette, the result is not very beautiful. Should we change it?
2015-01-17 19:21:09 +01:00
Thomas Pointhuber a508d540ac [fix] pep8 2014-12-16 17:26:16 +01:00
Cqoicebordel b973081134 [fix] Google image with special chars
It seems that Google Images double-urlencodes the image URLs, so we need to unquote them once before sending them to the browser.
This fixes the 404 errors seen for some images with special characters in their URLs.
Example: https://searx.laquadrature.net/?q=etes&pageno=1&category_images (there are two of those in the list)
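
A short illustration of undoing one level of URL encoding with Python's standard library. The sample URL is made up, and the original commit may have used a different helper:

```python
from urllib.parse import unquote

# Hypothetical double-encoded image URL: "%25" is an encoded "%".
double_encoded = "https://example.org/caf%25C3%25A9.jpg"

# Unquoting once leaves a normally encoded URL.
single_encoded = unquote(double_encoded)
print(single_encoded)
```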
2014-12-08 21:12:50 +01:00
dalf 7c13d630e4 [fix] pep8 : engines (errors E121, E127, E128 and E501 still exist) 2014-12-07 16:37:56 +01:00
Thomas Pointhuber 144f89bf78 add comments to google-engines 2014-09-01 15:10:05 +02:00
Thomas Pointhuber 6450082987 little fix for google images engine 2014-08-31 23:00:54 +02:00
asciimoo 8b4d445c42 [enh] paging support for google images 2014-01-30 01:21:33 +01:00
asciimoo b2492c94f4 [fix] pep/flake8 compatibility 2014-01-20 02:31:20 +01:00
Dalf e88cf0a0a8 [mod] minor fixes (duckduck_definitions : if a ddg bang is in the query, avoid a useless redirect) 2014-01-05 13:50:17 +01:00
asciimoo f4fdb1e756 [fix] url encoding 2013-10-25 11:20:46 +02:00
asciimoo 74b6be3991 [enh] engine cfg compatibility 2013-10-23 23:55:37 +02:00
asciimoo 0d6368a092 [fix] skipping empty urls 2013-10-22 23:35:17 +02:00