# searx/searx/engines/www500px.py

"""
 500px (Images)

 @website     https://500px.com
 @provide-api yes (https://developers.500px.com/)

 @using-api   no
 @results     HTML
 @stable      no (HTML can change)
 @parse       url, title, thumbnail, img_src, content

 @todo        rewrite to api
"""


try:
    # Python 3
    from urllib.parse import urlencode, urljoin
except ImportError:
    # Python 2
    from urllib import urlencode
    from urlparse import urljoin
from lxml import html
import re
from searx.engines.xpath import extract_text

# engine dependent config
categories = ['images']
paging = True

# search-url
base_url = 'https://500px.com'
search_url = base_url + '/search?page={pageno}&type=photos&{query}'


# build the search request URL (page number plus url-encoded query)
def request(query, params):
    params['url'] = search_url.format(pageno=params['pageno'],
                                      query=urlencode({'q': query}))

    return params


# parse the HTML result page into image results
def response(resp):
    results = []

    dom = html.fromstring(resp.text)
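    # thumbnail URLs contain "3.jpg" (possibly followed by a query string);
    # replacing that tail with another size name selects a different image size
    regex = re.compile(r'3\.jpg.*$')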

    # parse results
    for result in dom.xpath('//div[@class="photo"]'):
        link = result.xpath('.//a')[0]
        url = urljoin(base_url, link.attrib.get('href'))
        title = extract_text(result.xpath('.//div[@class="title"]'))
        thumbnail_src = link.xpath('.//img')[0].attrib.get('src')
        # To have a bigger thumbnail, uncomment the next line
        # thumbnail_src = regex.sub('4.jpg', thumbnail_src)
        content = extract_text(result.xpath('.//div[@class="info"]'))
        img_src = regex.sub('2048.jpg', thumbnail_src)

        # append result
        results.append({'url': url,
                        'title': title,
                        'img_src': img_src,
                        'content': content,
                        'thumbnail_src': thumbnail_src,
                        'template': 'images.html'})

    # return results
    return results
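

# Illustrative self-check (a minimal sketch, not part of the engine): it runs
# only when this file is executed directly, never when searx loads the engine.
# It shows the size rewrite used in response(): the regex replaces everything
# from "3.jpg" to the end of a thumbnail URL, so "2048.jpg" produces img_src and
# "4.jpg" the optional bigger thumbnail. The sample URL below is hypothetical,
# not a real 500px address.
if __name__ == '__main__':
    size_regex = re.compile(r'3\.jpg.*$')
    sample_thumbnail = 'https://cdn.example.org/photo/42/3.jpg?v=7'

    print(size_regex.sub('2048.jpg', sample_thumbnail))
    print(size_regex.sub('4.jpg', sample_thumbnail))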