
Python Scrapy: handling GIF files downloaded with the ImagesPipeline component

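By default, when Scrapy's ImagesPipeline downloads an image, it saves it as JPEG regardless of whether the original was PNG or GIF.
By overriding the file_path method, the image can be saved under its original file name and extension.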


Overriding the file_path method

# coding: utf-8
__author__ = 'Fly'
# Note: newer Scrapy releases expose this class as scrapy.pipelines.images.ImagesPipeline.
from scrapy.contrib.pipeline.images import ImagesPipeline
from scrapy.http import Request
from scrapy.exceptions import DropItem

class MyImagesPipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None):
        # Use the last URL segment (e.g. "1.gif") as the stored file name.
        image_guid = request.url.split('/')[-1]
        return 'full/%s' % (image_guid)

    def get_media_requests(self, item, info):
        # Schedule one download per URL listed on the item.
        for image_url in item['image_urls']:
            yield Request(image_url)

    def item_completed(self, results, item, info):
        # Keep only the paths of successful downloads; drop items with none.
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        return item
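
The file_path override above takes everything after the last "/" in the URL, so a query string such as "?v=2" would end up in the file name. Below is a minimal sketch of a more defensive way to derive the name, using only the standard library. The helper names filename_from_url and full_path, and the "image" fallback, are hypothetical and not part of Scrapy.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url(url, default="image"):
    """Return the last path segment of a URL, ignoring query and fragment."""
    path = urlparse(url).path          # "/dir/1.gif" for ".../dir/1.gif?v=2"
    name = posixpath.basename(path)    # "1.gif"
    return name or default             # fall back when the path ends in "/"


def full_path(url):
    """Build the relative storage path in the same 'full/<name>' layout."""
    return "full/%s" % filename_from_url(url)
```

Inside file_path, the return value could then be full_path(request.url) instead of the manual split.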

Result

Image URL: http://www.baidu.com/1.gif
Saved locally as: 1.gif
However, when I open 1.gif, the originally animated image has become static.
Does anyone know how to handle this?

Try overriding convert_image

https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/pipeline/images.py#L87

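# coding: utf-8
__author__ = 'Fly'
from io import BytesIO

# Note: newer Scrapy releases expose this class as scrapy.pipelines.images.ImagesPipeline.
from scrapy.contrib.pipeline.images import ImagesPipeline
from scrapy.http import Request
from scrapy.exceptions import DropItem

class MyImagesPipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None):
        image_guid = request.url.split('/')[-1]
        return 'full/%s' % (image_guid)

    def get_media_requests(self, item, info):
        for image_url in item['image_urls']:
            yield Request(image_url)

    def item_completed(self, results, item, info):
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        return item

    def convert_image(self, image, size=None):
        # Re-save in the image's own format instead of forcing JPEG.
        # Pillow needs an explicit format when writing to an in-memory buffer;
        # image.format is set when the image was opened from downloaded data.
        # A plain save() writes a single frame, so animation may still be lost.
        buf = BytesIO()
        image.save(buf, image.format)
        return image, buf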

Give it a try; it may raise errors. According to the documentation, this pipeline will:

Convert all downloaded images to a common format (JPG) and mode (RGB)
Avoid re-downloading images which were downloaded recently
Thumbnail generation
Check images width/height to make sure they meet a minimum constraint
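
The first item in that list is the likely reason the saved 1.gif is no longer animated: the image is decoded and re-encoded rather than written to disk as downloaded. A minimal sketch of the alternative, keeping the downloaded bytes untouched, is shown here using only the standard library. The helper names is_gif and save_raw are hypothetical and not part of Scrapy. In a real project the same effect might be achieved with Scrapy's FilesPipeline, which stores downloads without re-encoding them, but treat that as an assumption to check against your Scrapy version.

```python
import os

# Every GIF file starts with one of these two 6-byte signatures.
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def is_gif(body):
    """Return True if the downloaded bytes begin with a GIF header."""
    return body[:6] in GIF_SIGNATURES


def save_raw(body, path):
    """Write the bytes unchanged so that all frames of a GIF are preserved."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(body)
    return path
```

A pipeline could call is_gif on the response body and, for GIFs, store the body with save_raw instead of passing it through the image conversion step.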

Article from 玩蛇网. When reposting, please credit the source and the article URL: https://www.iplaypy.com/wenda/wd19837.html

Last revised: 2017-05-12, 15:56:12. Published on 玩蛇网.
