Python scrapy爬取163新闻exceptions.NameError问题
items.py
import scrapy
class News163Item(scrapy.Item):
title = scrapy.Field()
url = scrapy.Field()
source = scrapy.Field()
content = scrapy.Field()
news_spider.py
#coding:utf-8
from scrapy.contrib.linkextractors import LinkExtractor
from scrapy.contrib.spiders import CrawlSpider,Rule
class ExampleSpider(CrawlSpider):
name = "news"
allowed_Domains = ["news.163.com"]
start_urls = ['http://news.163.com/']
rules = [
Rule(LinkExtractor(allow=r"/14/12\d+/\d+/*"),
'parse_news')
]
def parse_news(self,response):
news = News163Item()
news['title'] = response.xpath("//*[@id="h1title"]/text()").extract()
news['source'] = response.xpath("//*[@id="ne_article_source"]/text()").extract()
news['content'] = response.xpath("//*[@id="endText"]/text()").extract()
news['url'] = response.url
return news
cd进入所在目录后,命令行执行:
scrapy crawl news -o news163.json
会跳出如下错误:
Traceback (most recent call last):
File "/usr/bin/scrapy", line 9, in <module>
load_entry_point('Scrapy==0.24.4', 'console_scripts', 'scrapy')()
File "/usr/lib/pymodules/python2.7/scrapy/cmdline.py", line 143, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "/usr/lib/pymodules/python2.7/scrapy/cmdline.py", line 89, in _run_print_help
func(*a, **kw)
File "/usr/lib/pymodules/python2.7/scrapy/cmdline.py", line 150, in _run_command
cmd.run(args, opts)
File "/usr/lib/pymodules/python2.7/scrapy/commands/crawl.py", line 57, in run
crawler = self.crawler_process.create_crawler()
File "/usr/lib/pymodules/python2.7/scrapy/crawler.py", line 87, in create_crawler
self.crawlers[name] = Crawler(self.settings)
File "/usr/lib/pymodules/python2.7/scrapy/crawler.py", line 25, in __init__
self.spiders = spman_cls.from_crawler(self)
File "/usr/lib/pymodules/python2.7/scrapy/spidermanager.py", line 35, in from_crawler
sm = cls.from_settings(crawler.settings)
File "/usr/lib/pymodules/python2.7/scrapy/spidermanager.py", line 31, in from_settings
return cls(settings.getlist('SPIDER_MODULES'))
File "/usr/lib/pymodules/python2.7/scrapy/spidermanager.py", line 22, in __init__
for module in walk_modules(name):
File "/usr/lib/pymodules/python2.7/scrapy/utils/misc.py", line 68, in walk_modules
submod = import_module(fullpath)
File "/usr/lib/python2.7/importlib/__init__.py", line 37, in import_module
__import__(name)
File "/home/gao/news/news/spiders/news_spider.py", line 15
news['title'] = response.xpath("//*[@id="h1title"]/text()").extract()
^
SyntaxError: invalid syntax
请问是哪里出错了?python新手,scrapy也是最近才用的,很生疏,求指点。
谢谢:@捏造的信仰 的回答,但是更改过之后,还是有错误。
2014-12-02 20:13:02+0800 [news] ERROR: Spider error processing <GET http://lady.163.com/14/1201/09/ACCAN4MB002649P6.html>
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 824, in runUntilCurrent
call.func(*call.args, **call.kw)
File "/usr/lib/python2.7/dist-packages/twisted/internet/task.py", line 638, in _tick
taskObj._oneWorkUnit()
File "/usr/lib/python2.7/dist-packages/twisted/internet/task.py", line 484, in _oneWorkUnit
result = next(self._iterator)
File "/usr/lib/pymodules/python2.7/scrapy/utils/defer.py", line 57, in <genexpr>
work = (callable(elem, *args, **named) for elem in iterable)
--- <exception caught here> ---
File "/usr/lib/pymodules/python2.7/scrapy/utils/defer.py", line 96, in iter_errback
yield next(it)
File "/usr/lib/pymodules/python2.7/scrapy/contrib/spidermiddleware/offsite.py", line 26, in process_spider_output
for x in result:
File "/usr/lib/pymodules/python2.7/scrapy/contrib/spidermiddleware/referer.py", line 22, in <genexpr>
return (_set_referer(r) for r in result or ())
File "/usr/lib/pymodules/python2.7/scrapy/contrib/spidermiddleware/urllength.py", line 33, in <genexpr>
return (r for r in result or () if _filter(r))
File "/usr/lib/pymodules/python2.7/scrapy/contrib/spidermiddleware/depth.py", line 50, in <genexpr>
return (r for r in result or () if _filter(r))
File "/usr/lib/pymodules/python2.7/scrapy/contrib/spiders/crawl.py", line 67, in _parse_response
cb_res = callback(response, **cb_kwargs) or ()
File "/home/gao/news/news/spiders/news_spider.py", line 14, in parse_news
news = News163Item()
exceptions.NameError: global name 'News163Item' is not defined
请问这又是什么原因呢?
字符串中的引号没有转码导致的语法错误。应该改为
news['title'] = response.xpath("//*[@id=\"h1title\"]/text()").extract()
下面几行也是的。
你需要做的是导包"
from 你的项目名称.items import News163Item
字符串外部使用的是双引号,在双引号内部还需要使用引号的话可以使用单引号。例如
news['title'] = response.xpath("//*[@id='h1title']/text()").extract()
玩蛇网文章,转载请注明出处和文章网址:https://www.iplaypy.com/wenda/wd19022.html
相关文章 Recommend
- • 2019年3月最新消息: Python 3.4.10 现已推出
- • [上海]招Python量化系统开发工程师
- • 优集品网络科技有限公司招Python中/高级工程师
- • 爱因互动科技发展有限公司招募Python开发攻城狮
- • mozio招聘Python/Django工程师
- • Kavout金融科技公司招Python研发工程师
- • Python数组逆向输出,编程练习题实例四十
- • Python数组插入排序,编程练习题实例三十九
- • Python矩阵for循环应用,编程练习题实例三十八
- • Python操作Redis数据库方面的问题
- • 请python高手帮我看看这段python代码中函数setter的
- • Python什么方法可以快速将两个队列变成字典
您现在的位置: 玩蛇网首页 > Python问题解答 > 正文内容
我要分享到:
必知PYTHON教程 Must Know PYTHON Tutorials
- • python 解释器
- • python idle
- • python dir函数
- • python 数据类型
- • python type函数
- • python 字符串
- • python 整型数字
- • python 列表
- • python 元组
- • python 字典
- • python 集合
- • python 变量
- • python print
- • python 函数
- • python 类定义
- • python import
- • python help
- • python open
- • python 异常处理
- • python 注释
- • python continue
- • python pass
- • python return
- • python global
- • python assert
- • python if语句
- • python break
- • python for循环
- • python while循环
- • python else/elif
- • lambda匿名函数
必知PYTHON模块 Must Know PYTHON Modules
- • os 模块
- • sys 模块
- • re 正则表达式
- • time 日期时间
- • pickle 持久化
- • random 随机
- • csv 模块
- • logging 日志
- • socket网络通信
- • json模块
- • urlparse 解析URL
- • urllib 模块
- • urllib2 模块
- • robotparser 解析
- • Cookie 模块
- • smtplib 邮件
- • Base64 编码
- • xmlrpclib客户端
- • string 文本
- • Queue 线程安全
- • math数学计算
- • linecache缓存
- • threading多线程
- • sqlite3数据库
- • gzip压缩解压
最新内容 NEWS
- • django app提供pv信息的方法是什么
- • Django项目版本升级如何操作?
- • django较多数据传递如何优雅的呈现
- • django1.7获取参数问题求助
- • Django1.7使用内置comment遇到问题
- • python mysql数据库做insert操作时报_mysql_ex
- • 关于python mysql的duplicate insert机制的疑问
- • pymongo使用insert函数批量插入被中断要怎么
- • Python程序员解决棘手问题的常用库
- • 求助关于restfull api接口几个问题
图文精华 RECOMMEND
-
django1.7获取参数问题求助
-
Python程序员解决棘手问题的常用库
-
求问str()同__str__原理上有什么不同
-
scrapy框架里面用link extractor怎么能
-
python {}.fromkeys创建字典append添加操
-
python3 类型Type str doesn't support th
热点文章 HOT
- 学习Python有什么好的书籍推荐?
- Python匿名函数 Lambda表达式作用
- Python与Java、C、Ruby、PHP等编程语言有什么
- Python 正则中文网页字符串提取问题
- 如何为实时性应用存取经纬度?django my
- 想用python做个客户端,在二维码登录这个地
- 有让IDE可识别Python函数参数类型的方法吗
- Python字符串转换成列表正则疑问