Order A Json By Field Using Scrapy
Solution 1:
If I needed my output file to be sorted (I will assume you have a valid reason to want this), I'd probably write a custom exporter.
This is how Scrapy's built-in JsonItemExporter is implemented. With a few simple changes, you can modify it to append each item to a list in export_item(), and then sort the items and write out the file in finish_exporting().
Since you're only scraping a few hundred items, the downsides of holding them all in memory and not writing the file until the crawl finishes shouldn't be a problem for you.
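To make the idea concrete, here is a minimal standalone sketch of the buffer-sort-write pattern described above, using only the standard library. In a real project you would subclass scrapy.exporters.JsonItemExporter instead; the class name SortedJsonExporter and the assumption that every item has an integer 'id' field to sort by are mine, not Scrapy's.

```python
import json

class SortedJsonExporter:
    """Sketch: buffer items, sort them, write one JSON array at the end."""

    def __init__(self, file):
        self.file = file
        self.items = []

    def start_exporting(self):
        pass

    def export_item(self, item):
        # Buffer items instead of writing them as they arrive
        self.items.append(dict(item))

    def finish_exporting(self):
        # Sort by the (assumed) 'id' field, then write the whole list at once
        self.items.sort(key=lambda d: d['id'])
        json.dump(self.items, self.file, indent=2)
```

With a file-like object, items fed in any order come out sorted:

```python
import io

buf = io.StringIO()
exporter = SortedJsonExporter(buf)
exporter.start_exporting()
for item in [{'id': 3, 'name': 'c'}, {'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]:
    exporter.export_item(item)
exporter.finish_exporting()
# buf now holds a JSON array ordered by id
```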
Solution 2:
In the meantime I've found a working solution using an item pipeline:
import json

class JsonWriterPipeline(object):

    def open_spider(self, spider):
        self.list_items = []
        self.file = open('euler.json', 'w')

    def close_spider(self, spider):
        # Place each item at the index given by its 'id' field (ids start at 1)
        ordered_list = [None for _ in range(len(self.list_items))]
        for item in self.list_items:
            ordered_list[int(item['id']) - 1] = json.dumps(dict(item))
        # Join with commas so the output is valid JSON (no trailing comma)
        self.file.write("[\n")
        self.file.write(",\n".join(ordered_list))
        self.file.write("\n]\n")
        self.file.close()

    def process_item(self, item, spider):
        self.list_items.append(item)
        return item
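For this pipeline to run at all, it has to be enabled in the project's settings. A typical entry looks like the following; the module path 'myproject.pipelines' is an assumption about your project layout, and 300 is just an ordinary priority value:

```python
# settings.py -- the module path 'myproject.pipelines' is an assumption
ITEM_PIPELINES = {
    'myproject.pipelines.JsonWriterPipeline': 300,
}
```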
Though this may be suboptimal, because the Scrapy tutorial notes about a similar example:
The purpose of JsonWriterPipeline is just to introduce how to write item pipelines. If you really want to store all scraped items into a JSON file you should use the Feed exports.
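For reference, the feed-export route the tutorial recommends is just a settings entry (the output filename here is an assumption, and this FEEDS syntax is the one introduced in Scrapy 2.1). Note that it writes items in crawl order, which is exactly why the answers above resort to buffering and sorting:

```python
# settings.py -- built-in JSON feed export; writes items in crawl order
FEEDS = {
    'euler.json': {'format': 'json'},
}
```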