Good afternoon. I have this JSON:
var model = {"ALLSKUS":["84664020","07961015","84664113","84664116"],"NBR":"137127","PRICERANGE":"$186.99 - $189.99","GENDER_AGE":"Men's","PRICEADJUSTDATE":"","AVAILABLE_SIZES":[" 07.5"," 08.0"," 08.5"," 09.0"," 09.5"," 10.0"," 10.5"," 11.0"," 11.5"," 12.0"," 12.5"," 13.0","14.0","15.0"],"DISCOUNT_PERCENT":"15","isFieldTestable":false,"SORT":"152","HASCUSTOMPRODUCTTEMPLATE":false,"PR_LIST":"224.99","SPORTS":[{"ID":"3","NM":"Basketball"},{"ID":"39","NM":"Casual"}],"SIZECHART_CD":"S0584","HASSIZES":true,"PR_SALE":"189.99","LOCALIZATION":{},"MODELTEMPLATE":{"ISMODELTEMPLATEACTIVE":"N","MODELTEMPLATE_IMAGE":""},"ISCUSTOMPRODUCT":false,"INTRODUCTIONDATE":"","SKU":"84664020","ISINTANGIBLE":false,"PROD_TP":"Shoes","CUSTPROD_CD":"","NM":"Jordan Retro 6 - Men's","REVIEWS":
I am looking for this part of it:
"AVAILABLE_SIZES":[" 07.5"," 08.0"," 08.5"," 09.0"," 09.5"," 10.0"," 10.5"," 11.0"," 11.5"," 12.0"," 12.5"," 13.0"," 14.0"," 15.0"]
Then I strip away everything else.
The output should be a table.csv:
|размер|размер|размер|размер|размер|размер|размер|размер|размер|размер|
|07.0|07.5|08.0|10.0|10.5|11.5|12.0|13.0|14.0|15|
I write it with the csv module.
I look for the data with a regular expression:
ad = requests.get('http://www.footlocker.com/product/model:132512/sku:A1781919/timberland-roll-top-mens/tan/tan/').text  # example link
bb = re.findall(r'"AVAILABLE_SIZES":(.*)"DISCOUNT_PERCENT"', ad)
out: ['[" 07.0"," 07.5"," 08.0"," 10.0"," 10.5"," 11.5"," 12.0"," 13.0"," 14.0"," 15.0"],']
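Then I try to remove the extra characters from the result.
How do I get rid of the extra characters now? The replace call raises an error. Is there a mistake in the output, or is the JSON malformed because of the spaces in it?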
out:
bb = re.findall(r'"AVAILABLE_SIZES":(.*)],"DISCOUNT_PERCENT"', ad).str(var).replace('[', ' ')
AttributeError: 'list' object has no attribute 'str'
Update
bb_strings = re.findall(r'var model = ({.*})', ad)
bp = {}
if bb_strings:
    bp = json.loads(bb_strings[0])
out: {'ALLSKUS': ['A1781919', '6635A001', '6634A'], 'NBR': '132512', 'PRICERANGE': '$99.99 - $125.99', 'GENDER_AGE': "Men's", 'PRICEADJUSTDATE': '', 'AVAILABLE_SIZES': [' 07.0', ' 07.5', ' 08.0', ' 10.0', ' 10.5', ' 11.5', ' 12.0', ' 13.0', ' 14.0', ' 15.0'], 'DISCOUNT_PERCENT': '10', 'isFieldTestable': False, 'SORT': '1036', 'HASCUSTOMPRODUCTTEMPLATE': False, 'PR_LIST': '139.99', 'SPORTS': [{'ID': '31', 'NM': 'Snow'}, {'ID': '39', 'NM': 'Casual'}], 'SIZECHART_CD': 'S0629', 'HASSIZES': True, 'PR_SALE': '125.99', 'LOCALIZATION': {}, 'MODELTEMPLATE': {'ISMODELTEMPLATEACTIVE': 'N', 'MODELTEMPLATE_IMAGE': ''}, 'ISCUSTOMPRODUCT': False, 'INTRODUCTIONDATE': '', 'SKU': '6635A001', 'ISINTANGIBLE': False, 'PROD_TP': 'Shoes', 'CUSTPROD_CD': '', 'NM': "Timberland Roll-Top - Men's", 'REVIEWS': {'HASREVIEWS': True, 'TOTALREVIEWCOUNT': '17', 'WEIGHTEDAVERAGERATING': '4.82', 'WEIGHTEDAVERAGERECOMMENDED': '16'}, 'BRAND': 'Timberland', 'INET_COPY': 'A style unlike any other. The Timberland Roll Top Boot rolls down for a little built-in air conditioning and a whole lotta style. Premium, full-grain leather upper provides comfort, durability and abrasion resistance. Direct-attach seam construction promises lasting durability. Padded collar provides a comfortable fit around the ankle and keeps out debris. Rubber lug sole for traction and durability. Embossed Timberland tree logo on the side.'}
for bl in bp['AVAILABLE_SIZES']:
    footlocker.append(('размер', bl))
Everything works now. How do I get all of the values written to the CSV file, not just the first one?
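A possible way to write every size rather than only the first one, as a rough sketch: build one header row and one value row, then write both with csv.writer. The helper name write_sizes, the sample list, and the default comma delimiter are assumptions made for illustration; the list stands in for bp['AVAILABLE_SIZES'].

```python
import csv
import io

# Hypothetical stand-in for bp['AVAILABLE_SIZES'] from the code above.
available_sizes = [' 07.0', ' 07.5', ' 08.0', ' 10.0']


def write_sizes(f, sizes, label='размер'):
    """Write a header row and a row of cleaned sizes to an open text file."""
    # Each size carries a leading space in the page data, so strip it per item.
    cleaned = [s.strip() for s in sizes]
    writer = csv.writer(f)
    writer.writerow([label] * len(cleaned))  # one header cell per size
    writer.writerow(cleaned)                 # all sizes in a single row
    return cleaned


# Demonstrated with an in-memory buffer; for a real file one would use
# open('table.csv', 'w', newline='', encoding='utf-8') instead.
buf = io.StringIO()
write_sizes(buf, available_sizes)
print(buf.getvalue())
```

Passing the whole list to a single writerow() call is what puts all values on one line; calling writerow() inside the loop would put each size on its own row instead. If the pipe-separated layout is wanted, csv.writer accepts delimiter='|'.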
Now you need to pull out the data you need.
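re.findall() returns a list (type list). A list has no str() method. Apart from the generic methods of (mutable) sequences, list provides only .sort().
Try it interactively in a REPL (ptpython, ipython): look at what re.findall() returns and at the methods that autocompletion shows for the list. The full list of methods is in the output of help(list).
See also: How to get information from a JSON string specified in JavaScript code inside an HTML page using Python 3.x?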
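To illustrate the point about the list returned by re.findall(), here is a minimal sketch. The string ad is a shortened, hypothetical stand-in for the downloaded page text, and the pattern is a narrower variant of the one in the question that captures only the bracketed array.

```python
import json
import re

# Shortened sample standing in for the page text (an assumption for illustration).
ad = '"PRICEADJUSTDATE":"","AVAILABLE_SIZES":[" 07.0"," 07.5"," 08.0"],"DISCOUNT_PERCENT":"10"'

# re.findall() returns a list of captured groups, so index into it before
# calling any string methods.
matches = re.findall(r'"AVAILABLE_SIZES":(\[.*?\])', ad)

sizes = []
if matches:
    # The captured text is itself a JSON array, so json.loads() can parse it.
    raw_sizes = json.loads(matches[0])
    # strip() is a str method: apply it to each element, not to the list.
    sizes = [s.strip() for s in raw_sizes]

print(sizes)
```

The leading spaces inside the values do not make the JSON invalid; they are just part of each string, which is why they are removed per element after parsing.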