RError.com


максим ильин's questions

максим ильин
Asked: 2022-07-03 19:18:51 +0000 UTC

How to find the index of the nearest opposite value above the current value in a column

  • 1
import pandas as pd
import numpy as np

df=pd.DataFrame.from_dict({'Color':['black','black','white','white']},orient='index').transpose()
df['serch_index']=np.nan

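For the first 'white', the nearest opposite value above it is the second 'black'; for the second 'white', the nearest opposite value is also the second 'black'. So the cells of the first and the second 'white' must both receive the index of the second 'black'.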

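The algorithm is as follows:
1) loop over the column;
2) take the current value and its index;
3) subtract the index of every opposite value from the index of the current value;
4) find the minimum of the differences from 3);
5) write the result at the current value's position.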

Since the task is to find the nearest index above, I think iloc[:value] could be used.

Code and pseudocode:

df['serch_index']=[
                   for x in df.loc[df.iloc[:item.index,0],'Color']
                       if bool(re.match(r'black', x.values))==True #3) step
                               min([item.index-x.index])         #4) step
                   
                   if bool(re.match(r'white', item.values))==True #2) step
                   else 
                        for y in df.loc[df.iloc[:item.index,0],'Color']
                             if bool(re.match(r'white', y.values))==True  #3) step
                                  min([item.index-y.index])             #4) step
                   
                   for item in df['Color'] #1) step
                   ] #5) step

print(df['serch_index'])

How do I organize the pseudocode algorithm correctly, e.g. item.values, item.index, x.index, min()?
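A runnable sketch of steps 1-5, assuming a default RangeIndex (0, 1, 2, ...) and a column that holds only two distinct values. The loop mirrors the steps literally; the second function is a vectorized alternative based on the observation that, with only two values, the nearest opposite value above a row is the last row of the previous run of equal values. Both function names are hypothetical.

```python
import numpy as np
import pandas as pd


def nearest_opposite_loop(s: pd.Series) -> pd.Series:
    """Steps 1-5 written as a plain loop (quadratic, fine for small columns)."""
    result = pd.Series(np.nan, index=s.index)
    for pos, (idx, value) in enumerate(s.items()):      # 1) walk the column, 2) current value and index
        above = s.iloc[:pos]                             # only the rows above the current one
        opposite = above[above != value]
        if opposite.empty:
            continue                                     # no opposite value above: keep NaN
        diffs = idx - opposite.index                     # 3) distance to every opposite row
        result.loc[idx] = opposite.index[np.argmin(diffs)]  # 4) smallest distance, 5) store its index
    return result


def nearest_opposite_vectorized(s: pd.Series) -> pd.Series:
    """Same result in linear time, assuming a RangeIndex and exactly two distinct values."""
    pos = pd.Series(np.arange(len(s)), index=s.index, dtype=float)
    run_start = pos.where(s.ne(s.shift())).ffill()       # position where the current run began
    prev = run_start - 1                                 # last row of the previous (opposite) run
    return prev.where(prev >= 0)                         # the first run has nothing opposite above


df = pd.DataFrame({'Color': ['black', 'black', 'white', 'white']})
df['serch_index'] = nearest_opposite_vectorized(df['Color'])
print(df)
```

The loop is the direct answer to "how to organize the algorithm"; the vectorized version avoids rescanning all rows above each row, which matters on long columns.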

python pandas
  • 1 answer
  • 50 Views
максим ильин
Asked: 2022-06-28 05:51:24 +0000 UTC

How to find index matches by comparing against two lists?

  • 0

There is a df index:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]

There is a list built from df:

l_figure_s= [5,12,19,24,29,66,72,   76, 84, 91]

And another list built from df:

l_figure_e= [11,15,22,27,62,69,75,82,   87, 99]

A DataFrame is built from the two lists:

df1 = pd.DataFrame.from_dict({'starts':l_figure_s,'ends':l_figure_e},dtype=int,orient='index').transpose()

How can I write, in pandas, a comparison of the df index with the values of the 'starts' and 'ends' columns of df1, under the following condition:

df.index >= df1['starts'] & df.index <= df1['ends'] &  df1['starts'] < df1['ends'] & df1['starts'].shift(-1) > df1['ends']

The result should be a df containing the rows with indices (5,6,7,8,9,10,11), (12,13,14,15), and so on.
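A vectorized sketch, assuming df has the default integer index 0..99 (the 'value' column is a hypothetical stand-in for the real data). Each index value is compared with every [start, end] pair at once through NumPy broadcasting, which builds an (n_rows x n_pairs) boolean matrix; the extra conditions from the question are applied to df1 first. The last pair has no following start (shift(-1) gives NaN there), so it is kept explicitly, otherwise 91..99 would be dropped. Note also that in pandas every comparison joined with & needs its own parentheses, because & binds tighter than >= and <=.

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for the real df: 100 rows with index 0..99
df = pd.DataFrame({'value': np.arange(100) * 10})

l_figure_s = [5, 12, 19, 24, 29, 66, 72, 76, 84, 91]
l_figure_e = [11, 15, 22, 27, 62, 69, 75, 82, 87, 99]
df1 = pd.DataFrame({'starts': l_figure_s, 'ends': l_figure_e})

# keep pairs with starts < ends whose next start lies after the current end;
# the last pair has no next start (NaN), so it is kept explicitly
next_start = df1['starts'].shift(-1)
pairs = df1[(df1['starts'] < df1['ends']) & (next_start.isna() | (next_start > df1['ends']))]

# compare every index value with every [start, end] interval via broadcasting
idx = df.index.to_numpy()[:, None]
inside = (idx >= pairs['starts'].to_numpy()) & (idx <= pairs['ends'].to_numpy())
hit = inside.any(axis=1)

result = df[hit].copy()
result['pair'] = inside[hit].argmax(axis=1)   # position of the matching pair in `pairs`
print(result.head(12))
```

The matrix grows as rows x pairs; for large inputs with non-overlapping intervals, pd.IntervalIndex.from_arrays(starts, ends, closed='both') together with get_indexer is a possible lower-memory alternative.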

python pandas
  • 1 answer
  • 57 Views
максим ильин
Asked: 2022-04-27 23:12:32 +0000 UTC

How to build ranges and pass them as DataFrame indices?

  • 0

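There is a list of index values obtained from a DataFrame: list_data = [5,10]

From each element of list_data I need to build a range: for 5 it is 2-6, for 10 it is 7-11, i.e. the range starts at the element minus 3 and ends at the element plus 1.

Each resulting range is passed to the DataFrame: every element of the range is an index of that DataFrame, and the rows with those indices get a value in a DataFrame attribute (column).

Pseudo-example: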

df
list_data = [5,10]
for i in list_data:
 df['att1'].loc[list_data(i).shift(-3) between list_data(i).shift(1)]=='Key_val'
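A runnable version of the pseudo-example, assuming the index labels are integers (the 15-row frame is a hypothetical stand-in for the real df). Label-based .loc slicing includes both ends, so df.loc[i - 3:i + 1] covers exactly 2-6 for 5 and 7-11 for 10. The second variant builds all labels first and assigns once with isin, which avoids one assignment per element when list_data is long.

```python
import numpy as np
import pandas as pd

# hypothetical frame: the real df comes from elsewhere; only its index matters here
df = pd.DataFrame({'att1': [None] * 15})
list_data = [5, 10]

# Variant 1: a loop; .loc slices by label and includes both ends (i-3 .. i+1)
for i in list_data:
    df.loc[i - 3:i + 1, 'att1'] = 'Key_val'

# Variant 2: collect all labels first, then assign in one step
labels = np.concatenate([np.arange(i - 3, i + 2) for i in list_data])  # arange excludes the stop, hence +2
df2 = pd.DataFrame({'att1': [None] * 15})
df2.loc[df2.index.isin(labels), 'att1'] = 'Key_val'

print(df['att1'].tolist())
```

Labels outside the index are simply ignored in both variants (the slice is truncated, isin finds nothing).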
python
  • 1 answer
  • 10 Views
максим ильин
Asked: 2022-04-16 00:01:00 +0000 UTC

Selecting one or more unique digit sequences from a string

  • 0
declare
p_dbs varchar(1000):='MK~RU_AR~ZEN~ADFORM_CP~3301071970_CI~FY22QO_OB~1X1_TS~APV-123895_VV~NONE_PI~123895_ID~GLD0009ADF__CV~PPV-123895_FF~CPM';
type values_t is table of varchar(150);
    t_values     values_t := values_t();
    l_char                  varchar2(1 char);
    l_buffer                varchar2(150 char);
    l_sp    boolean :=false;
    l_ep    boolean :=false;
begin
    for i in 1..length(p_dbs)
    loop
        l_char := substr(p_dbs, i, 1);
        if nvl(length(replace(translate(l_char, '0123456789', rpad(chr(1), 10, chr(1))), chr(1))), 0) = 0 and l_sp=false  and l_ep=false
        then -- char is number
            l_buffer := l_buffer || l_char;
            l_sp := true;
            end if;
        
        if nvl(length(replace(translate(l_char, '0123456789', rpad(chr(1), 10, chr(1))), chr(1))), 0) = 0 and l_sp=true  and l_ep=false 
        then l_buffer := l_buffer || l_char;
        end if;
        
        if nvl(length(replace(translate(l_char, '0123456789', rpad(chr(1), 10, chr(1))), chr(1))), 0) > 0 and l_sp=true  and l_ep=false
        then 
             if --length(l_buffer)=5 or length(l_buffer)=6
              not l_buffer  MEMBER OF t_values then 
             t_values.EXTEND;
                 t_values(t_values.COUNT) := l_buffer;
                 dbms_output.put_line(l_buffer);
                 l_buffer := '';
                 l_sp := false;
                 l_ep := true;
           end if;
        end if;
        
        if nvl(length(replace(translate(l_char, '0123456789', rpad(chr(1), 10, chr(1))), chr(1))), 0) = 0 and l_sp=false  and l_ep=true 
        then 
            l_buffer := l_buffer || l_char;
            l_sp := true;
            l_ep := false;
        end if;
        end loop;
end;

Unexpected behaviour: the output includes 1123895, which does not occur in the string:

33301071970
22
1
1123895
123895
0009

If the MEMBER OF check is removed, the data is displayed correctly:

33301071970
22
1
1
123895
123895
0009
123895

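If the number of characters is checked instead, the result set is empty.

How can I get a result where only one value is written to the collection: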

123895
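Tracing the block by hand suggests two causes. First, when the very first digit is read, both the first and the second IF fire in the same iteration, so that digit is appended twice (hence 33301071970 instead of 3301071970). Second, the buffer is cleared only inside the NOT ... MEMBER OF branch, so after a repeated run such as the second 1 the buffer is not reset and the next run is glued onto it (1 + 123895 = 1123895); the same would explain the empty result with a length check, since a run that fails the check is never cleared. The intended logic, sketched in Python rather than PL/SQL, with a regular expression in place of the character-by-character state machine: collect every run of digits, drop repeats while keeping first-seen order, and optionally keep only runs of one length (the six-digit filter is an assumption based on the desired output 123895; unique_digit_runs is a hypothetical name).

```python
import re

p_dbs = ('MK~RU_AR~ZEN~ADFORM_CP~3301071970_CI~FY22QO_OB~1X1_TS~APV-123895_VV~NONE_'
         'PI~123895_ID~GLD0009ADF__CV~PPV-123895_FF~CPM')


def unique_digit_runs(text, length=None):
    """Distinct runs of digits in order of first appearance, optionally of one exact length."""
    runs = re.findall(r'\d+', text)          # every maximal run of consecutive digits
    unique = list(dict.fromkeys(runs))       # drop repeats, keep first-seen order
    if length is not None:
        unique = [r for r in unique if len(r) == length]
    return unique


print(unique_digit_runs(p_dbs))       # ['3301071970', '22', '1', '123895', '0009']
print(unique_digit_runs(p_dbs, 6))    # ['123895']
```

In PL/SQL, a loop over REGEXP_SUBSTR(p_dbs, '\d+', 1, n) for n = 1, 2, ... until it returns NULL could play the role of re.findall; the duplicate and length checks then apply to each extracted run, with the buffer problem gone entirely.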
oracle
  • 1 answer
  • 10 Views
максим ильин
Asked: 2022-04-06 22:55:22 +0000 UTC

How to modify a .csv file?

  • 2

It works for reading, but does not change the values in the file:

import csv

directory="C:\\Users\\pyth\\output_3.csv"

with open(directory, 'r+') as f:
    d_reader = csv.DictReader(f,delimiter='\t')
    for row in d_reader:
        if float(row['Open']) > float(row['Close']):
            row['CLINE_TYPE'] = '1_MIN'
        else : row["CLINE_TYPE"] = '1_MAX'
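csv.DictReader hands out dictionaries built from the file; changing row['CLINE_TYPE'] only changes that dictionary, and opening the file with 'r+' does not write anything back by itself. A minimal sketch, assuming the file is tab-separated as in the question and small enough to hold in memory: read all rows, update them, then write the whole file again with csv.DictWriter. Writing to a temporary file and swapping it in with os.replace is a design choice so that a failure halfway does not leave a truncated file; update_cline_type is a hypothetical name.

```python
import csv
import os
import tempfile


def update_cline_type(path: str, delimiter: str = '\t') -> None:
    """Read all rows, set CLINE_TYPE, then write the whole file back."""
    with open(path, newline='') as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        fieldnames = list(reader.fieldnames or [])
        rows = list(reader)

    if 'CLINE_TYPE' not in fieldnames:
        fieldnames.append('CLINE_TYPE')

    for row in rows:
        # same rule as the loop in the question
        row['CLINE_TYPE'] = '1_MIN' if float(row['Open']) > float(row['Close']) else '1_MAX'

    # write into a temporary file next to the original, then replace the original
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.csv')
    with os.fdopen(fd, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter=delimiter)
        writer.writeheader()
        writer.writerows(rows)
    os.replace(tmp_path, path)
```

Usage with the path from the question: update_cline_type("C:\\Users\\pyth\\output_3.csv").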

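How to write nested branching statements in pandas?

Goal: find rows that are 5 minutes apart, provided that in the first row Open > Close and in the second Open < Close, or in the first row Open < Close and in the second Open > Close.

A real-life analogy: on which two consecutive days last year was the morning temperature of the first day higher than its evening temperature, while on the next day the evening temperature was higher than the morning temperature, or vice versa.

Logic: from the input data, select pairs of rows that satisfy both conditions:
1) the rows are 5 minutes apart in the "Open Time" field (i.e. one row directly follows the other);
2) if in the first row the value in "Open" is strictly greater than the value in "Close", then in the second row Open is strictly less than Close, and vice versa.
If the conditions are met, the time value is written to the date_cline_up variable when "CLOSE" > "OPEN", or to date_cline_dow when "CLOSE" < "OPEN", and written to the file. Then the loop looks for the next pair of rows that satisfies the conditions. If the conditions are not met, date_cline_up or date_cline_dow is reset to NaN (None in the code), depending on which row is the first of the pair or of each subsequent pair. For the input data below, the following rows satisfy the conditions: rows 0-1 get the label '1_MAX' in the CLINE_TYPE field, and rows 2-3 get 1_MAX. The code under "Full condition" fully covers the described logic.

Input data: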

    Open Time            Close time                     Open      Close     STATUS  CLINE_TYPE
0   2021-11-06 13:25:00  2021-11-06 13:29:59.999000064  60534.13  60509.9
1   2021-11-06 13:30:00  2021-11-06 13:34:59.999000064  60509.89  60570.01
2   2021-11-06 13:35:00  2021-11-06 13:39:59.999000064  60570.01  60469.34
3   2021-11-06 13:40:00  2021-11-06 13:44:59.999000064  60469.34  60546.6

Full condition:

from datetime import datetime  # needed for datetime.strptime below

date_cline_up=None
date_cline_dow=None

for row in d_reader:
    curent_date = datetime.strptime(row["Open Time"], '%Y-%m-%d %H:%M:%S')
    if date_cline_up is None and date_cline_dow is None:
        if float(row["Open"]) > float(row["Close"]):
            date_cline_dow = datetime.strptime(row["Open Time"], '%Y-%m-%d %H:%M:%S')
        elif float(row["Open"]) < float(row["Close"]):
            date_cline_up = datetime.strptime(row["Open Time"], '%Y-%m-%d %H:%M:%S')
    elif date_cline_up is None and date_cline_dow is not None:
        if (((curent_date - date_cline_dow).total_seconds()) % 3600 // 60) == 5 and float(row["Open"]) < float(row["Close"]):
            row["CLINE_TYPE"] = '1_MIN'
        else:
            date_cline_dow = None
    elif date_cline_up is not None and date_cline_dow is None:
        if (((curent_date - date_cline_up).total_seconds()) % 3600 // 60) == 5 and float(row["Open"]) > float(row["Close"]):
            row["CLINE_TYPE"] = '1_MAX'
        else:
            date_cline_up = None
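For the pairing itself, a pandas sketch, assuming the data is loaded with pd.read_csv(path, sep='\t') so that Open and Close are numeric. It differs from the loop above in one respect that seems to match the stated intent: once a pair is found, both of its rows are labelled and the search continues after the pair, so pairs never overlap. (In the loop above, date_cline_dow is not reset after a match, so the 10-minute gap between rows 0 and 2 resets it and the pair 2-3 is never found.) The label mapping ('1_MIN' when the first row goes down, '1_MAX' when it goes up) follows the branches of the "Full condition" code; the prose says 1_MAX for both pairs, so swap the labels if that is what is meant. label_reversal_pairs is a hypothetical name.

```python
import numpy as np
import pandas as pd


def label_reversal_pairs(df: pd.DataFrame) -> pd.Series:
    """Label non-overlapping pairs of consecutive rows that are 5 minutes apart
    and move in opposite directions."""
    open_time = pd.to_datetime(df['Open Time'])
    direction = np.sign(df['Close'] - df['Open']).to_numpy()    # +1 up, -1 down, 0 flat
    # gap[i] is True when row i starts exactly 5 minutes after row i-1
    gap = (open_time.diff() == pd.Timedelta(minutes=5)).to_numpy()

    labels = np.full(len(df), '', dtype=object)
    i = 0
    while i < len(df) - 1:
        if gap[i + 1] and direction[i] != 0 and direction[i] == -direction[i + 1]:
            # first row down (Open > Close) -> '1_MIN', first row up -> '1_MAX'
            labels[i] = labels[i + 1] = '1_MIN' if direction[i] < 0 else '1_MAX'
            i += 2          # a row belongs to at most one pair
        else:
            i += 1
    return pd.Series(labels, index=df.index, name='CLINE_TYPE')


df = pd.DataFrame({
    'Open Time': ['2021-11-06 13:25:00', '2021-11-06 13:30:00',
                  '2021-11-06 13:35:00', '2021-11-06 13:40:00'],
    'Open': [60534.13, 60509.89, 60570.01, 60469.34],
    'Close': [60509.9, 60570.01, 60469.34, 60546.6],
})
df['CLINE_TYPE'] = label_reversal_pairs(df)
print(df)
```

The direction and gap checks are vectorized; only the greedy "skip past a found pair" step stays a loop, because whether a row can start a pair depends on whether the previous row was already used.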
python
  • 1 answer
  • 10 Views
максим ильин
Asked: 2022-09-02 23:59:32 +0000 UTC

Does the order of objects in the LIST parameter of the DBMS_MVIEW.REFRESH procedure matter?

  • 3
dbms_mview.REFRESH(
    LIST=>'MV_CREATIVES_MIX_WITH_NAME_DCO,MV_CREATIVES_STATISTICS_MIX_DCO,
           MV_AUTOCHECKER_UPD,MV_AUTOLINKING_DCO'
           /*, atomic_refresh => false, out_of_place => true*/);

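In the LIST parameter, the objects

MV_CREATIVES_MIX_WITH_NAME_DCO,MV_CREATIVES_STATISTICS_MIX_DCO

are the data sources for the objects

MV_AUTOCHECKER_UPD,MV_AUTOLINKING_DCO

Will the objects be refreshed in the given order in this case?

Or should the data sources be refreshed in a separate JOB to guarantee the execution order?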

sql
  • 1 answer
  • 10 Views
максим ильин
Asked: 2022-07-15 00:13:11 +0000 UTC

The variable's value is wrapped in quotes: how do I get rid of them?

  • 2

This code works:

declare
--sql_stmt clob;
p_file_name varchar2(100);
TYPE c_crn IS TABLE OF VARCHAR2(1000);
l_crn_tab c_crn;
begin
p_file_name:='DCO_CRN_TEST.xlsx';
--sql_stmt:='select STRING_VAL bulk collect into l_crn_tab from TABLE( TABLEAU.as_read_xlsx.READ( TABLEAU.as_read_xlsx.file2blob(''EXRATE'','''||p_file_name||''') ) )';
select STRING_VAL bulk collect into l_crn_tab from TABLE( TABLEAU.as_read_xlsx.READ( TABLEAU.as_read_xlsx.file2blob('EXRATE',p_file_name) ) );
 --EXECUTE IMMEDIATE stmt;
  end;

It does not work when executed dynamically:

declare
sql_stmt clob;
p_file_name varchar2(100);
TYPE c_crn IS TABLE OF VARCHAR2(1000);
l_crn_tab c_crn;
begin
p_file_name:='DCO_CRN_TEST.xlsx';
sql_stmt:='select STRING_VAL bulk collect into l_crn_tab from TABLE( TABLEAU.as_read_xlsx.READ( TABLEAU.as_read_xlsx.file2blob(''EXRATE'','||p_file_name||')))';
--select STRING_VAL bulk collect into l_crn_tab from TABLE( TABLEAU.as_read_xlsx.READ( TABLEAU.as_read_xlsx.file2blob('EXRATE',p_file_name) ) );
 EXECUTE IMMEDIATE sql_stmt;
  end;

Error report -
ORA-00904: "DCO_CRN_TEST"."XLSX": invalid identifier
ORA-06512: at line 10
00904. 00000 -  "%s: invalid identifier"

How do I get the lob.open call to see the file name DCO_CRN_TEST.XLSX?

sql_stmt:=q'[select STRING_VAL bulk collect into l_crn_tab from TABLE( TABLEAU.as_read_xlsx.READ( TABLEAU.as_read_xlsx.file2blob('EXRATE','||p_file_name||')))]';

ORA-03001: unimplemented feature
ORA-06512: at line 10
03001. 00000 -  "unimplemented feature"
*Cause:    This feature is not implemented.

where ORA-06512 points to this line:

dbms_lob.open( file_lob, dbms_lob.file_readonly );
sql
  • 1 answer
  • 10 Views
максим ильин
Asked: 2022-06-17 07:34:50 +0000 UTC

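Why doesn't the session end, but keeps working after the result set has been returned?

  • 3

There is a large table.
After each query on this table, a 50-row result set is displayed quickly, but the session keeps working. If a similar query is run, a new SID appears, and all of them are in the EVENT 'send bkld'.

How can a SELECT on the large table be run so that the session ends once the N-row result set has been displayed, i.e. loading of the table under 'bkld' does not continue?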

sql
  • 1 answer
  • 10 Views
максим ильин
Asked: 2022-05-26 23:57:53 +0000 UTC

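How to implement a set difference when the collection values are ROWIDs?

  • 2

How do I get the set of values from the first collection that are not in the second collection?

First set: l_id_tab

Second set: RESOLUTION_T

Error:

PLS-00306: wrong number or types of arguments in call to 'MULTISET_EXCEPT_ALL'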

declare 
    TYPE RESOLUTION_COLL IS TABLE OF rowid;
    RESOLUTION_T RESOLUTION_COLL:=RESOLUTION_COLL();
    TYPE t_id_tab IS TABLE OF rowid;
    l_id_tab t_id_tab:=t_id_tab();
    TYPE nested_typ IS TABLE OF rowid;
    answer nested_typ; 
begin
    select rowid 
    BULK COLLECT INTO l_id_tab
    from MONITORING;
select s.rowid 
        BULK COLLECT INTO  RESOLUTION_T                  
        from RESEARCH_METRICS_NONAUD_pr ss                          
        join RESEARCH_MEDIA_SOURCE source_dict on source_dict.id=ss.media_id                            
        join RESEARCH_MS_CLIP_DICT cd on cd.media_id=ss.media_id and cd.clip_id=ss.CLIP_ID                          
        join RESEARCH_MS_CLIP_TYPE ct on ct.id=cd.clip_type and ct.media_id=cd.media_id                         
        join research_period pers on pers.prid=ss.period                            
        join MONITORING s on s.action_date=pers.start_date                          
        left join DE_MART.UNICOMPET_MS_CLIP_ATTR_DICT atd on atd.clip_id=cd.clip_id and 
        atd.media_id=cd.media_id and atd.BUSINESS_CATEGORY_ID=21                            
        group by s.rowid;
    answer := l_id_tab MULTISET EXCEPT RESOLUTION_T;
    for iter in answer.first .. answer.last
    loop
    dbms_output.put_line(answer(iter));
    end loop;
end;

PS: the exists collection method gives a similar error.

oracle
  • 1 answer
  • 10 Views
максим ильин
Asked: 2022-02-12 17:47:12 +0000 UTC

How to extract the hour from a timestamp so that the result keeps the time format with 00 minutes?

  • -1

Have: 2020-06-13 14:18:05. Need: 14:00

postgresql
  • 1 answer
  • 10 Views
максим ильин
Asked: 2020-09-11 18:45:51 +0000 UTC

Initializing an IN parameter in a function definition

  • 3

In examples I often see IN parameters being assigned a value:

Create or replace function atr (sode in number, code in number :=1000) return varchar2

Why assign a known value in the function?

Further down there is a branching operator:

If code in  (1000,2000)

So why check a value that is already known from the assignment?

oracle
  • 1 answer
  • 10 Views
максим ильин
Asked: 2020-09-05 04:43:00 +0000 UTC

How to pass a procedure parameter to an explicit cursor?

  • 2

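Is it correct to pass the procedure parameter O_ID to the cursor parameter O_ID this way? They have the same name; shouldn't the names differ?

This is what I do: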

CREATE OR REPLACE PROCEDURE test_insert (O_ID IN NUMBER)
IS
CURSOR cur_data(O_ID Order_Pos.order_id%TYPE) IS 
SELECT * FROM Order_Pos WHERE order_id=O_ID;
TYPE cur_data_s IS TABLE OF cur_data%ROWTYPE;
cur_data_e cur_data_s;
oracle
  • 1 answer
  • 10 Views
максим ильин
Asked: 2020-08-15 22:33:43 +0000 UTC

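Which tasks call for explicit and which for implicit cursors?

  • 2

For which tasks should a cursor be explicit, and for which implicit?

In chapter 15 of the book Oracle PL/SQL For Professionals, the author emphasizes the following idea:

Explicit cursors are for repeated queries of the same kind

What about implicit ones? And what does "queries of the same kind" mean?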

oracle
  • 1 个回答
  • 10 Views
Martin Hope
максим ильин
Asked: 2020-12-24 17:24:05 +0000 UTC

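How to pass the id from a loop to the EXCEPTION block when an error occurs?

  • 1

During the update a constraint violation is raised. I need to find out which record the update failed on. I can't pass this record to the exception block, because it complains:

identifier 'CUR.CSRES_RESOLUTION_ID' must be declared

I understand that the loop is already closed at that point. How, then, can CSRES_RESOLUTION_ID be passed to the exception handler?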

DECLARE
    err_code VARCHAR2(100);
BEGIN 
    FOR CUR IN (
        SELECT a.csres_resolution_id FROM CS_RESOLUTION A
        WHERE A.CSRES_RESOLUTION_NUMBER IN ('0356040','0356043','0356044')
    ) LOOP
        UPDATE CS_CASE B
        SET B.DCCST_CASE_STATUS_CODE='CLOSED BY PAYMENT'
        WHERE B.csres_resolution_id =CUR.csres_resolution_id;
    END LOOP;
COMMIT;
EXCEPTION WHEN OTHERS THEN
    err_code := SQLCODE;
    dbms_output.put_line(err_code||' '||CUR.csres_resolution_id );
END;
oracle
  • 2 answers
  • 10 Views
максим ильин
Asked: 2020-11-13 01:55:20 +0000 UTC

Error ORA-06550. How to declare a local variable, assign it a value, and use it in a predicate condition?

  • 2
DECLARE
    L_DATE_1 DATE :='2018-09-01';
    L_DATE_2 DATE :='2018-11-12';
    begin
    SELECT C."APPCM_PHOTO_CASE_MATERIAL_ID" ID,
           C.APPFE_EVENT_ID ev_id 
    FROM table_1 C
WHERE C.t_appcm_violation_date between L_DATE_1 and L_DATE_2

Error report -
ORA-06550: line 5, column 1:
PLS-00428: an INTO clause is expected in this SELECT statement

What does it want from me, if I have already declared the variables?


    with period as (
        select
            to_date('2018-09-01','yyyy-mm-dd') L_DATE_1,
            to_date('2018-11-12','yyyy-mm-dd') L_DATE_2
        from
            dual
    )
 SELECT C.APPCM_PHOTO_CASE_MATERIAL_ID ID,
        C.APPFE_EVENT_ID ev_id 
 FROM period,table_1 C
 WHERE C.t_appcm_violation_date between L_DATE_1 and L_DATE_2

This is how I get what I want without errors. How can I do the same by declaring local variables?

sql
  • 2 answers
  • 10 Views
максим ильин
Asked: 2020-05-06 21:29:17 +0000 UTC

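How to return, as text, the rows of vector (1) for the numbers found in another vector (2) (the result of running sklearn.preprocessing.LabelEncoder)?

  • 1

There is a DataFrame of 2 vectors: 1 is the vector 'cleanUrl', 2 is the vector 'code_url'.

The first vector contains url records; in the second vector the same url records were converted to numbers with the preprocessing.LabelEncoder() method of the preprocessing module imported from sklearn.

One possible solution is to convert vector 2 back to text, or to convert vector 1 to numbers.

File sample: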

cleanUrl,code_url
amerikan-gruzovik.ru,4590
tinatube.net,74861
sextelevizor.net,66791
ru.anysex.com,62743
www.asiamobil.ru,86865
www.chinamobil.ru,90045
ad-k.ru,2637
www.nik-store.ru,105112
video-seks.net,80108
russkoe-porno.info,63946
www.foxporns.com,94819
www.chrono24.com.ru,90117
www.wibes.ru,118283
german242.com,26297
santdom.ru,65100
treningchess.com,76231
razvedem.web-3.ru,60517
aktis-stroy.ru,3525
www.aktis-stroy.ru,85600
plot.name,56170
www.lichnycabinet.ru,100979
www.worldfishing.narod.ru,118532
sekretka.su,66123
www.a-centre.ru,85011
www.suzukirus.ru,113986
pornogl.com,57123
wmid234ru.ru,83678
hsi.ru,29794
infometer.ru,31244
www.git77.rostrud.ru,95784
www.packagetrackr.com,106632
www.tns-global.ru,115139
www.vipgroup.net,117281
www.toysrus.com,115433
moskva.wisell.ru,46046
www.shopjustice.com,111904
deti75.ru,16625
crimeacity.info,15195
baza.crimea.ua,8838
atelica-oazis.bron.me,6647
gokurort.ru,26990
mitula17.imhonet.ru,44811
foxbrest.imhonet.ru,24645
xavi.imhonet.ru,120090
ural.kp.ru,78539
spb.kp.ru,69996
pinkmarie.com,55650
geneva2015.cars.ru,26188
domodedovo.rujazi.com,18057
xn------5cdjccgu2avckptly3ad8p.xn--e1arcbfn.xn--p1ai,120241
baikalpress.ru,8328
klimovsk.mnogonado.net,35750
svet-modern.ru,72656
www.forex-kf.ru,94627
www.uniq-ip.com,116401
www.terrawoman.ua,114714
www.gorsovet.mk.ua,96192
vmr.gov.ua,81250
helpstu.su,28874
www.helpstu.su,96823
zab-nanny.ru,122892
kursak-diplom.com.ua,37838
kgu-journalist.ucoz.ru,34771
mospf.ru,46093
newdiplom.ucoz.ru,49231
www.autoezda.com,87258
referats.nashisrael.ru,60990
www.hotdiplom.ru,97129
fotorakom.com,24577
redirect.disqus.com,60900
www.sq.com.ua,113207
member.newsnet.in.ua,43580
bankomet.com.ua,8537
po4emu.ru,56252
www.po4emu.ru,107650
tric.info,76258
myotpusk.com,47714
yspehx.narod.ru,122777
vozhatiki.ru,81885
kirent.narod.ru,35483
www.festivalsearcher.com,94080
hotasianz.com.6716069.yupiromo.ru,29549
starblag.ucoz.ua,70955
www.medalbum.ru,102495
ab28ru.narod.ru,2336
diel.ks.ua,16931
aniplay.tv,5091
ugolzreniya.narod.ru,77854
vrn.vestipk.ru,81990
afg-hist.ucoz.ru,3023
www.shanson-plus.ru,111700
www.vsmolenske.ru,117854
vsetutonline.com,82254
stomatologmova.ucoz.ua,71506
xn----8sbgjprccxgonf4d1dya7b.xn--p1ai,120742
yarcube.ru,122335
www.pion.com.ru,107364
76yar.ru,1961
loveplanet-online.ru,40510

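The answer should be the non-repeating rows from vector 2, as a DataFrame: [4590, 4591, 4594, 4595, 4597, 4598] is an example as an array; I don't know how to get it as a DataFrame.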
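If the fitted encoder object is still available (le = preprocessing.LabelEncoder() after le.fit(...)), le.inverse_transform(codes) returns the original strings directly. When only the file is at hand, the two columns already form a lookup table, since LabelEncoder assigns exactly one code per distinct string. A pandas-only sketch of that second route; codes_to_urls is a hypothetical name, the three rows come from the sample above, and unknown codes come back as NaN.

```python
import pandas as pd


def codes_to_urls(df: pd.DataFrame, codes) -> pd.DataFrame:
    """Translate code_url values back to cleanUrl text using the frame itself."""
    # one code per distinct cleanUrl, so code_url -> cleanUrl is a plain lookup
    lookup = df.drop_duplicates('code_url').set_index('code_url')['cleanUrl']
    wanted = pd.Series(codes, name='code_url').drop_duplicates().reset_index(drop=True)
    return pd.DataFrame({'code_url': wanted, 'cleanUrl': wanted.map(lookup)})


df = pd.DataFrame({
    'cleanUrl': ['amerikan-gruzovik.ru', 'tinatube.net', 'ad-k.ru'],
    'code_url': [4590, 74861, 2637],
})
print(codes_to_urls(df, [4590, 2637, 4590, 99999]))
```

The result is already a DataFrame without repeated codes; converting vector 1 to numbers works the same way in the other direction, with df.set_index('cleanUrl')['code_url'] as the lookup.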

python
  • 1 answer
  • 10 Views
максим ильин
Asked: 2020-05-06 20:21:38 +0000 UTC

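How to find duplicates in a DataFrame of vectors?

  • 1

The DataFrame consists of the vectors 'cleanUrl' and 'code_url', where 'cleanUrl' is a link and 'code_url' is the same link converted to a number using: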

from sklearn import preprocessing
le = preprocessing.LabelEncoder()

File sample:

cleanUrl,code_url
amerikan-gruzovik.ru,4590
tinatube.net,74861
sextelevizor.net,66791
ru.anysex.com,62743
www.asiamobil.ru,86865
www.chinamobil.ru,90045
ad-k.ru,2637
www.nik-store.ru,105112
video-seks.net,80108
russkoe-porno.info,63946
www.foxporns.com,94819
www.chrono24.com.ru,90117
www.wibes.ru,118283
german242.com,26297
santdom.ru,65100
treningchess.com,76231
razvedem.web-3.ru,60517
aktis-stroy.ru,3525
www.aktis-stroy.ru,85600
plot.name,56170
www.lichnycabinet.ru,100979
www.worldfishing.narod.ru,118532
sekretka.su,66123
www.a-centre.ru,85011
www.suzukirus.ru,113986
pornogl.com,57123
wmid234ru.ru,83678
hsi.ru,29794
infometer.ru,31244
www.git77.rostrud.ru,95784
www.packagetrackr.com,106632
www.tns-global.ru,115139
www.vipgroup.net,117281
www.toysrus.com,115433
moskva.wisell.ru,46046
www.shopjustice.com,111904
deti75.ru,16625
crimeacity.info,15195
baza.crimea.ua,8838
atelica-oazis.bron.me,6647
gokurort.ru,26990
mitula17.imhonet.ru,44811
foxbrest.imhonet.ru,24645
xavi.imhonet.ru,120090
ural.kp.ru,78539
spb.kp.ru,69996
pinkmarie.com,55650
geneva2015.cars.ru,26188
domodedovo.rujazi.com,18057
xn------5cdjccgu2avckptly3ad8p.xn--e1arcbfn.xn--p1ai,120241
baikalpress.ru,8328
klimovsk.mnogonado.net,35750
svet-modern.ru,72656
www.forex-kf.ru,94627
www.uniq-ip.com,116401
www.terrawoman.ua,114714
www.gorsovet.mk.ua,96192
vmr.gov.ua,81250
helpstu.su,28874
www.helpstu.su,96823
zab-nanny.ru,122892
kursak-diplom.com.ua,37838
kgu-journalist.ucoz.ru,34771
mospf.ru,46093
newdiplom.ucoz.ru,49231
www.autoezda.com,87258
referats.nashisrael.ru,60990
www.hotdiplom.ru,97129
fotorakom.com,24577
redirect.disqus.com,60900
www.sq.com.ua,113207
member.newsnet.in.ua,43580
bankomet.com.ua,8537
po4emu.ru,56252
www.po4emu.ru,107650
tric.info,76258
myotpusk.com,47714
yspehx.narod.ru,122777
vozhatiki.ru,81885
kirent.narod.ru,35483
www.festivalsearcher.com,94080
hotasianz.com.6716069.yupiromo.ru,29549
starblag.ucoz.ua,70955
www.medalbum.ru,102495
ab28ru.narod.ru,2336
diel.ks.ua,16931
aniplay.tv,5091
ugolzreniya.narod.ru,77854
vrn.vestipk.ru,81990
afg-hist.ucoz.ru,3023
www.shanson-plus.ru,111700
www.vsmolenske.ru,117854
vsetutonline.com,82254
stomatologmova.ucoz.ua,71506
xn----8sbgjprccxgonf4d1dya7b.xn--p1ai,120742
yarcube.ru,122335
www.pion.com.ru,107364
76yar.ru,1961
loveplanet-online.ru,40510

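I need to return the 'code_url' entries that match the 'cleanUrl' entries, as a DataFrame.

The full version of the file has 130,000 records. I tried a nested loop, but the process hung for a long time: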

d=[]
for a in range(len(df_label_url)):
    for b in range(len(df_label_url)):
        if df_label_url['code_url'][a]==df_label_url['cleanUrl'][b]:
            d.append(df_label_url['code_url'][a])

Roughly this, only as a DataFrame:

[4590, 4590, 4590, 4590, 4590, 4590, 4590, 4590, 4590, 4590, 74861, 74861, 74861, 74861, 74861, 74861, 74861, 74861, 74861, 74861, 66791, 66791, 66791, 66791, 66791, 66791, 66791, 66791, 66791, 66791, 62743, 62743, 62743, 62743, 62743, 62743, 62743, 62743, 62743, 62743, 86865, 86865, 86865, 86865, 86865, 86865, 86865, 86865, 86865, 86865, 90045, 90045, 90045, 90045, 90045, 90045, 90045, 90045, 90045, 90045, 2637, 2637, 2637, 2637, 2637, 2637, 2637, 2637, 2637, 2637, 105112, 105112, 105112, 105112, 105112, 105112, 105112, 105112, 105112, 105112, 80108, 80108, 80108, 80108, 80108, 80108, 80108, 80108, 80108, 80108, 63946, 63946, 63946, 63946, 63946, 63946, 63946, 63946, 63946, 63946]
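The nested loop compares df_label_url['code_url'][a] (a number) with df_label_url['cleanUrl'][b] (a string), so the condition is never true, and it performs n² comparisons (about 1.7 x 10^10 for 130,000 rows). Hash-based pandas operations are linear. What "duplicates" should mean is not fully clear from the question, so here are two hypothetical helpers: duplicated_urls returns rows whose cleanUrl occurs more than once (optionally treating www.example.ru and example.ru as one site, as with helpstu.su / www.helpstu.su in the sample), and codes_for returns the code_url values for a set of URLs via isin. The frame reuses rows from the sample.

```python
import pandas as pd


def duplicated_urls(df: pd.DataFrame, ignore_www: bool = False) -> pd.DataFrame:
    """Rows whose cleanUrl occurs more than once, grouped together."""
    key = df['cleanUrl']
    if ignore_www:
        key = key.str.replace(r'^www\.', '', regex=True)    # treat www.x and x as one site
    mask = key.duplicated(keep=False)                        # True for every copy, not just later ones
    return df[mask].assign(site=key[mask]).sort_values(['site', 'code_url'])


def codes_for(df: pd.DataFrame, urls) -> pd.DataFrame:
    """code_url values for a collection of cleanUrl strings (hash lookup, no nested loop)."""
    return df.loc[df['cleanUrl'].isin(urls), ['cleanUrl', 'code_url']]


df_label_url = pd.DataFrame({
    'cleanUrl': ['helpstu.su', 'www.helpstu.su', 'po4emu.ru', 'aktis-stroy.ru', 'www.aktis-stroy.ru'],
    'code_url': [28874, 96823, 56252, 3525, 85600],
})
print(duplicated_urls(df_label_url, ignore_www=True))
print(codes_for(df_label_url, {'po4emu.ru'}))
```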
python
  • 1 answer
  • 10 Views
максим ильин
Asked: 2020-05-03 02:06:51 +0000 UTC

Fast extraction of [netloc] from a URL (scheme://netloc/path;parameters?query) and conversion of UNIX epoch to [datetime] in DataFrame columns

  • 1

The input is a small txt file of 50 thousand rows

data_dir = 'data'
filename = 'gender_age_dataset.txt'
file_path = '/'.join([data_dir,filename])
df = pd.read_csv(file_path, sep='\t')


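The file has a column user_json containing a JSON string made of 2 attributes: url and timestamp.

The task is to process these 2 JSON attributes before the JSON strings are deserialized across the whole dataset (for each JSON string, the dataset row is duplicated as many times as there are records in that JSON string).

As a result there will be 5 million rows, and post-processing them on a weak machine will take hours.

Code that deserializes the JSON strings and duplicates the main row for every JSON record: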

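import json
import pandas as pd

# drop(columns=...) instead of the positional axis argument, which pandas 2.0 no longer accepts
data = df.drop(columns='user_json').join(df['user_json'].apply(json.loads)).to_dict('records')
res = pd.json_normalize(data, [['user_json','visits']], ['uid','gender','age'])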

2 rows from the file:

gender  age     uid     user_json
F       18-24   d50192e5-c44e-4ae8-ae7a-7cfe67c8b777    {"visits": [{"url": "http://zebra-zoya.ru/200028-chehol-organayzer-dlja-macbook-11-grid-it.html?utm_campaign=397720794&utm_content=397729344&utm_medium=cpc&utm_source=begun", "timestamp": 1419688144068}, {"url": "http://news.yandex.ru/yandsearch?cl4url=chezasite.com/htc/htc-one-m9-delay-86327.html&lr=213&rpt=story", "timestamp": 1426666298001}, {"url": "http://www.sotovik.ru/news/240283-htc-one-m9-zaderzhivaetsja.html", "timestamp": 1426666298000}, {"url": "http://news.yandex.ru/yandsearch?cl4url=chezasite.com/htc/htc-one-m9-delay-86327.html&lr=213&rpt=story", "timestamp": 1426661722001}, {"url": "http://www.sotovik.ru/news/240283-htc-one-m9-zaderzhivaetsja.html", "timestamp": 1426661722000}]}
M       25-34   d502331d-621e-4721-ada2-5d30b2c3801f    {"visits": [{"url": "http://sweetrading.ru/?p=900", "timestamp": 1419717886224}, {"url": "http://sweetrading.ru/?p=884", "timestamp": 1419717884437}, {"url": "http://sweetrading.ru/?p=1002", "timestamp": 1419717816375}, {"url": "http://101.ru/?an=port_channel_mp3", "timestamp": 1419717804934}, {"url": "http://sweetrading.ru/?cat=62", "timestamp": 1419714194423}, {"url": "http://sweetrading.ru/?p=1046", "timestamp": 1419713998481}, {"url": "http://sweetrading.ru/?p=978", "timestamp": 1419713927085}, {"url": "http://sweetrading.ru/?cat=171", "timestamp": 1419713908863}, {"url": "http://sweetrading.ru/?cat=62", "timestamp": 1419713908679}, {"url": "http://sweetrading.ru/?p=3648", "timestamp": 1419713798879}, {"url": "http://oesex.ru/955457", "timestamp": 1419595564407}, {"url": "http://www.interfax.ru/russia/408800", "timestamp": 1419542965224}, {"url": "http://101.ru/?an=port_channel_mp3&channel=30", "timestamp": 1418818241900}, {"url": "http://www.interfax.ru/russia/413508", "timestamp": 1418802080857}, {"url": "http://www.euroavtoprokat.ru/sitemap/car-rental/france.htm", "timestamp": 1418722961181}, {"url": "http://www.euroavtoprokat.ru/sitemap/car-rental.htm", "timestamp": 1418722945825}, {"url": "http://www.euroavtoprokat.ru/car-rental/germany.htm", "timestamp": 1418722937847}, {"url": "http://www.euroavtoprokat.ru/car-rental/germany.htm", "timestamp": 1418722923196}, {"url": "http://www.euroavtoprokat.ru/sitemap/car-rental.htm", "timestamp": 1418722909804}, {"url": "http://www.eavtoprokat.ru/prokat-avto/france", "timestamp": 1418646101953}, {"url": "http://www.wordparts.ru/numeral/", "timestamp": 1418592793587}, {"url": "http://rsdn.ru/forum/alg/3305190.flat", "timestamp": 1418591162814}, {"url": "http://www.euroavtoprokat.ru/car-rental/turkey/istanbul.htm", "timestamp": 1418571531780}, {"url": "http://citieslist.ru/", "timestamp": 1418488992092}, {"url": 
"http://www.euroavtoprokat.ru/car-rental/turkey/istanbul.htm", "timestamp": 1418480798674}, {"url": "http://rutv.ru/brand/show/episode/453757", "timestamp": 1418253037406}, {"url": "http://www.fodors.com/community/europe/best-car-rental-company-in-italy.cfm", "timestamp": 1418247198586}, {"url": "http://wheelsabroad.com/car-rental/united-kingdom/england/london?gclid=cjwkeaia-5-kbrdylpg5096r8masjabqedm4cmiichc-_-ewkbtsqyci5bu9ucwvjmxp4o0tficaarocljdw_wcb", "timestamp": 1418245144696}, {"url": "http://lestinet.com/site/stopagent.ru", "timestamp": 1418243376170}, {"url": "http://android-help.ru/q2a/16774/\u043a\u0430\u043a-\u043f\u043e\u043b\u0443\u0447\u0438\u0442\u044c-root-\u043f\u0440\u0430\u0432\u0430-\u043d\u0430-philips-w832-android-4-0-4", "timestamp": 1418169606439}, {"url": "http://club.dns-shop.ru/rabinovich/blog/\u044f-\u0432\u0441\u0435-\u0435\u0449\u0435-\u0434\u0435\u0440\u0436\u0443\u0441\u044c-\u043e\u0431\u0437\u043e\u0440-\u0441\u043c\u0430\u0440\u0442\u0444\u043e\u043d\u0430-philips-xenium-w832/", "timestamp": 1418169602505}, {"url": "http://www.supportforum.philips.com/ru/showthread.php?1529-philips-xenium-w832/page6", "timestamp": 1418167859617}, {"url": "http://www.supportforum.philips.com/ru/showthread.php?842-\u043d\u0435-\u0440\u0430\u0431\u043e\u0442\u0430\u0435\u0442-gps-\u0432-\u0441\u043c\u0430\u0440\u0442\u0444\u043e\u043d\u0435-philips-xenium-w832", "timestamp": 1418166430112}, {"url": "http://rabota.ua/info/jobsearcher/post/umora.aspx", "timestamp": 1418114698621}, {"url": "http://www.enter.ru/product/appliances/myasorubka-philips-hr2728-2020103007131", "timestamp": 1418053557067}, {"url": "http://www.ferra.ru/ru/byt/news/2013/12/02/polaris-pmg-1805/", "timestamp": 1417866883735}, {"url": "http://www.ferra.ru/ru/byt/news/2013/10/12/bosch-mfw6-propower/", "timestamp": 1417862586856}, {"url": "http://www.linotype.com/1266/neuehelvetica-family.html", "timestamp": 1417856979616}, {"url": 
"http://www.linotype.com/1546/tradegothic-family.html?site=webfonts", "timestamp": 1417812010753}, {"url": "http://www.vandelaydesign.com/best-ecommerce-website-designs/", "timestamp": 1417807232287}, {"url": "http://www.awwwards.com/20-of-the-very-best-e-commerce-web-sites.html", "timestamp": 1417805189928}, {"url": "http://101.ru/?an=port_channel_mp3&channel=82", "timestamp": 1417711286305}, {"url": "http://www.just.ru/myasorubki/56658_elektromyasorybky_kenwood_mg_450/?from=yandex_msk&utm_source=yandex&utm_medium=cpc&utm_campaign=10817239_model_bytovaya-tehnika-melkaya_msk_p_api&utm_content=612422293_2792852770_\u043c\u044f\u0441\u043e\u0440\u0443\u0431\u043a\u0443 mg 450&position_type=premi", "timestamp": 1417701042306}, {"url": "http://101.ru/?an=port_channel_mp3&channel=5", "timestamp": 1417695760398}, {"url": "http://101.ru/?an=port_channel_mp3&channel=5", "timestamp": 1417689964129}, {"url": "http://101.ru/?an=port_channel_mp3&channel=17", "timestamp": 1417683034834}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001", "timestamp": 1417608945879}, {"url": "http://101.ru/?an=port_channel_mp3&channel=24", "timestamp": 1417605700777}, {"url": "http://101.ru/?an=port_channel_mp3&channel=24", "timestamp": 1417605639264}, {"url": "http://101.ru/?an=port_channel_mp3&channel=82", "timestamp": 1417605624817}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg470-meat-grinder-0wmg470008", "timestamp": 1417604804579}, {"url": "http://livedemo00.template-help.com/magento_48517/blackberry-bold-9000-phone.html", "timestamp": 1417604730951}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg470-meat-grinder-0wmg470008", "timestamp": 1417548651645}, {"url": "http://www.kenwoodworld.com/en-int/products/blenders/meat-grinders/mg474-meat-grinder", "timestamp": 1417548321763}, {"url": 
"http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001", "timestamp": 1417548310507}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001", "timestamp": 1417548309162}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001?feat=6405fda1-43cc-42cc-8860-1c2a492555c5&tabsegment=key-features", "timestamp": 1417548297576}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001?tabsegment=key-features", "timestamp": 1417548284970}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001", "timestamp": 1417548264964}, {"url": "http://www.kenwoodworld.com/en-int/products/blenders/meat-grinders", "timestamp": 1417546314287}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg700-meat-grinder-0wmg700006", "timestamp": 1417545459520}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg700-meat-grinder-0wmg700006", "timestamp": 1417545200191}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/kmix-by-kenwood/kmix-kitchen-machines-/kmx51-kmix-kitchen-machine-0wkmx51002", "timestamp": 1417545116313}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/---mg517---0wmg517007", "timestamp": 1417544991760}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001", "timestamp": 1417544967371}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001?feat=ac86d868-3ea4-4523-93e1-885bbf4222cd&tabsegment=key-features", "timestamp": 
1417544772661}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001?feat=3a288c22-e5f2-448e-a573-ccde95fd2341&tabsegment=key-features", "timestamp": 1417544765049}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001?feat=ac86d868-3ea4-4523-93e1-885bbf4222cd&tabsegment=key-features", "timestamp": 1417544748628}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001?tabsegment=key-features", "timestamp": 1417544731238}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001", "timestamp": 1417544522237}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001", "timestamp": 1417544351791}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001", "timestamp": 1417544282950}, {"url": "http://www.kenwoodworld.com/ru-ru", "timestamp": 1417544269909}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg516-meat-grinder-and-roto-food-cutter-0wmg516006?tabsegment=specifications", "timestamp": 1417544204394}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg516-meat-grinder-and-roto-food-cutter-0wmg516006", "timestamp": 1417544190747}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg516-meat-grinder-and-roto-food-cutter-0wmg516006", "timestamp": 1417544045014}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg516-meat-grinder-and-roto-food-cutter-0wmg516006?tabsegment=specifications", "timestamp": 1417544035023}, {"url": 
"http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg516-meat-grinder-and-roto-food-cutter-0wmg516006", "timestamp": 1417544015196}, {"url": "http://www.kenwoodworld.com/ru-ru", "timestamp": 1417544004579}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009?tabsegment=specifications", "timestamp": 1417543914820}, {"url": "http://www.kenwoodworld.com/uk/search-results", "timestamp": 1417543814629}, {"url": "http://www.kenwoodworld.com/uk/search-results", "timestamp": 1417543642699}, {"url": "http://www.kenwoodworld.com/uk/search-results", "timestamp": 1417543628088}, {"url": "http://www.kenwoodworld.com/uk/search-results", "timestamp": 1417543616074}, {"url": "http://www.kenwoodworld.com/uk/products/food-mixers/chef-major-attachments/potato-peeler-at444-awat444001", "timestamp": 1417543439173}, {"url": "http://www.kenwoodworld.com/uk/search-results", "timestamp": 1417543352117}, {"url": "http://www.kenwoodworld.com/uk/search-results", "timestamp": 1417543294005}, {"url": "http://www.kenwoodworld.com/uk/search-results", "timestamp": 1417543192107}, {"url": "http://www.kenwoodworld.com/uk", "timestamp": 1417543022466}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009?tabsegment=specifications", "timestamp": 1417542940415}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009?tabsegment=support", "timestamp": 1417542907491}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009?tabsegment=specifications", "timestamp": 1417542866623}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009", "timestamp": 
1417542858206}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009", "timestamp": 1417542839578}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009", "timestamp": 1417542795850}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009", "timestamp": 1417542742883}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009", "timestamp": 1417542725367}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009", "timestamp": 1417542659966}, {"url": "http://www.kenwoodworld.com/ru-ru", "timestamp": 1417542501523}, {"url": "http://101.ru/?an=port_channel_mp3&channel=24", "timestamp": 1417542435930}, {"url": "http://www.shop-script.ru/platform/", "timestamp": 1417473193974}, {"url": "http://101.ru/?an=port_channel_mp3&channel=34", "timestamp": 1417451297674}]}

Post-processing code for the 'url' values from the JSON:

import re
from urllib.parse import urlparse, unquote

# res: DataFrame with 'url' and 'timestamp' columns
def url_to_domain(url):
    # Decode %-escapes, keep only the host part, drop a leading "www."
    netloc = urlparse(unquote(url)).netloc
    return re.sub(r'^www\.', '', netloc)

res['url'] = res['url'].map(url_to_domain)

Before: http://news.yandex.ru/yandsearch?cl4url=chezasite.com/htc/htc-one-m9-delay-86327.html&lr=213&rpt=story
After: news.yandex.ru

Post-processing code for the 'timestamp' values from the JSON:

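import datetime
from datetime import timedelta

# Epoch milliseconds -> local datetime (fromtimestamp uses the machine's time zone)
mytime = datetime.datetime.fromtimestamp(int(res['timestamp'][1] // 1000))
# Round to the hour: up if the minute is past 30, otherwise down
if mytime.replace(minute=30) < mytime:
    mytime = mytime.replace(second=0, microsecond=0, minute=0) + timedelta(hours=1)
else:
    mytime = mytime.replace(second=0, microsecond=0, minute=0)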

Before: 1426666298001   After: 2015-03-18 11:00:00 (local time)
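The same rounding can be applied to the whole 'timestamp' column at once instead of one value at a time. Below is a minimal pandas sketch. `round_visit_hour` is a hypothetical helper name. The fixed UTC+3 offset is an assumption inferred from the example above: 1426666298001 ms is 08:11:38 UTC and is shown as 11:00 local time. Like the scalar code, the sketch rounds up only when the minute is past 30.

```python
import pandas as pd

def round_visit_hour(ts_ms, utc_offset_hours=3):
    """Round a Series of epoch-millisecond timestamps to the hour (hypothetical helper).

    utc_offset_hours=3 is an assumption matching the example output;
    adjust it for other time zones.
    """
    # Milliseconds since the epoch -> naive UTC datetimes, shifted to local time
    t = pd.to_datetime(ts_ms, unit='ms') + pd.Timedelta(hours=utc_offset_hours)
    # Start of the hour for every value
    hour = t.dt.floor('h')
    # Same rule as the scalar code: move to the next hour only if minute > 30
    bump = (t.dt.minute > 30).astype('int64')
    return hour + pd.to_timedelta(bump, unit='h')

res = pd.DataFrame({'timestamp': [1426666298001]})
res['timestamp'] = round_visit_hour(res['timestamp'])
print(res['timestamp'][0])  # 2015-03-18 11:00:00
```

A fixed offset keeps the sketch deterministic. Unlike `fromtimestamp`, it does not depend on the time zone of the machine running it.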

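How can I integrate this post-processing into the code that deserializes the JSON strings, while duplicating the main row for every entry in its JSON? If anything is unclear, I will explain.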
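One way to combine both post-processing steps with deserialization is to build the flat rows in a plain loop. The loop parses each user_json, copies the parent row once per visit, and post-processes 'url' and 'timestamp' along the way. This is a minimal sketch. `flatten_visits`, `url_to_domain` and `round_to_hour` are hypothetical names. It assumes every user_json holds a 'visits' list, as in the sample rows. `round_to_hour` uses the machine's local time zone, just as `datetime.fromtimestamp` does in the code above.

```python
import datetime
import json
import re
from urllib.parse import unquote, urlparse

import pandas as pd

def url_to_domain(url):
    # Decode %-escapes, keep only the host, drop a leading "www."
    return re.sub(r'^www\.', '', urlparse(unquote(url)).netloc)

def round_to_hour(ts_ms):
    # Epoch milliseconds -> local time, rounded up only if minute > 30
    t = datetime.datetime.fromtimestamp(ts_ms // 1000)
    base = t.replace(minute=0, second=0, microsecond=0)
    return base + datetime.timedelta(hours=1) if t.minute > 30 else base

def flatten_visits(df):
    rows = []
    for parent in df.to_dict('records'):
        # Take the JSON column out of the parent row and parse it
        visits = json.loads(parent.pop('user_json'))['visits']
        for visit in visits:
            # Copy the parent columns once per visit
            rows.append({**parent,
                         'url': url_to_domain(visit['url']),
                         'timestamp': round_to_hour(visit['timestamp'])})
    return pd.DataFrame(rows)
```

Usage: `flat = flatten_visits(df)`, where `df` is the frame read with `read_csv`. The loop is easy to follow, but it runs in pure Python, so it may be slow on very large datasets.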

python
  • 1 answer
  • 10 Views
Martin Hope
максим ильин
Asked: 2020-05-01 21:34:40 +0000 UTC

Deserializing JSON strings into the whole dataset

  • 0

The dataset has a column 'user_json' that contains serialized JSON strings.

import os
import pandas as pd

data_dir = 'data'
filename = 'gender_age_dataset.txt'
file_path = os.path.join(data_dir, filename)
df = pd.read_csv(file_path, sep='\t')

What one 'user_json' value contains: visits with the attributes 'url' and 'timestamp'.

Output of print(json.loads(df.iloc[0].user_json)):
{'visits': [{'url': 'http://zebra-zoya.ru/200028-chehol-organayzer-dlja-macbook-11-grid-it.html?utm_campaign=397720794&utm_content=397729344&utm_medium=cpc&utm_source=begun', 'timestamp': 1419688144068}, {'url': 'http://news.yandex.ru/yandsearch?cl4url=chezasite.com/htc/htc-one-m9-delay-86327.html&lr=213&rpt=story', 'timestamp': 1426666298001}, {'url': 'http://www.sotovik.ru/news/240283-htc-one-m9-zaderzhivaetsja.html', 'timestamp': 1426666298000}, {'url': 'http://news.yandex.ru/yandsearch?cl4url=chezasite.com/htc/htc-one-m9-delay-86327.html&lr=213&rpt=story', 'timestamp': 1426661722001}, {'url': 'http://www.sotovik.ru/news/240283-htc-one-m9-zaderzhivaetsja.html', 'timestamp': 1426661722000}]}

Sample of 2 rows with the header:

gender  age     uid     user_json
F       18-24   d50192e5-c44e-4ae8-ae7a-7cfe67c8b777    {"visits": [{"url": "http://zebra-zoya.ru/200028-chehol-organayzer-dlja-macbook-11-grid-it.html?utm_campaign=397720794&utm_content=397729344&utm_medium=cpc&utm_source=begun", "timestamp": 1419688144068}, {"url": "http://news.yandex.ru/yandsearch?cl4url=chezasite.com/htc/htc-one-m9-delay-86327.html&lr=213&rpt=story", "timestamp": 1426666298001}, {"url": "http://www.sotovik.ru/news/240283-htc-one-m9-zaderzhivaetsja.html", "timestamp": 1426666298000}, {"url": "http://news.yandex.ru/yandsearch?cl4url=chezasite.com/htc/htc-one-m9-delay-86327.html&lr=213&rpt=story", "timestamp": 1426661722001}, {"url": "http://www.sotovik.ru/news/240283-htc-one-m9-zaderzhivaetsja.html", "timestamp": 1426661722000}]}
M       25-34   d502331d-621e-4721-ada2-5d30b2c3801f    {"visits": [{"url": "http://sweetrading.ru/?p=900", "timestamp": 1419717886224}, {"url": "http://sweetrading.ru/?p=884", "timestamp": 1419717884437}, {"url": "http://sweetrading.ru/?p=1002", "timestamp": 1419717816375}, {"url": "http://101.ru/?an=port_channel_mp3", "timestamp": 1419717804934}, {"url": "http://sweetrading.ru/?cat=62", "timestamp": 1419714194423}, {"url": "http://sweetrading.ru/?p=1046", "timestamp": 1419713998481}, {"url": "http://sweetrading.ru/?p=978", "timestamp": 1419713927085}, {"url": "http://sweetrading.ru/?cat=171", "timestamp": 1419713908863}, {"url": "http://sweetrading.ru/?cat=62", "timestamp": 1419713908679}, {"url": "http://sweetrading.ru/?p=3648", "timestamp": 1419713798879}, {"url": "http://oesex.ru/955457", "timestamp": 1419595564407}, {"url": "http://www.interfax.ru/russia/408800", "timestamp": 1419542965224}, {"url": "http://101.ru/?an=port_channel_mp3&channel=30", "timestamp": 1418818241900}, {"url": "http://www.interfax.ru/russia/413508", "timestamp": 1418802080857}, {"url": "http://www.euroavtoprokat.ru/sitemap/car-rental/france.htm", "timestamp": 1418722961181}, {"url": "http://www.euroavtoprokat.ru/sitemap/car-rental.htm", "timestamp": 1418722945825}, {"url": "http://www.euroavtoprokat.ru/car-rental/germany.htm", "timestamp": 1418722937847}, {"url": "http://www.euroavtoprokat.ru/car-rental/germany.htm", "timestamp": 1418722923196}, {"url": "http://www.euroavtoprokat.ru/sitemap/car-rental.htm", "timestamp": 1418722909804}, {"url": "http://www.eavtoprokat.ru/prokat-avto/france", "timestamp": 1418646101953}, {"url": "http://www.wordparts.ru/numeral/", "timestamp": 1418592793587}, {"url": "http://rsdn.ru/forum/alg/3305190.flat", "timestamp": 1418591162814}, {"url": "http://www.euroavtoprokat.ru/car-rental/turkey/istanbul.htm", "timestamp": 1418571531780}, {"url": "http://citieslist.ru/", "timestamp": 1418488992092}, {"url": 
"http://www.euroavtoprokat.ru/car-rental/turkey/istanbul.htm", "timestamp": 1418480798674}, {"url": "http://rutv.ru/brand/show/episode/453757", "timestamp": 1418253037406}, {"url": "http://www.fodors.com/community/europe/best-car-rental-company-in-italy.cfm", "timestamp": 1418247198586}, {"url": "http://wheelsabroad.com/car-rental/united-kingdom/england/london?gclid=cjwkeaia-5-kbrdylpg5096r8masjabqedm4cmiichc-_-ewkbtsqyci5bu9ucwvjmxp4o0tficaarocljdw_wcb", "timestamp": 1418245144696}, {"url": "http://lestinet.com/site/stopagent.ru", "timestamp": 1418243376170}, {"url": "http://android-help.ru/q2a/16774/\u043a\u0430\u043a-\u043f\u043e\u043b\u0443\u0447\u0438\u0442\u044c-root-\u043f\u0440\u0430\u0432\u0430-\u043d\u0430-philips-w832-android-4-0-4", "timestamp": 1418169606439}, {"url": "http://club.dns-shop.ru/rabinovich/blog/\u044f-\u0432\u0441\u0435-\u0435\u0449\u0435-\u0434\u0435\u0440\u0436\u0443\u0441\u044c-\u043e\u0431\u0437\u043e\u0440-\u0441\u043c\u0430\u0440\u0442\u0444\u043e\u043d\u0430-philips-xenium-w832/", "timestamp": 1418169602505}, {"url": "http://www.supportforum.philips.com/ru/showthread.php?1529-philips-xenium-w832/page6", "timestamp": 1418167859617}, {"url": "http://www.supportforum.philips.com/ru/showthread.php?842-\u043d\u0435-\u0440\u0430\u0431\u043e\u0442\u0430\u0435\u0442-gps-\u0432-\u0441\u043c\u0430\u0440\u0442\u0444\u043e\u043d\u0435-philips-xenium-w832", "timestamp": 1418166430112}, {"url": "http://rabota.ua/info/jobsearcher/post/umora.aspx", "timestamp": 1418114698621}, {"url": "http://www.enter.ru/product/appliances/myasorubka-philips-hr2728-2020103007131", "timestamp": 1418053557067}, {"url": "http://www.ferra.ru/ru/byt/news/2013/12/02/polaris-pmg-1805/", "timestamp": 1417866883735}, {"url": "http://www.ferra.ru/ru/byt/news/2013/10/12/bosch-mfw6-propower/", "timestamp": 1417862586856}, {"url": "http://www.linotype.com/1266/neuehelvetica-family.html", "timestamp": 1417856979616}, {"url": 
"http://www.linotype.com/1546/tradegothic-family.html?site=webfonts", "timestamp": 1417812010753}, {"url": "http://www.vandelaydesign.com/best-ecommerce-website-designs/", "timestamp": 1417807232287}, {"url": "http://www.awwwards.com/20-of-the-very-best-e-commerce-web-sites.html", "timestamp": 1417805189928}, {"url": "http://101.ru/?an=port_channel_mp3&channel=82", "timestamp": 1417711286305}, {"url": "http://www.just.ru/myasorubki/56658_elektromyasorybky_kenwood_mg_450/?from=yandex_msk&utm_source=yandex&utm_medium=cpc&utm_campaign=10817239_model_bytovaya-tehnika-melkaya_msk_p_api&utm_content=612422293_2792852770_\u043c\u044f\u0441\u043e\u0440\u0443\u0431\u043a\u0443 mg 450&position_type=premi", "timestamp": 1417701042306}, {"url": "http://101.ru/?an=port_channel_mp3&channel=5", "timestamp": 1417695760398}, {"url": "http://101.ru/?an=port_channel_mp3&channel=5", "timestamp": 1417689964129}, {"url": "http://101.ru/?an=port_channel_mp3&channel=17", "timestamp": 1417683034834}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001", "timestamp": 1417608945879}, {"url": "http://101.ru/?an=port_channel_mp3&channel=24", "timestamp": 1417605700777}, {"url": "http://101.ru/?an=port_channel_mp3&channel=24", "timestamp": 1417605639264}, {"url": "http://101.ru/?an=port_channel_mp3&channel=82", "timestamp": 1417605624817}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg470-meat-grinder-0wmg470008", "timestamp": 1417604804579}, {"url": "http://livedemo00.template-help.com/magento_48517/blackberry-bold-9000-phone.html", "timestamp": 1417604730951}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg470-meat-grinder-0wmg470008", "timestamp": 1417548651645}, {"url": "http://www.kenwoodworld.com/en-int/products/blenders/meat-grinders/mg474-meat-grinder", "timestamp": 1417548321763}, {"url": 
"http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001", "timestamp": 1417548310507}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001", "timestamp": 1417548309162}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001?feat=6405fda1-43cc-42cc-8860-1c2a492555c5&tabsegment=key-features", "timestamp": 1417548297576}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001?tabsegment=key-features", "timestamp": 1417548284970}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001", "timestamp": 1417548264964}, {"url": "http://www.kenwoodworld.com/en-int/products/blenders/meat-grinders", "timestamp": 1417546314287}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg700-meat-grinder-0wmg700006", "timestamp": 1417545459520}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg700-meat-grinder-0wmg700006", "timestamp": 1417545200191}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/kmix-by-kenwood/kmix-kitchen-machines-/kmx51-kmix-kitchen-machine-0wkmx51002", "timestamp": 1417545116313}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/---mg517---0wmg517007", "timestamp": 1417544991760}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001", "timestamp": 1417544967371}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001?feat=ac86d868-3ea4-4523-93e1-885bbf4222cd&tabsegment=key-features", "timestamp": 
1417544772661}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001?feat=3a288c22-e5f2-448e-a573-ccde95fd2341&tabsegment=key-features", "timestamp": 1417544765049}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001?feat=ac86d868-3ea4-4523-93e1-885bbf4222cd&tabsegment=key-features", "timestamp": 1417544748628}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001?tabsegment=key-features", "timestamp": 1417544731238}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001", "timestamp": 1417544522237}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001", "timestamp": 1417544351791}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/-mg350-0w21910001", "timestamp": 1417544282950}, {"url": "http://www.kenwoodworld.com/ru-ru", "timestamp": 1417544269909}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg516-meat-grinder-and-roto-food-cutter-0wmg516006?tabsegment=specifications", "timestamp": 1417544204394}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg516-meat-grinder-and-roto-food-cutter-0wmg516006", "timestamp": 1417544190747}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg516-meat-grinder-and-roto-food-cutter-0wmg516006", "timestamp": 1417544045014}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg516-meat-grinder-and-roto-food-cutter-0wmg516006?tabsegment=specifications", "timestamp": 1417544035023}, {"url": 
"http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg516-meat-grinder-and-roto-food-cutter-0wmg516006", "timestamp": 1417544015196}, {"url": "http://www.kenwoodworld.com/ru-ru", "timestamp": 1417544004579}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009?tabsegment=specifications", "timestamp": 1417543914820}, {"url": "http://www.kenwoodworld.com/uk/search-results", "timestamp": 1417543814629}, {"url": "http://www.kenwoodworld.com/uk/search-results", "timestamp": 1417543642699}, {"url": "http://www.kenwoodworld.com/uk/search-results", "timestamp": 1417543628088}, {"url": "http://www.kenwoodworld.com/uk/search-results", "timestamp": 1417543616074}, {"url": "http://www.kenwoodworld.com/uk/products/food-mixers/chef-major-attachments/potato-peeler-at444-awat444001", "timestamp": 1417543439173}, {"url": "http://www.kenwoodworld.com/uk/search-results", "timestamp": 1417543352117}, {"url": "http://www.kenwoodworld.com/uk/search-results", "timestamp": 1417543294005}, {"url": "http://www.kenwoodworld.com/uk/search-results", "timestamp": 1417543192107}, {"url": "http://www.kenwoodworld.com/uk", "timestamp": 1417543022466}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009?tabsegment=specifications", "timestamp": 1417542940415}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009?tabsegment=support", "timestamp": 1417542907491}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009?tabsegment=specifications", "timestamp": 1417542866623}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009", "timestamp": 
1417542858206}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009", "timestamp": 1417542839578}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009", "timestamp": 1417542795850}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009", "timestamp": 1417542742883}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009", "timestamp": 1417542725367}, {"url": "http://www.kenwoodworld.com/ru-ru/all-products/blenders-mixers-and-meat-grinders/meat-grinders-ru/mg510-meat-grinder-0wmg510009", "timestamp": 1417542659966}, {"url": "http://www.kenwoodworld.com/ru-ru", "timestamp": 1417542501523}, {"url": "http://101.ru/?an=port_channel_mp3&channel=24", "timestamp": 1417542435930}, {"url": "http://www.shop-script.ru/platform/", "timestamp": 1417473193974}, {"url": "http://101.ru/?an=port_channel_mp3&channel=34", "timestamp": 1417451297674}]}

I need 'url' and 'timestamp' to become columns of the df dataset. How can this be done?
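A vectorized approach works in three steps. Parse every user_json, then explode the visit lists so that each visit gets its own row, with the gender, age and uid values repeated. Finally, split each visit dict into 'url' and 'timestamp' columns. This is a minimal sketch. It assumes pandas 1.1 or later, which is needed for `explode(..., ignore_index=True)`. It also assumes every user_json has a non-empty 'visits' list. `expand_visits` is a hypothetical name.

```python
import json

import pandas as pd

def expand_visits(df):
    # Parse each JSON string into its list of {'url': ..., 'timestamp': ...} dicts
    visits = df['user_json'].map(lambda s: json.loads(s)['visits'])
    # One row per visit; the other columns are repeated for every visit
    flat = (df.drop(columns='user_json')
              .assign(visit=visits)
              .explode('visit', ignore_index=True))
    # Split the visit dicts into separate 'url' and 'timestamp' columns
    parts = pd.DataFrame(flat['visit'].tolist())
    return pd.concat([flat.drop(columns='visit'), parts], axis=1)
```

Usage: `df = expand_visits(df)`. The URL and timestamp post-processing can then be applied to the new columns with `Series.map`.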

python-3.x
  • 1 answer
  • 10 Views
Martin Hope
максим ильин
Asked: 2020-04-29 03:11:29 +0000 UTC

Removing tags from a DataFrame text column

  • 4

The all_file folder contains 200 text files, each consisting of a single line of text with HTML markup.

Code that creates the DataFrame:

import glob
import pandas as pd

dir_input = '/data/home/maksim.ilin/data/all_file/*.txt'
files = glob.glob(dir_input)
df = pd.concat([pd.read_csv(f, header=None, sep='\t') for f in files], ignore_index=True)

The concatenation produces a single-column DataFrame. Its info:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200 entries, 0 to 199
Data columns (total 1 columns):
0    200 non-null object
dtypes: object(1)
memory usage: 1.6+ KB
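Each file holds one line of HTML, so it can also be read as a single plain string rather than through `read_csv`. `read_csv` splits on tab characters and can interpret double quotes as CSV quoting, which may mangle HTML. This is a minimal sketch that assumes the files are UTF-8 encoded. `load_texts` is a hypothetical name. `sorted()` only makes the row order reproducible.

```python
import glob

import pandas as pd

def load_texts(pattern):
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding='utf-8') as f:
            # The whole file is one line of HTML; drop the trailing newline
            rows.append(f.read().strip())
    # Single column labelled 0, matching the read_csv(header=None) result
    return pd.DataFrame({0: rows})
```

Usage: `df = load_texts('/data/home/maksim.ilin/data/all_file/*.txt')`.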

The DataFrame looks like this: [screenshot not available]

Removing the tags:

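from bs4 import BeautifulSoup
df[0] = df[0].astype(str)
texts = []
# iterrows() yields (index, row) pairs, and the column label is 0, not 'description'
for idx, row in df.iterrows():
    texts.append(BeautifulSoup(row[0], 'html.parser').get_text(' ', strip=True))
df['text'] = texts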

This raises an error:

---> 22     texts.append(BeautifulSoup(a['description']).text)
TypeError: tuple indices must be integers or slices, not str
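The TypeError occurs because `DataFrame.iterrows()` yields `(index, row)` tuples, so `a['description']` indexes a tuple with a string. The column is also labelled `0`, not `'description'`. Here is a minimal illustration with a hypothetical two-row frame. It unpacks the pair and reads column 0, without any HTML parsing yet:

```python
import pandas as pd

df = pd.DataFrame({0: ['<p>first</p>', '<p>second</p>']})

for item in df.iterrows():
    # Each item is a tuple: (index label, row as a Series)
    idx, row = item
    print(type(item).__name__, idx, row[0])
# tuple 0 <p>first</p>
# tuple 1 <p>second</p>
```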

A sample line from one of the files:

<p>Приглашается Бренд-менеджер в известную компанию (сеть магазинов бытовой, видео, аудио-техники). </p><p>Требования:<br />Мужчина/женщина, <br />25-40 лет, <br />образование высшее (желательно маркетинг), <br />с опытом работы от 3 лет на позиции бренд-менеджера (в компании, занимающейся бытовой техникой или в очень крупной компании). <br />Обязательно хороший уровень английского (устный и письменный), <br />сильные навыки управления проектами. <br />Сильные презентационные навыки. <br />ПК: MS Office, Power Point – обязательно. </p><p>Обязанности: <br />продвижение бренда компании, <br />маркетинговые исследования, <br />вывод собственных брендов на рынок, <br />имиджевая реклама. </p><p>Условия:<br />Офис в центре. <br />Возможны командировки. </p>

I tried converting the text, but it was not converted. What is the correct way to do it? Will BeautifulSoup accept data in this form?
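BeautifulSoup accepts plain strings like these. Passing a parser name such as 'html.parser' explicitly avoids its "no parser specified" warning. To avoid a third-party dependency, the tags can also be stripped with the standard library's `html.parser`. The following minimal sketch swaps BeautifulSoup for `HTMLParser`. `strip_tags` is a hypothetical name. Text chunks are joined with spaces so that words around `<br />` do not run together, and entities such as `&amp;` are decoded.

```python
import re
from html.parser import HTMLParser

import pandas as pd

class _TextCollector(HTMLParser):
    """Collects the text between tags (entities decoded via convert_charrefs)."""
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_tags(html):
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    # Join text chunks with spaces, then collapse repeated whitespace
    return re.sub(r'\s+', ' ', ' '.join(parser.parts)).strip()

df = pd.DataFrame({0: ['<p>Requirements:<br />Man/woman, <br />25-40</p><p>A &amp; B</p>']})
df['text'] = df[0].astype(str).map(strip_tags)
print(df['text'][0])  # Requirements: Man/woman, 25-40 A & B
```

Joining with spaces can leave a space before punctuation that directly follows a closing tag. This is acceptable for plain-text features but worth knowing.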

python
  • 1 answer
  • 10 Views

© 2023 RError.com All Rights Reserved   沪ICP备12040472号-5