Questions asked by Revolucion for Monica

Revolucion for Monica

Asked: 2022-01-21 20:49:58 +0000 UTC

ARMA statsmodels gets stuck on one prediction when iterating over different time series

0

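I built an ARMA model to forecast a series of sales for certain items in different shops. For each time series, if there is data, it tests candidate orders and keeps the model with the best Akaike Information Criterion. However, it always gives the same result, so I must have a problem somewhere, but I can't find it. Here is my model: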

import numpy as np
import pandas as pd
import statsmodels.tsa.api as smt

array = []

for i, row in test.iterrows():
  print("row['shop_id']: ", row['shop_id'], " row['item_id']: ", row['item_id'])
  ts = pd.DataFrame(sales_monthly.loc[pd.IndexSlice[:, [row['shop_id']],[row['item_id']]], :]['item_price'].values*sales_monthly.loc[pd.IndexSlice[:, [row['shop_id']],[row['item_id']]], :]['item_cnt_day'].values).T.iloc[0]
  print(ts.values)
  if ts.values != []:
    best_aic = np.inf
    best_order = None
    best_model = None

    rng = range(5)
    for i in rng:
      for j in rng:
        try:
          tmp_model = smt.ARMA(ts.values, order = (i, j)).fit(method='mle', trand='nc')
          tmp_aic = tmp_model.aic
          if tmp_aic < best_aic:
            best_aic = tmp_aic
            best_order = (i, j)
            best_model = tmp_mdl
        except Exception as e:
          continue
    y_hat = best_model.forecast()[0][0]
    if y_hat<0:
      y_hat = 0
  else:
    y_hat = 0
  print("predicted:", y_hat)
  d = {'id':row['ID'], 'item_cnt_month': y_hat}
  array.append(d)
  print("-------------------")

df = pd.DataFrame(array)
df

It prints:

row['shop_id']:  5  row['item_id']:  5037
[2599.  2599.  3998.  3998.  1299.  1499.  1499.  2997.5  749.5]
predicted: 15001.056988528915
-------------------
row['shop_id']:  5  row['item_id']:  5320
[]
predicted: 0
-------------------
row['shop_id']:  5  row['item_id']:  5233
[2697. 1198.  599. 2997. 1199.]
predicted: 15001.056988528915
-------------------
row['shop_id']:  5  row['item_id']:  5232
[599.]
predicted: 0
-------------------
row['shop_id']:  5  row['item_id']:  5268
[]
predicted: 0
-------------------
row['shop_id']:  5  row['item_id']:  5039
[5198.  6597.  2599.  5197.   749.5 1499. ]
predicted: 15001.056988528915
-------------------
row['shop_id']:  5  row['item_id']:  5041
[11497.  7998.]
predicted: 15001.056988528915
-------------------
row['shop_id']:  5  row['item_id']:  5046
[ 299. 1495.  349.  349.]
predicted: 15001.056988528915
-------------------
...

I don't understand, because it works fine when I forecast the series one at a time. For example, with the following ts.values:

array([ 7770.        , 15640.        , 15540.        , 12950.        ,
       30775.        , 15950.        , 12760.        , 22330.        ,
       15949.64285714,     0.        ,  6380.        ,  3190.        ,
        9670.        ,  3490.        ,  3090.        ,  3490.        ,
        3490.        , 10470.        ])

I run:

import statsmodels.tsa.api as smt

# pick the best order by Akaike Information Criterion; the smallest AIC wins
best_aic = np.inf
best_order = None
best_mdl = None

rng = range(5)
for i in rng:
  for j in rng:
    try:
      tmp_mdl = smt.ARMA(ts.values, order = (i, j)).fit(method='mle', trand='nc')
      tmp_aic = tmp_mdl.aic
      if tmp_aic < best_aic:
        best_aic = tmp_aic
        best_order = (i, j)
        best_mdl = tmp_mdl
    except:
      continue
    
print(best_aic, best_order)
print('aic: {} | order: {}'.format(best_aic, best_order))
print(best_mdl.forecast()[0][0])

It returns:

204.39695560597815 (0, 0)
aic: 204.39695560597815 | order: (0, 0)
1712.4545454545446
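
A possible explanation, offered as a sketch rather than a confirmed diagnosis: inside the loop the fitted model is stored in tmp_model, but the line that records the winner assigns tmp_mdl. If tmp_mdl still exists from an earlier notebook cell (such as the standalone experiment), every series would forecast with that stale model, which would be consistent with the repeated 15001.056988528915. The inner loop also reuses the name i, and trand looks like a typo for trend. The sketch below moves the selection step into a function so that no stale names can leak in. MeanModel and fit_mean_model are hypothetical stand-ins for the statsmodels fit call, used only to keep the example self-contained.

```python
import math
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Sequence, Tuple

Order = Tuple[int, int]


@dataclass
class MeanModel:
    """Hypothetical stand-in for a fitted model: exposes .aic and .forecast()."""
    mean: float
    aic: float

    def forecast(self) -> float:
        return self.mean


def fit_mean_model(series: Sequence[float], order: Order) -> MeanModel:
    """Toy fit (an assumption, not statsmodels): needs more points than parameters."""
    p, q = order
    n_params = p + q + 1
    if len(series) <= n_params:
        raise ValueError("series too short for this order")
    mean = sum(series) / len(series)
    rss = sum((x - mean) ** 2 for x in series)
    # Gaussian AIC up to an additive constant: n * log(rss / n) + 2 * k
    aic = len(series) * math.log(max(rss, 1e-12) / len(series)) + 2 * n_params
    return MeanModel(mean=mean, aic=aic)


def select_best(
    series: Sequence[float],
    fit: Callable[[Sequence[float], Order], MeanModel],
    orders: Iterable[Order],
) -> Tuple[float, Optional[Order], Optional[MeanModel]]:
    """Return (best_aic, best_order, best_model) using only local names."""
    best_aic, best_order, best_model = math.inf, None, None
    for order in orders:
        try:
            candidate = fit(series, order)
        except ValueError:
            continue  # skip orders that cannot be fitted; other errors surface
        if candidate.aic < best_aic:
            best_aic, best_order, best_model = candidate.aic, order, candidate
    return best_aic, best_order, best_model


def forecast_or_zero(series: Sequence[float], fit=fit_mean_model, max_lag: int = 5) -> float:
    """One-step forecast for a single series, clipped at zero; 0.0 if nothing fits."""
    orders = [(p, q) for p in range(max_lag) for q in range(max_lag)]
    _, _, model = select_best(series, fit, orders)
    if model is None:
        return 0.0
    return max(0.0, model.forecast())
```

Calling forecast_or_zero once per (shop_id, item_id) series keeps each selection independent. Catching only the expected exception type, instead of a blanket except, would also let a NameError like the one above surface immediately.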

Revolucion for Monica

Asked: 2020-08-05 19:25:48 +0000 UTC

How to join rows that share the same key into a single row?

2

I have a DataFrame and want to combine the columns whose names start with Answer for rows that share the same value in QID.

Here is a sample of the data:

    QID     Category    Text    QType   Question    Answer0     Answer1
0   16  Automotive  Access to car   Single  Do you have access to a car?    I own a car/cars    I own a car/cars
1   16  Automotive  Access to car   Single  Do you have access to a car?    I lease/ have a company car     I lease/have a company car
2   16  Automotive  Access to car   Single  Do you have access to a car?    I have access to a car/cars     I have access to a car/cars
3   16  Automotive  Access to car   Single  Do you have access to a car?    No, I don’t have access to a car/cars   No, I don't have access to a car
4   16  Automotive  Access to car   Single  Do you have access to a car?    Prefer not to say   Prefer not to say
5   17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Audi    Audi
6   17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Alfa Romeo  Alfa Romeo
7   17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    BMW     BMW
8   17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Cadillac    Cadillac
9   17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Chevrolet   Chevrolet
10  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Chrysler    Chrysler
11  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Citroen     Citroen
12  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Daihatsu    Daihatsu
13  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Fiat    Fiat
14  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Ford    Ford
15  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Honda   Honda
16  17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Hyundai     Hyundai
...

I would like to get something like this:

    QID     Category    Text    QType   Question    Answer0     Answer1     Answer3     Answer4     Answer5     Answer6     Answer7     Answer8     Answer9     Answer10    Answer11     Answer12     ...      
4   16  Automotive  Access to car   Single  Do you have access to a car?    I own a car/cars    I lease/ have a company car     I have access to a car/cars     No, I don’t have access to a car/cars   Prefer not to say       
5   17  Automotive  Make of car/cars    Multiple    If you own/lease a car(s), which brand are they?    Audi    Alfa Romeo  BMW     Cadillac    Chevrolet   Chrysler    Citroen     ...

I can combine a given, static number of columns whose names start with Answer for rows with the same value in QID:

import pandas as pd

# assumption: the data is stored as CSV; the original called pd.DataFrame on the path
df = pd.read_csv('path/to/file')

# lazy approach: first keep the attributes other than the QID and Answer columns
agg = {col:"first" for col in list(df.columns) if col!="QID" and "Answer" not in col}
# collect the list of all answers in Answer0 for each QID
agg = {**agg, **{"Answer0":lambda s: list(s)}}

# helper function applied to each row; not required, but it makes the code more readable
def ans(r, i):
    return "" if i>=len(r["AnswerT"]) else r["AnswerT"][i]

# split the aggregated list back into columns using assign
# rename Answer0 to AnswerT after the aggregation so that it can be referenced
# drop AnswerT once it is no longer needed
dfgrouped = df.groupby("QID").agg(agg).reset_index().rename(columns={"Answer0":"AnswerT"}).assign(
    Answer0=lambda dfa: dfa.apply(lambda r: ans(r, 0), axis=1),
    Answer1=lambda dfa: dfa.apply(lambda r: ans(r, 1), axis=1),
    Answer2=lambda dfa: dfa.apply(lambda r: ans(r, 2), axis=1),
    Answer3=lambda dfa: dfa.apply(lambda r: ans(r, 3), axis=1),
    Answer4=lambda dfa: dfa.apply(lambda r: ans(r, 4), axis=1),
    Answer5=lambda dfa: dfa.apply(lambda r: ans(r, 5), axis=1),
    Answer6=lambda dfa: dfa.apply(lambda r: ans(r, 6), axis=1),
).drop("AnswerT", axis=1)

print(dfgrouped.to_string(index=False))

How can I do the same for a dynamic number of Answer columns, again combining rows with the same value in QID?
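
One way to handle a variable number of answers, sketched under assumptions: number each row within its QID with groupby(...).cumcount(), then pivot those positions into columns. widen_answers is a hypothetical helper name. It assumes the answers to spread out live in Answer0 and that the other non-Answer columns are constant within a QID, as in the sample above.

```python
import pandas as pd


def widen_answers(df: pd.DataFrame, key: str = "QID", answer_col: str = "Answer0") -> pd.DataFrame:
    """Return one row per key with Answer0..AnswerN columns built from answer_col."""
    # Attribute columns: everything except the key and the Answer* columns.
    meta_cols = [c for c in df.columns if c != key and not c.startswith("Answer")]
    meta = df.groupby(key, sort=False)[meta_cols].first()

    # Position of each row inside its group: 0, 1, 2, ...
    position = df.groupby(key, sort=False).cumcount()

    # Spread the answers so that each position becomes its own column.
    wide = df.assign(_pos=position).pivot(index=key, columns="_pos", values=answer_col)
    wide.columns = [f"Answer{i}" for i in wide.columns]

    # Groups with fewer answers get empty strings in the trailing columns.
    return meta.join(wide.fillna("")).reset_index()


sample = pd.DataFrame(
    {
        "QID": [16, 16, 17, 17, 17],
        "Question": ["Do you have access to a car?"] * 2
        + ["If you own/lease a car(s), which brand are they?"] * 3,
        "Answer0": ["I own a car/cars", "Prefer not to say", "Audi", "Alfa Romeo", "BMW"],
        "Answer1": ["I own a car/cars", "Prefer not to say", "Audi", "Alfa Romeo", "BMW"],
    }
)
result = widen_answers(sample)
print(result.to_string(index=False))
```

The number of Answer columns follows the largest group, so nothing has to be listed by hand.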

Revolucion for Monica

Asked: 2020-08-15 23:07:42 +0000 UTC

How to get rid of NoneType values when computing cosine similarity?

0

I am trying to compute the cosine similarity between two columns of a DataFrame with spatial.distance.cosine, using the following two functions:

from scipy import spatial

def cosine_sim(x):
    li = []
    for item in x["sent_emb"]:
        li.append(spatial.distance.cosine(item,x["quest_emb"][0]))
    return li

def predictions(train):

    train["cosine_sim"] = train.apply(cosine_sim, axis = 1)

The two columns look like this:

    sent_emb                                            quest_emb
0   [[0.030376578, 0.044331014, 0.081356354, 0.062...   [[0.01491953, 0.021973763, 0.021364095, 0.0393...
1   [[0.030376578, 0.044331014, 0.081356354, 0.062...   [[0.04444952, 0.028005758, 0.030357722, 0.0375...
2   [[0.030376578, 0.044331014, 0.081356354, 0.062...   [[0.03949683, 0.04509903, 0.018089347, 0.07667...
   ...

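However, I get a TypeError: some of the values appear to be NoneType instead of float. Do you know how I can filter this kind of data, for example by setting it to zero, so that it does not prevent me from applying my functions?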

TypeError: ("unsupported operand type(s) for *: 'NoneType' and 'float'", 'occurred at index 473')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-af28fc11a9d3> in <module>()
----> 1 predicted = predictions(train)

<ipython-input-22-1699cf33d87c> in predictions(train)
      1 def predictions(train):
      2 
----> 3     train["cosine_sim"] = train.apply(cosine_sim, axis = 1)
      4     train["diff"] = (train["quest_emb"] - train["sent_emb"])**2
      5     train["euclidean_dis"] = train["diff"].apply(lambda x: list(np.sum(x, axis = 1)))

~/Documents/programming/mybot/mybotenv/lib/python3.5/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, result_type, args, **kwds)
   6012                          args=args,
   6013                          kwds=kwds)
-> 6014         return op.get_result()
   6015 
   6016     def applymap(self, func):

~/Documents/programming/mybot/mybotenv/lib/python3.5/site-packages/pandas/core/apply.py in get_result(self)
    140             return self.apply_raw()
    141 
--> 142         return self.apply_standard()
    143 
    144     def apply_empty_result(self):

~/Documents/programming/mybot/mybotenv/lib/python3.5/site-packages/pandas/core/apply.py in apply_standard(self)
    246 
    247         # compute the result using the series generator
--> 248         self.apply_series_generator()
    249 
    250         # wrap results

~/Documents/programming/mybot/mybotenv/lib/python3.5/site-packages/pandas/core/apply.py in apply_series_generator(self)
    275             try:
    276                 for i, v in enumerate(series_gen):
--> 277                     results[i] = self.f(v)
    278                     keys.append(v.name)
    279             except Exception as e:

<ipython-input-20-276aa09bc25e> in cosine_sim(x)
      2     li = []
      3     for item in x["sent_emb"]:
----> 4         li.append(spatial.distance.cosine(item,x["quest_emb"][0]))
      5     return li

~/Documents/programming/mybot/mybotenv/lib/python3.5/site-packages/scipy/spatial/distance.py in cosine(u, v, w)
    742     # cosine distance is also referred to as 'uncentered correlation',
    743     #   or 'reflective correlation'
--> 744     return correlation(u, v, w=w, centered=False)
    745 
    746 

~/Documents/programming/mybot/mybotenv/lib/python3.5/site-packages/scipy/spatial/distance.py in correlation(u, v, w, centered)
    693         u = u - umu
    694         v = v - vmu
--> 695     uv = np.average(u * v, weights=w)
    696     uu = np.average(np.square(u), weights=w)
    697     vv = np.average(np.square(v), weights=w)

TypeError: ("unsupported operand type(s) for *: 'NoneType' and 'float'", 'occurred at index 473')
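
A minimal sketch of one way to guard against missing values, assuming that an embedding which is None, or which contains None or NaN, should simply yield a fill value (0.0 here, as suggested in the question). safe_cosine_distance and cosine_sim_row are hypothetical helper names. Note that scipy.spatial.distance.cosine returns the cosine distance, that is 1 minus the similarity; the sketch computes the same quantity with NumPy.

```python
import numpy as np


def to_float_vector(x):
    """Return x as a 1-D float array, or None if it is missing or malformed."""
    if x is None:
        return None
    try:
        arr = np.asarray(x, dtype=float)
    except (TypeError, ValueError):
        return None
    # A None element is either rejected above or converted to NaN; reject both.
    if arr.ndim != 1 or arr.size == 0 or np.isnan(arr).any():
        return None
    return arr


def safe_cosine_distance(u, v, fill=0.0):
    """Cosine distance (1 - cosine similarity), or `fill` when it cannot be computed."""
    a, b = to_float_vector(u), to_float_vector(v)
    if a is None or b is None or a.shape != b.shape:
        return fill
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return fill
    return float(1.0 - np.dot(a, b) / denom)


def cosine_sim_row(row, fill=0.0):
    """Row-wise replacement for cosine_sim: one distance per sentence embedding."""
    quest = row["quest_emb"]
    sents = row["sent_emb"]
    if quest is None or sents is None or len(quest) == 0:
        return []
    return [safe_cosine_distance(item, quest[0], fill) for item in sents]
```

It could be applied in the same way as the original function, for example train.apply(cosine_sim_row, axis=1). Whether 0.0 is a sensible fill value depends on how the distances are used afterwards, since a distance of 0.0 normally means identical direction.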

Revolucion for Monica

Asked: 2020-08-15 21:11:52 +0000 UTC

How to show all four-letter words in decreasing order of frequency?

1

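I am trying to figure out the best way to filter and sort text based on word length and frequency distribution.

Find all the four-letter words in the Chat Corpus (text5). With the help of a frequency distribution (FreqDist), show these words in decreasing order of frequency.

Natural Language Processing with Python, by Steven Bird, Ewan Klein and Edward Loper, Ch. 1

I tried this. I think it shows the words in decreasing order of frequency, but I am not sure it is the most efficient way, since I had to write it in three lines.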

>>> from nltk.book import *
>>> aux = sorted(w for w in set(text2) if len(w) == 4)
>>> aux.reverse()
>>> aux
[u'zeal', u'your', u'year', u'yard'...
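
Note that the attempt above sorts the distinct words alphabetically and then reverses them, so the order is reverse alphabetical, not by frequency; it also reads text2, while the exercise names text5. A minimal sketch of frequency ordering follows, using collections.Counter on a plain token list so that it runs without the NLTK corpora. NLTK's FreqDist is a Counter subclass, so FreqDist(...).most_common() should behave the same way on text5. The helper name, the lowercasing and the alphabetic filter are assumptions, not part of the exercise.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def four_letter_words_by_frequency(tokens: Iterable[str], length: int = 4) -> List[Tuple[str, int]]:
    """Return (word, count) pairs for words of the given length, most frequent first."""
    counts = Counter(
        w.lower()                       # assumption: treat "JOIN" and "join" as one word
        for w in tokens
        if len(w) == length and w.isalpha()  # assumption: ignore tokens such as ":-)!"
    )
    return counts.most_common()


tokens = ["JOIN", "join", "part", "hi", "PART", "join", "that", "...."]
ranked = four_letter_words_by_frequency(tokens)
print(ranked)
```

With NLTK loaded, passing text5 instead of tokens would give the ranking the exercise asks for.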
