A good way to get the unique values for each column of a DataFrame.
DataFrame: https://yadi.sk/i/CiNCLc2juUxaXw
My solution:
adult_train_drop = adult_train.drop(['Age', 'fnlwgt', 'Education_Num', 'Capital_Gain', 'Capital_Loss', 'Hours_per_week'], axis=1)
for col in adult_train_drop:
    print(col, ': ', adult_train[col].unique())
    print()
Result:
Workclass : [' State-gov' ' Self-emp-not-inc' ' Private' ' Federal-gov' ' Local-gov'
nan ' Self-emp-inc' ' Without-pay' ' Never-worked']
Education : [' Bachelors' ' HS-grad' ' 11th' ' Masters' ' 9th' ' Some-college'
' Assoc-acdm' ' Assoc-voc' ' 7th-8th' ' Doctorate' ' Prof-school'
' 5th-6th' ' 10th' ' 1st-4th' ' Preschool' ' 12th']
Martial_Status : [' Never-married' ' Married-civ-spouse' ' Divorced'
' Married-spouse-absent' ' Separated' ' Married-AF-spouse' ' Widowed']
Occupation : [' Adm-clerical' ' Exec-managerial' ' Handlers-cleaners' ' Prof-specialty'
' Other-service' ' Sales' ' Craft-repair' ' Transport-moving'
' Farming-fishing' ' Machine-op-inspct' ' Tech-support' nan
' Protective-serv' ' Armed-Forces' ' Priv-house-serv']
Relationship : [' Not-in-family' ' Husband' ' Wife' ' Own-child' ' Unmarried'
' Other-relative']
Race : [' White' ' Black' ' Asian-Pac-Islander' ' Amer-Indian-Eskimo' ' Other']
Sex : [' Male' ' Female']
Country : [' United-States' ' Cuba' ' Jamaica' ' India' nan ' Mexico' ' South'
' Puerto-Rico' ' Honduras' ' England' ' Canada' ' Germany' ' Iran'
' Philippines' ' Italy' ' Poland' ' Columbia' ' Cambodia' ' Thailand'
' Ecuador' ' Laos' ' Taiwan' ' Haiti' ' Portugal' ' Dominican-Republic'
' El-Salvador' ' France' ' Guatemala' ' China' ' Japan' ' Yugoslavia'
' Peru' ' Outlying-US(Guam-USVI-etc)' ' Scotland' ' Trinadad&Tobago'
' Greece' ' Nicaragua' ' Vietnam' ' Hong' ' Ireland' ' Hungary'
' Holand-Netherlands']
Target : [' <=50K' ' >50K']
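Is it possible to get a similar result, but:
- with more compact code (a single line);
- with the result itself preferably also in the form of a DataFrame, or something similar?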
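One compact form that meets both requests might look like the sketch below. It is only an illustration under assumptions, and not necessarily the option mentioned below: `adult_sample` is a small made-up stand-in for `adult_train`, and the numeric columns are excluded with `select_dtypes(exclude="number")` instead of being dropped by name.

```python
import pandas as pd

# Hypothetical miniature stand-in for adult_train (values are made up).
adult_sample = pd.DataFrame({
    "Age": [39, 50, 38, 53],
    "Sex": [" Male", " Male", " Female", " Male"],
    "Race": [" White", " Black", " White", " Other"],
})

# One expression: build a DataFrame whose columns hold the unique values
# of every non-numeric column. Wrapping each array in pd.Series lets
# pandas align columns of different lengths.
unique_values = pd.DataFrame({
    col: pd.Series(adult_sample[col].unique())
    for col in adult_sample.select_dtypes(exclude="number")
})

print(unique_values)
```

Columns with fewer unique values are padded with NaN so that all columns have the same length. A genuine missing value in the data also shows up as NaN, so the two cannot be told apart in this layout.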
I found a very short and logical option:
Output: