Editor's note: The author of this article is Bai Li, a Computer Science student (year '17) at the University of Waterloo. On his blog he wrote a tutorial on using logistic regression to estimate the probability of finding a girlfriend. A friendly reminder from Zhijun: this article is meant to share project ideas, so please don't take the results personally.
The University of Waterloo is a well-known science and engineering university in Canada and one of the best universities in North America. Our mathematics, computer science and engineering programs rank among the top in the world. However, schools like this share an obvious trait: more boys than girls. Worse, the University of Waterloo is notorious for its lack of social life, so I suspect that I, like many of my CS classmates, can't find a girlfriend now and may never find one.
Some people think love cannot be quantified: the whole idea is too abstract, and when it comes to girlfriends you should just let things happen. But as a data scientist at the University of Waterloo, I disagree. Computer science people never give up easily! If there is no social life, why not use machine learning to help with the great cause of finding a girlfriend?
The detailed tutorial follows. Grab a seat and get your notebook ready!
First, let's be clear about the question this project asks: what qualities does a boy at the University of Waterloo need in order to find a girlfriend? Most people assume that a higher salary is more attractive to girls and that height or build is a plus. What we want to find out is which factors are actually supported by the data and which are just guesses.
These are the features I came up with:
Dating (target variable): boys who have a girlfriend now, or who have been in a relationship lasting at least six months in the past five years
Whether he is an international student (international)
Whether his major is CS, SE or ECE (cs)
Whether he has excellent grades and landed an internship at a top company (career)
Whether he is humorous and talkative (interesting)
Whether he is outgoing and willing to make new friends (social)
Whether he is confident (confident)
Whether he is taller than me, 175 cm or more (tall)
Whether he wears glasses (glasses)
Whether he goes to the gym often or enjoys sports (gym)
Whether he pays attention to how he dresses (fashion)
Whether he lives in Canada permanently or has lived and worked in Canada for at least five years (canada)
Whether he is Asian (asian)
You may have noticed that some of these features are quite subjective. For example, how do you decide whether a person is "interesting" or "humorous"? For these features, I split people into two groups, marking roughly half with 1 and the other half with 0. All the labels are based on my own observations of the people around me.
Note that this article is not an objective, rigorous statistical study; it only offers a way of thinking about the problem.
To collect the data, I made a table covering every classmate I could think of, using 1 or 0 for "yes" or "no", and ended up with information on 70 classmates. Attention, everyone at the University of Waterloo: if you are a boy who has spoken to Bai Li in the past two years, you may have been used as training data.
Analysis process
First, we run Fisher's exact test of each explanatory variable against the target variable Dating. The three most important variables turn out to be:
Fitness: boys who love fitness and sports are twice as likely to have a girlfriend as other boys (p-value=0.02)
Glasses: boys who don't wear glasses are 70% more likely to have a girlfriend than boys who do (p-value=0.08) (Don't play with your phone before bed! Do your eye exercises, everyone!)
Confidence: confident boys are the most attractive (p-value=0.09)
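As a quick check on the "twice as likely" and "70% more likely" figures, the ratios can be recomputed from the conditional probabilities listed in the full results of this article. This is a minimal Python sketch; the helper name `risk_ratio` is an illustrative assumption, not something from the author's code.

```python
def risk_ratio(p_with: float, p_without: float) -> float:
    """Ratio of two dating probabilities (a relative risk)."""
    return p_with / p_without

# Probabilities copied from the full results reported in the article.
gym = risk_ratio(0.64, 0.31)      # p(dating|gym) / p(dating|~gym)
glasses = risk_ratio(0.55, 0.32)  # p(dating|~glasses) / p(dating|glasses)

print(f"gym: {gym:.2f}x, no glasses: {glasses:.2f}x")
# gym: 2.06x, no glasses: 1.72x
```

So "twice" and "70% more" are rounded versions of roughly 2.06 and 1.72.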
Sure enough, girls like boys who are fit and energetic. But I was a little surprised that wearing glasses matters so much. Some people may associate glasses with nerds, and there really are papers that have studied this question; they found that people do consider glasses to make a person noticeably less attractive.
Some variables may also predict the probability of dating, but because the sample size is so small, the results are very uncertain:
International students are more likely to be dating than local students
Asian boys are less likely to be dating than boys of other ethnicities
All else being equal, boys majoring in CS seem to be more popular
The remaining factors, such as height, grades, and dress and grooming, have little to do with whether you can find a girlfriend. Even if you go to work at Facebook headquarters, sorry: if you are single, you will stay single.
Here are the full results of the experiment:
Variable: international
N(international)=10, N(~international)=60
p(dating|international)=0.60, p(dating|~international)=0.38
p-value=0.299
Variable: cs
N(cs)=56, N(~cs)=14
p(dating|cs)=0.45, p(dating|~cs)=0.29
p-value=0.368
Variable: career
N(career)=46, N(~career)=24
p(dating|career)=0.43, p(dating|~career)=0.38
p-value=0.799
Variable: interesting
N(interesting)=34, N(~interesting)=36
p(dating|interesting)=0.47, p(dating|~interesting)=0.36
p-value=0.467
Variable: social
N(social)=29, N(~social)=41
p(dating|social)=0.45, p(dating|~social)=0.39
p-value=0.806
Variable: confident
N(confident)=37, N(~confident)=33
p(dating|confident)=0.51, p(dating|~confident)=0.30
p-value=0.092
Variable: tall
N(tall)=26, N(~tall)=44
p(dating|tall)=0.46, p(dating|~tall)=0.39
p-value=0.619
Variable: glasses
N(glasses)=41, N(~glasses)=29
p(dating|glasses)=0.32, p(dating|~glasses)=0.55
p-value=0.084
Variable: gym
N(gym)=22, N(~gym)=48
p(dating|gym)=0.64, p(dating|~gym)=0.31
p-value=0.018
Variable: fashion
N(fashion)=17, N(~fashion)=53
p(dating|fashion)=0.41, p(dating|~fashion)=0.42
p-value=1.000
Variable: canada
N(canada)=31, N(~canada)=39
p(dating|canada)=0.42, p(dating|~canada)=0.41
p-value=1.000
Variable: asian
N(asian)=59, N(~asian)=11
p(dating|asian)=0.37, p(dating|~asian)=0.64
p-value=0.181
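To make the test behind these numbers concrete, here is a minimal pure-Python sketch of a two-sided Fisher's exact test on a 2x2 table. It is not the author's code. The gym counts are my own reconstruction from the reported proportions (14 of 22 gym-goers dating gives 0.64, and 15 of 48 others gives 0.31), so treat them as an assumption. The function name `fisher_exact_two_sided` is also hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over every table that is no more likely than the observed one
    # (small relative tolerance guards against float round-off).
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-7))

# Rows: gym / no gym. Columns: dating / not dating.
# Counts are reconstructed from the article's proportions (an assumption).
p_gym = fisher_exact_two_sided(14, 8, 15, 33)
print(f"gym vs dating: p = {p_gym:.3f}")  # for comparison, the article reports 0.018
```

The same call can be repeated for any other variable once its four cell counts are known.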
Next, we compared the relationships between pairs of variables, which can help reveal incorrect model assumptions. Red dots indicate positive associations and blue dots negative ones. Only relationships significant at p < 0.1 are shown, so most variable pairs are blank.
As the figure shows, {has a girlfriend, is confident, exercises, doesn't wear glasses} are all related to one another.
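The article does not say which association measure was used for this comparison. One common choice for pairs of 0/1 variables is the phi coefficient, whose sign would correspond to the red (positive) and blue (negative) dots. The sketch below assumes that choice and uses a small made-up table, not the author's data. It also leaves out the significance filter.

```python
from itertools import combinations
from math import sqrt

def phi(x, y):
    """Phi coefficient between two equal-length lists of 0/1 values."""
    pairs = list(zip(x, y))
    n11 = pairs.count((1, 1))
    n10 = pairs.count((1, 0))
    n01 = pairs.count((0, 1))
    n00 = pairs.count((0, 0))
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # A constant column has no defined association; report 0 in that case.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

# Hypothetical toy data for 8 people (NOT the article's 70 classmates).
toy = {
    "gym":     [1, 1, 1, 0, 0, 0, 1, 0],
    "glasses": [0, 0, 1, 1, 1, 1, 0, 1],
    "dating":  [1, 1, 0, 0, 0, 1, 1, 0],
}

for a, b in combinations(toy, 2):
    r = phi(toy[a], toy[b])
    colour = "red" if r > 0 else "blue" if r < 0 else "blank"
    print(f"{a:8s} {b:8s} phi={r:+.2f} ({colour})")
```

In a fuller version, each pair would also be tested (for example with Fisher's exact test) and only pairs with p < 0.1 would be drawn.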
Here I want to stress again that the data I collected comes from my classmates, or classmates of classmates, who are mostly CS majors, so it does not represent the whole University of Waterloo student body.
Any model trained on this data will therefore reflect the same bias. In the future, I will collect more data to improve the model.
Using logistic regression to predict the probability of dating
How can an algorithm predict the probability of finding a girlfriend? Let's rub our hands together and get started!
I trained a logistic regression GLM to predict the probability of having a girlfriend from the explanatory variables. Using the glmnet and caret packages in R, I fit the GLM with elastic net regularization. The hyperparameters were tuned with a standard grid search, using leave-one-out cross-validation and optimizing the Kappa coefficient at each iteration.
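The author's R code is not shown, so the following is only a rough pure-Python sketch of the same pipeline: elastic-net logistic regression fitted by proximal gradient descent, leave-one-out cross-validation, and a small grid search scored by Cohen's Kappa. Several things here are assumptions of mine: the data is randomly generated, the grid values are arbitrary, and Kappa is computed on the pooled held-out predictions because it is undefined on a single sample. All function names are hypothetical, and this is not how glmnet fits its models internally.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def fit_elastic_net_logit(X, y, lam, alpha, lr=0.1, epochs=100):
    """Logistic loss + lam * (alpha * |w|_1 + (1 - alpha) / 2 * |w|_2^2).

    The intercept b is not penalised. Few epochs keep the sketch fast.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        for j in range(p):
            # Gradient step on the smooth part (loss + ridge term) ...
            step = w[j] - lr * (grad_w[j] + lam * (1 - alpha) * w[j])
            # ... then soft-thresholding, which handles the L1 term.
            w[j] = math.copysign(max(abs(step) - lr * lam * alpha, 0.0), step)
    return w, b

def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    rate_true, rate_pred = sum(y_true) / n, sum(y_pred) / n
    expected = rate_true * rate_pred + (1 - rate_true) * (1 - rate_pred)
    return (observed - expected) / (1 - expected) if expected < 1 else 0.0

def loocv_kappa(X, y, lam, alpha):
    """Hold out each person once, then score the pooled predictions."""
    preds = []
    for i in range(len(X)):
        w, b = fit_elastic_net_logit(X[:i] + X[i + 1:], y[:i] + y[i + 1:],
                                     lam, alpha)
        preds.append(1 if predict_proba(w, b, X[i]) >= 0.5 else 0)
    return cohen_kappa(y, preds)

# Hypothetical toy data: 24 people, 3 binary features (NOT the article's data).
random.seed(0)
X = [[random.randint(0, 1) for _ in range(3)] for _ in range(24)]
y = [1 if random.random() < 0.25 + 0.5 * row[0] else 0 for row in X]

# Arbitrary illustrative grid of (lambda, alpha) pairs.
grid = [(lam, alpha) for lam in (0.01, 0.1) for alpha in (0.0, 1.0)]
scores = {params: loocv_kappa(X, y, *params) for params in grid}
best = max(scores, key=scores.get)
print("best (lambda, alpha):", best, "kappa:", round(scores[best], 3))
```

Here alpha=0 is pure ridge and alpha=1 is pure lasso; values in between mix the two penalties, which is the point of the elastic net.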
The resulting model has a cross-validated AUC (area under the ROC curve) of 0.673, which means its predictions are somewhat better than chance (0.5) but still leave a lot of uncertainty. Finally, I also made a simple calculator for anyone who is interested.
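For readers unfamiliar with AUC: it equals the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The small sketch below computes it directly from that definition. The labels and scores are made up for illustration, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities, for illustration only.
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, probs))  # 3 of 4 pairs ranked correctly -> 0.75
```

On this scale 0.5 is random guessing and 1.0 is a perfect ranking, so 0.673 sits in the lower part of that range.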
The editor tried it with every "unfavorable factor" selected. The result is really sad.
Okay, I won't say any more. I'm off to the gym to calm down by lifting weights, and to book an appointment for laser eye surgery (waves goodbye).