Isn't machine learning enough to help you in the great quest for a girlfriend?

Editor's note: The author of this article is Bai Li, a Computer Science student (class of 2017) at the University of Waterloo, who wrote a tutorial on his blog about using logistic regression to predict the probability of finding a girlfriend. A friendly reminder from Zhijun: this article is meant to share a project idea, so please don't take the results personally.

The University of Waterloo is a well-known science and engineering university in Canada and one of the best universities in North America. Our mathematics, computer science and engineering programs rank among the best in the world. However, schools like this share one obvious trait: there are more guys than girls. Worse, Waterloo is notorious for its lack of social life, so I figure that I, like many of my CS classmates, can't find a girlfriend now and never will.

Some people think love cannot be quantified: it is too abstract, and when it comes to girlfriends you should just let things take their course. But as a data scientist at the University of Waterloo, I disagree. Computer people never give up that easily! Even without a social life, surely machine learning can help with the great quest for a girlfriend?

Below is the detailed tutorial, so grab a seat and get your notebook out!

First, let's state the problem clearly: what qualities does a guy at the University of Waterloo need to find a girlfriend? Most people assume that a higher salary is definitely more attractive to girls, and that height or physique is a plus. What we want to find out is which factors actually have measurable predictive power and which are just guesses with no data behind them.

Here are roughly the features I came up with:

Dating (target variable): currently has a girlfriend, or has had a relationship lasting at least six months in the past five years

international: is an international student

cs: majors in CS, SE or ECE

career: has excellent grades and landed an impressive internship

interesting: is humorous and a good talker

social: is outgoing and likes making new friends

confident: is confident

tall: is taller than me (175 cm or more)

glasses: wears glasses

gym: goes to the gym often or enjoys sports

fashion: pays attention to how he dresses

canada: lives in Canada permanently, or has lived and worked in Canada for at least five years

asian: is Asian

You may have noticed that some of these characteristics are quite subjective. For example, how do you decide whether someone is "interesting" or "humorous"? For traits like these, I relied on my own observations of the people around me and split them into two groups, labeling half of them 1 and the other half 0.

Note that this is not an objective, rigorous statistical study; it is meant to offer a way of thinking about the problem.

To collect the data, I made a table of every classmate I could think of, using 1 or 0 for "yes" or "no", and ended up with data on 70 classmates. Attention, everyone at the University of Waterloo: if you're a guy who has talked to Bai Li in the past two years, you may have been used as training data.

Analysis process

First, we ran Fisher's exact test of each of the other variables against the target variable Dating, which turned up three variables as the most important:

Fitness: guys who love working out and sports are twice as likely to have a girlfriend as other guys (p-value=0.02)

Glasses: guys who don't wear glasses are 70% more likely to have a girlfriend than guys who do (p-value=0.08) (So don't play on your phone before bed, and do your eye exercises, everyone!)

Confidence: confident guys are the most attractive (p-value=0.09)
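To make the test concrete, here is a minimal Python sketch of a two-sided Fisher's exact test. It is our own illustration, not the author's code (his modeling was done in R, where `fisher.test` is the standard function). The gym counts are an assumption: they are reconstructed from the rounded figures in the full results below, giving 14 of 22 gym-goers dating versus 15 of 48 others.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: has the trait / lacks the trait; columns: dating / not dating.
    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of every table that
    is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(x):
        # Integer numerator of P(top-left cell == x); the shared
        # denominator is comb(n, col1).
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Comparing integers avoids floating-point ties between equally likely tables.
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, col1)


# Gym variable, counts reconstructed from the rounded results (assumption):
# 14 of 22 gym-goers are dating, 15 of 48 non-gym-goers are dating.
p_gym = fisher_exact_two_sided(14, 8, 15, 33)
print(round(p_gym, 3))  # 0.018
```

The result agrees with the reported p-value for gym, which suggests the reconstructed counts are right.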

Sure enough, girls like guys who are fit and energetic. But I was a little surprised that glasses matter this much. Some people may associate glasses with being a nerd, and there really are papers that have studied this question, finding that people do see glasses as greatly reducing someone's attractiveness.

Some other variables may have predictive value for dating success, but because the sample is so small, these results are very uncertain:

International students are more likely to succeed than local students

Asian guys are at a disadvantage compared with guys of other races

All else being equal, guys majoring in CS seem to be more popular
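One way to see why these results are so uncertain is to put an interval around each group's dating rate. The sketch below uses a 95% Wilson score interval, which is our own choice for illustration (the article itself reports only Fisher p-values). The counts for the international variable are an assumption, reconstructed from the rounded results: 6 of 10 international students versus 23 of 60 others.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# International variable, counts reconstructed from the rounded results (assumption).
intl = wilson_interval(6, 10)    # roughly (0.31, 0.83)
local = wilson_interval(23, 60)  # roughly (0.27, 0.51)
print(intl, local)
```

With only 10 international students, the interval covers more than half of the 0 to 1 range and overlaps heavily with the interval for the other 60, so the gap between 0.60 and 0.38 could easily be noise.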

The remaining factors, such as height, grades, dressing and grooming, have little to do with whether you can find a girlfriend. Even if you land a job at Facebook headquarters, sorry, but if you're meant to be single, you'll stay single.

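Here are the full results of the experiment. For each variable X, N(X) and N(~X) are the numbers of classmates with and without the trait, p(dating|X) and p(dating|~X) are the fractions of each group who are dating, and the p-value comes from Fisher's exact test: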

Variable: international

N(international)=10, N(~international)=60

p(dating|international)=0.60, p(dating|~international)=0.38

p-value=0.299

Variable: cs

N(cs)=56, N(~cs)=14

p(dating|cs)=0.45, p(dating|~cs)=0.29

p-value=0.368

Variable: career

N(career)=46, N(~career)=24

p(dating|career)=0.43, p(dating|~career)=0.38

p-value=0.799

Variable: interesting

N(interesting)=34, N(~interesting)=36

p(dating|interesting)=0.47, p(dating|~interesting)=0.36

p-value=0.467

Variable: social

N(social)=29, N(~social)=41

p(dating|social)=0.45, p(dating|~social)=0.39

p-value=0.806

Variable: confident

N(confident)=37, N(~confident)=33

p(dating|confident)=0.51, p(dating|~confident)=0.30

p-value=0.092

Variable: tall

N(tall)=26, N(~tall)=44

p(dating|tall)=0.46, p(dating|~tall)=0.39

p-value=0.619

Variable: glasses

N(glasses)=41, N(~glasses)=29

p(dating|glasses)=0.32, p(dating|~glasses)=0.55

p-value=0.084

Variable: gym

N(gym)=22, N(~gym)=48

p(dating|gym)=0.64, p(dating|~gym)=0.31

p-value=0.018

Variable: fashion

N(fashion)=17, N(~fashion)=53

p(dating|fashion)=0.41, p(dating|~fashion)=0.42

p-value=1.000

Variable: canada

N(canada)=31, N(~canada)=39

p(dating|canada)=0.42, p(dating|~canada)=0.41

p-value=1.000

Variable: asian

N(asian)=59, N(~asian)=11

p(dating|asian)=0.37, p(dating|~asian)=0.64

p-value=0.181
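Each block above reports the two group sizes and the dating rate within each group. The sketch below shows how those numbers fall out of a 0/1 table, assuming one record per classmate keyed by the variable names above plus a hypothetical `dating` key. The toy rows are invented for illustration and are not the author's data.

```python
def summarize(rows, var, target="dating"):
    """Group sizes and conditional dating rates for one binary variable.

    rows: list of dicts mapping variable names to 0/1.
    Returns N(var), N(~var), p(dating|var), p(dating|~var).
    """
    with_var = [r[target] for r in rows if r[var] == 1]
    without_var = [r[target] for r in rows if r[var] == 0]

    def rate(group):
        # Fraction of the group that is dating; None if the group is empty.
        return sum(group) / len(group) if group else None

    return len(with_var), len(without_var), rate(with_var), rate(without_var)


# Toy rows (hypothetical, not the author's data).
rows = [
    {"gym": 1, "glasses": 0, "dating": 1},
    {"gym": 1, "glasses": 1, "dating": 1},
    {"gym": 0, "glasses": 1, "dating": 0},
    {"gym": 0, "glasses": 0, "dating": 1},
]
for var in ("gym", "glasses"):
    n1, n0, p1, p0 = summarize(rows, var)
    print(f"Variable: {var}  N({var})={n1}, N(~{var})={n0}  "
          f"p(dating|{var})={p1:.2f}, p(dating|~{var})={p0:.2f}")
```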

Next, we examined the relationships between the variables themselves, which can help identify faulty model assumptions. In the chart, red dots indicate positive associations and blue dots negative ones. Only associations significant at p < 0.1 are shown, so most variable pairs are blank.

As the chart shows, {has a girlfriend, is confident, exercises, doesn't wear glasses} are all associated with one another.
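The screening behind the chart can be sketched as follows: test every pair of binary variables, keep the pairs with p < 0.1, and record whether each association is positive or negative. This sketch assumes Fisher's exact test for each pair (the article does not say which test the chart used) and takes the sign from the cross-product ad - bc of the 2x2 table. The toy columns are made up.

```python
from itertools import combinations
from math import comb


def fisher_p(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(x):
        # Integer numerator of the hypergeometric probability of cell value x.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(row1 + row2, col1)


def significant_pairs(columns, alpha=0.1):
    """Return {(u, v): (sign, p)} for variable pairs with p < alpha.

    columns: dict mapping variable name -> list of 0/1 values (same length).
    sign is "+" when the two variables tend to co-occur, "-" otherwise.
    """
    result = {}
    for u, v in combinations(columns, 2):
        pairs = list(zip(columns[u], columns[v]))
        a, b = pairs.count((1, 1)), pairs.count((1, 0))
        c, d = pairs.count((0, 1)), pairs.count((0, 0))
        p = fisher_p(a, b, c, d)
        if p < alpha:
            result[(u, v)] = ("+" if a * d - b * c > 0 else "-", p)
    return result


# Toy columns (hypothetical): "b" copies "a", "c" is its opposite, "d" is unrelated.
columns = {
    "a": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "b": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "c": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "d": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
}
for (u, v), (sign, p) in significant_pairs(columns).items():
    print(u, v, sign, round(p, 4))
```

Pairs involving "d" drop out (p = 1.0), which is why most cells in such a chart end up blank.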

I want to stress again that my data comes from my classmates, or classmates of classmates, who are mostly CS majors, so it does not represent the University of Waterloo student body as a whole.

So any model trained on this data will reflect the same bias. In the future, I will collect more data to improve the model.

Using logistic regression to predict the probability of dating

How can an algorithm predict the probability of finding a girlfriend? Rub your hands together and let's get started!

I trained a logistic regression GLM to predict the probability of finding a girlfriend from the explanatory variables. Using the glmnet and caret packages in R, I fit the GLM with elastic net regularization. The hyperparameters were then tuned with a standard grid search, using leave-one-out cross-validation and selecting the values that maximized the Kappa statistic.
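As a rough, dependency-free illustration of the same ideas (not the author's R code and not glmnet's algorithm), the Python sketch below fits a logistic regression with a glmnet-style elastic-net penalty, lam * ((1 - alpha)/2 * ||w||^2 + alpha * ||w||_1). It uses proximal gradient descent in place of glmnet's coordinate descent, then picks (lam, alpha) by grid search with leave-one-out cross-validation, scoring each setting by Cohen's kappa. The function names, learning rate, iteration count, 0.5 decision threshold and toy data are all assumptions.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z, t):
    # Proximal operator of t * |z|: shrinks toward zero, snapping small values to 0.
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_enet_logistic(X, y, lam, alpha, lr=0.1, iters=2000):
    """Minimise mean log-loss + lam * ((1 - alpha)/2 * ||w||^2 + alpha * ||w||_1).

    Proximal gradient descent; the intercept b is not penalised.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        for j in range(p):
            g = grad_w[j] + lam * (1 - alpha) * w[j]  # smooth part: log-loss + ridge
            w[j] = soft_threshold(w[j] - lr * g, lr * lam * alpha)  # L1 part
    return w, b


def predict(w, b, x):
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5 else 0


def cohen_kappa(y_true, y_pred):
    """Agreement beyond chance: (observed - expected) / (1 - expected)."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    t1, p1 = sum(y_true) / n, sum(y_pred) / n
    expected = t1 * p1 + (1 - t1) * (1 - p1)
    return 0.0 if expected == 1 else (observed - expected) / (1 - expected)


def loocv_kappa(X, y, lam, alpha):
    # Leave each person out once, train on the rest, predict the held-out person.
    preds = []
    for i in range(len(X)):
        w, b = fit_enet_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:], lam, alpha)
        preds.append(predict(w, b, X[i]))
    return cohen_kappa(y, preds)


def grid_search(X, y, lambdas, alphas):
    # Return (best kappa, lam, alpha) over the grid.
    best = None
    for alpha in alphas:
        for lam in lambdas:
            k = loocv_kappa(X, y, lam, alpha)
            if best is None or k > best[0]:
                best = (k, lam, alpha)
    return best


# Toy data (hypothetical): one binary feature that perfectly predicts the label.
X = [[1], [1], [1], [0], [0], [0]]
y = [1, 1, 1, 0, 0, 0]
print(grid_search(X, y, lambdas=[0.01, 10.0], alphas=[0.5, 1.0]))
```

With a large lam the L1 term zeroes out every coefficient, so the model can only predict the training majority; leave-one-out then gets every toy example wrong, which is exactly the kind of setting the kappa-based search rejects.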

The model's cross-validated AUC (area under the ROC curve) is 0.673, which means its predictions are somewhat better than chance but still leave a lot of uncertainty. Finally, I also built a simple calculator for anyone who is interested.
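The AUC has a handy interpretation: it is the probability that the model gives a randomly chosen person who is dating a higher score than a randomly chosen person who is not (ties count as half), so 0.5 is chance level and 1.0 is perfect. A minimal sketch of that pairwise computation, with made-up scores:

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 0/1 ground truth; scores: predicted probabilities.
    Counts how often a positive outranks a negative; ties count as half.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up predictions for four people (hypothetical).
print(auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.2]))  # 0.75
```

Read this way, an AUC of 0.673 means the model ranks a dating/non-dating pair correctly about two times out of three.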

The editor tried the calculator and selected every "unfavorable" factor. The result was truly depressing.

Okay, I'll say no more. I'm off to the gym to calm down by lifting some iron, and then to book an appointment for vision-correction surgery (waves goodbye).
