Višestruka linearna regresija se primjenjuje na promjenljive koje su metričkog tipa. U praksi se često dešava da želimo da analiziramo promjenljivu koja uzima konačan broj vrijednosti i tu dolazimo do problema klasifikacije podataka. Često se javljaju promjenljive indikatorskog tipa koje uzimaju samo dvije vrijednosti, npr. muški/ženski pol, ispravan/neispravan proizvod i slično. Takvu veličinu takođe želimo da uklopimo u neki model u kojem ona zavisi od drugih promjenljivih. Ako primijenimo običnu regresiju na takav problem, dobićemo neke realne vrijednosti i teško možemo da interpretiramo takav rezultat, odnosno da kažemo kojoj klasi pripada dati podatak. Ideja je da se napravi model za vjerovatnoću \(p\) da dati indikator uzme vrijednost 1. Model (binarne) logističke regresije ima oblik:

\[ p(x)=\frac{1}{1+e^{-(\beta_0+\beta_1 x_1 +\dots +\beta_p x_P)}} \] odnosno inverz (logit funkcija):

\[ log\left(\frac{p(x)}{1-p(x)}\right)=\beta_0+\beta_1 x_1 +\dots +\beta_p x_P \] Ako se u modelu na osnovu datih prediktora dobije \(p>0.5\) onda taj podatak klasifikujemo kao “uspjeh”, u suprotnom kao “neuspjeh”.

Primjeri

baza<-read.csv("http://www.math.rs/p/files/1458635682-70-baza.txt", header=FALSE)

Primjećujemo da promjenljiva \(V3\) uzima vrijednosti 0 i 1.

model.log <- glm(V3 ~ V1 + V2, family = binomial)
summary(model.log)

Call:
glm(formula = V3 ~ V1 + V2, family = binomial)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.19287  -0.18009   0.01577   0.19578   1.78527  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -25.16133    5.79836  -4.339 1.43e-05 ***
V1            0.20623    0.04800   4.297 1.73e-05 ***
V2            0.20147    0.04862   4.144 3.42e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 134.6  on 99  degrees of freedom
Residual deviance:  40.7  on 97  degrees of freedom
AIC: 46.7

Number of Fisher Scoring iterations: 7
# Vrijednosti za p dobijene modelom
V3.p <- fitted(model.log)
# Klasifikujemo na osnovu p
V3.klas <- ifelse(V3.p >= 0.5, 1, 0)
# Nekada je pozeljno reci samo da sa nekom vjerovatnocom podatak pripada odredjenoj klasi.
# Na primjer, ako je p=0.5 nemamo preciznu odluku.
# Brojimo koliko je puta model "pogodio"
mean(V3 == V3.klas) # u 89% slucajeva je pogodio
[1] 0.89
# Sada hocemo dobijeni model da iskoristimo za predvidjanje. Neka je dato da je student osvojio 
# na prijemnom V1=20 i V2=100 poena. Hocemo da procjenimo da li ce upisati master.
predict(model.log, newdata =  data.frame(V1 = 20, V2 = 100), type = "response")
        1 
0.2912049 

Primjer 2

library(ISLR)
baza <- Smarket
names(baza)
[1] "Year"      "Lag1"      "Lag2"      "Lag3"      "Lag4"      "Lag5"      "Volume"    "Today"    
[9] "Direction"
attach(baza)
The following objects are masked from baza (pos = 3):

    Direction, Lag1, Lag2, Lag3, Lag4, Lag5, Today, Volume, Year
mdl <- glm(Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume, family =
             binomial)
             
summary(mdl)

Call:
glm(formula = Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + 
    Volume, family = binomial)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.446  -1.203   1.065   1.145   1.326  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.126000   0.240736  -0.523    0.601
Lag1        -0.073074   0.050167  -1.457    0.145
Lag2        -0.042301   0.050086  -0.845    0.398
Lag3         0.011085   0.049939   0.222    0.824
Lag4         0.009359   0.049974   0.187    0.851
Lag5         0.010313   0.049511   0.208    0.835
Volume       0.135441   0.158360   0.855    0.392

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1731.2  on 1249  degrees of freedom
Residual deviance: 1727.6  on 1243  degrees of freedom
AIC: 1741.6

Number of Fisher Scoring iterations: 3
mdl.p <- fitted(mdl)
# Ako je p>=0.5, dodjeljujemo "Up", inace "Down"
mdl.fit <- ifelse(mdl.p > 0.5, "Up", "Down")
mean(mdl.fit == Direction)
[1] 0.5216
# Sada hocemo da pravimo model na osnovu poduzorka iz date baze, odnosno
# izdvojicemo one podatke do 2005 godine i na osnovu njih napraviti model, a
# potom predvidjati vrijednosti na osnovu preostalog uzorka
uzorak1 <- Year<2005
mdl1 <- glm(
  Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + Volume,
  data = Smarket,
  family = binomial,
  subset = uzorak1
  )
  summary(mdl1)

Call:
glm(formula = Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 + 
    Volume, family = binomial, data = Smarket, subset = uzorak1)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.302  -1.190   1.079   1.160   1.350  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.191213   0.333690   0.573    0.567
Lag1        -0.054178   0.051785  -1.046    0.295
Lag2        -0.045805   0.051797  -0.884    0.377
Lag3         0.007200   0.051644   0.139    0.889
Lag4         0.006441   0.051706   0.125    0.901
Lag5        -0.004223   0.051138  -0.083    0.934
Volume      -0.116257   0.239618  -0.485    0.628

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1383.3  on 997  degrees of freedom
Residual deviance: 1381.1  on 991  degrees of freedom
AIC: 1395.1

Number of Fisher Scoring iterations: 3
 
  mdl1.p <- predict(mdl1, newdata = Smarket[!uzorak1, ], type = "response")
  
  mdl1.fit <- ifelse(mdl1.p > 0.5, "Up", "Down")
  Direction.2005 <- Direction[!uzorak1]
  
  # Provjeravamo koliko je model pogodio
  
  table(mdl1.fit, Direction.2005)
        Direction.2005
mdl1.fit Down Up
    Down   77 97
    Up     34 44
  mean(mdl1.fit == Direction.2005)
[1] 0.4801587
# Probamo model sa manje prediktora
mdl2 <- glm(Direction~Lag1+Lag2,
            data=Smarket,family=binomial, subset=uzorak1)
mdl2.p <- predict(mdl2 ,newdata=Smarket[!uzorak1,],type="response") 
mdl2.fit <- ifelse(mdl2.p > 0.5, "Up", "Down")
table(mdl2.fit, Direction.2005)
        Direction.2005
mdl2.fit Down  Up
    Down   35  35
    Up     76 106
mean(mdl2.fit == Direction.2005)
[1] 0.5595238
LS0tDQp0aXRsZTogIkxvZ2lzdGnEjWthIHJlZ3Jlc2lqYSINCm91dHB1dDogaHRtbF9ub3RlYm9vaw0KLS0tDQoNClZpxaFlc3RydWthIGxpbmVhcm5hIHJlZ3Jlc2lqYSBzZSBwcmltamVuanVqZSBuYSBwcm9tamVubGppdmUga29qZSBzdSBtZXRyacSNa29nIHRpcGEuIFUgcHJha3NpIHNlIMSNZXN0byBkZcWhYXZhIGRhIMW+ZWxpbW8gZGEgYW5hbGl6aXJhbW8gcHJvbWplbmxqaXZ1IGtvamEgdXppbWEga29uYcSNYW4gYnJvaiB2cmlqZWRub3N0aSBpIHR1IGRvbGF6aW1vIGRvIHByb2JsZW1hIGtsYXNpZmlrYWNpamUgcG9kYXRha2EuIMSMZXN0byBzZSBqYXZsamFqdSBwcm9tamVubGppdmUgaW5kaWthdG9yc2tvZyB0aXBhIGtvamUgdXppbWFqdSBzYW1vIGR2aWplIHZyaWplZG5vc3RpLCBucHIuIG11xaFraS/FvmVuc2tpIHBvbCwgaXNwcmF2YW4vbmVpc3ByYXZhbiBwcm9penZvZCBpIHNsacSNbm8uIFRha3Z1IHZlbGnEjWludSB0YWtvxJFlIMW+ZWxpbW8gZGEgdWtsb3BpbW8gdSBuZWtpIG1vZGVsIHUga29qZW0gb25hIHphdmlzaSBvZCBkcnVnaWggcHJvbWplbmxqaXZpaC4gQWtvIHByaW1pamVuaW1vIG9iacSNbnUgcmVncmVzaWp1IG5hIHRha2F2IHByb2JsZW0sIGRvYmnEh2VtbyBuZWtlIHJlYWxuZSB2cmlqZWRub3N0aSBpIHRlxaFrbyBtb8W+ZW1vIGRhIGludGVycHJldGlyYW1vIHRha2F2IHJlenVsdGF0LCBvZG5vc25vIGRhIGthxb5lbW8ga29qb2oga2xhc2kgcHJpcGFkYSBkYXRpIHBvZGF0YWsuIElkZWphIGplIGRhIHNlIG5hcHJhdmkgbW9kZWwgemEgdmplcm92YXRub8SHdSAkcCQgZGEgZGF0aSBpbmRpa2F0b3IgdXptZSB2cmlqZWRub3N0IDEuIE1vZGVsIChiaW5hcm5lKSBsb2dpc3RpxI1rZSByZWdyZXNpamUgaW1hIG9ibGlrOg0KDQokJCBwKHgpPVxmcmFjezF9ezErZV57LShcYmV0YV8wK1xiZXRhXzEgeF8xICtcZG90cyArXGJldGFfcCB4X1ApfX0gICAkJA0Kb2Rub3NubyBpbnZlcnogKGxvZ2l0IGZ1bmtjaWphKToNCg0KJCQgbG9nXGxlZnQoXGZyYWN7cCh4KX17MS1wKHgpfVxyaWdodCk9XGJldGFfMCtcYmV0YV8xIHhfMSArXGRvdHMgK1xiZXRhX3AgeF9QICAgJCQNCkFrbyBzZSB1IG1vZGVsdSBuYSBvc25vdnUgZGF0aWggcHJlZGlrdG9yYSBkb2JpamUgJHA+MC41JCBvbmRhIHRhaiBwb2RhdGFrIGtsYXNpZmlrdWplbW8ga2FvICJ1c3BqZWgiLCB1IHN1cHJvdG5vbSBrYW8gIm5ldXNwamVoIi4NCg0KDQojIyMgUHJpbWplcmkNCg0KYGBge3J9DQoNCmJhemE8LXJlYWQuY3N2KCJodHRwOi8vd3d3Lm1hdGgucnMvcC9maWxlcy8xNDU4NjM1NjgyLTcwLWJhemEudHh0IiwgaGVhZGVyPUZBTFNFKQ0KYmF6YQ0KIyBCYXphIHNhZHJ6aSByZXp1bHRhdGUgc3R1ZGVuYXRhIG5hIHByaWplbW5pbSBpc3BpdGltYSBuYSBuZWtvbSBmYWt1bHRldHUgemEgdXBpcyBuYSBtYXN0ZXIgc3R1ZGlqZSAocHJ2ZSBkdmlqZSBrb2xvbmUpDQojIGRvayB0cmVjYSBqZSBpbmRpa2F0b3IgZGEgbGkgamUgZGF0aSBzdHVkZW50IHVwaXNhbyBpbGkgbmlqZS4NCg0KYGBgDQoNClByaW1qZcSHdWplbW8gZGEgcHJvbWplbmxqaXZhICRWMyQgdXppbWEgdnJpamVkbm9zdGkgMCBpIDEuDQoNCg0KYGBge3J9DQphdHRhY2goYmF6YSkNCnBsb3QoVjN+VjErVjIpDQoNCmBgYA0KDQoNCmBgYHtyfQ0KDQptb2RlbC5sb2cgPC0gZ2xtKFYzIH4gVjEgKyBWMiwgZmFtaWx5ID0gYmlub21pYWwpDQpzdW1tYXJ5KG1vZGVsLmxvZykNCg0KIyBWcmlqZWRub3N0aSB6YSBwIGRvYmlqZW5lIG1vZGVsb20NCg0KVjMucCA8LSBmaXR0ZWQobW9kZWwubG9nKQ0KDQojIEtsYXNpZmlrdWplbW8gbmEgb3Nub3Z1IHANCg0KVjMua2xhcyA8LSBpZmVsc2UoVjMucCA+PSAwLjUsIDEsIDApDQoNCiMgTmVrYWRhIGplIHBvemVsam5vIHJlY2kgc2FtbyBkYSBzYSBuZWtvbSB2amVyb3ZhdG5vY29tIHBvZGF0YWsgcHJpcGFkYSBvZHJlZGplbm9qIGtsYXNpLg0KIyBOYSBwcmltamVyLCBha28gamUgcD0wLjUgbmVtYW1vIHByZWNpem51IG9kbHVrdS4NCg0KIyBCcm9qaW1vIGtvbGlrbyBqZSBwdXRhIG1vZGVsICJwb2dvZGlvIg0KDQptZWFuKFYzID09IFYzLmtsYXMpICMgdSA4OSUgc2x1Y2FqZXZhIGplIHBvZ29kaW8NCg0KIyBTYWRhIGhvY2VtbyBkb2JpamVuaSBtb2RlbCBkYSBpc2tvcmlzdGltbyB6YSBwcmVkdmlkamFuamUuIE5la2EgamUgZGF0byBkYSBqZSBzdHVkZW50IG9zdm9qaW8gDQojIG5hIHByaWplbW5vbSBWMT0yMCBpIFYyPTEwMCBwb2VuYS4gSG9jZW1vIGRhIHByb2NqZW5pbW8gZGEgbGkgY2UgdXBpc2F0aSBtYXN0ZXIuDQoNCnByZWRpY3QobW9kZWwubG9nLCBuZXdkYXRhID0gIGRhdGEuZnJhbWUoVjEgPSAyMCwgVjIgPSAxMDApLCB0eXBlID0gInJlc3BvbnNlIikNCg0KDQoNCmBgYA0KDQojIyMjIFByaW1qZXIgMg0KDQpgYGB7cn0NCg0KbGlicmFyeShJU0xSKQ0KDQpiYXphIDwtIFNtYXJrZXQNCm5hbWVzKGJhemEpDQoNCmF0dGFjaChiYXphKQ0KDQptZGwgPC0gZ2xtKERpcmVjdGlvbiB+IExhZzEgKyBMYWcyICsgTGFnMyArIExhZzQgKyBMYWc1ICsgVm9sdW1lLCBmYW1pbHkgPQ0KICAgICAgICAgICAgIGJpbm9taWFsKQ0KICAgICAgICAgICAgIA0Kc3VtbWFyeShtZGwpDQoNCm1kbC5wIDwtIGZpdHRlZChtZGwpDQoNCiMgQWtvIGplIHA+PTAuNSwgZG9kamVsanVqZW1vICJVcCIsIGluYWNlICJEb3duIg0KDQptZGwuZml0IDwtIGlmZWxzZShtZGwucCA+IDAuNSwgIlVwIiwgIkRvd24iKQ0KDQptZWFuKG1kbC5maXQgPT0gRGlyZWN0aW9uKQ0KDQojIFNhZGEgaG9jZW1vIGRhIHByYXZpbW8gbW9kZWwgbmEgb3Nub3Z1IHBvZHV6b3JrYSBpeiBkYXRlIGJhemUsIG9kbm9zbm8NCiMgaXpkdm9qaWNlbW8gb25lIHBvZGF0a2UgZG8gMjAwNSBnb2RpbmUgaSBuYSBvc25vdnUgbmppaCBuYXByYXZpdGkgbW9kZWwsIGENCiMgcG90b20gcHJlZHZpZGphdGkgdnJpamVkbm9zdGkgbmEgb3Nub3Z1IHByZW9zdGFsb2cgdXpvcmthDQoNCnV6b3JhazEgPC0gWWVhcjwyMDA1DQptZGwxIDwtIGdsbSgNCiAgRGlyZWN0aW9uIH4gTGFnMSArIExhZzIgKyBMYWczICsgTGFnNCArIExhZzUgKyBWb2x1bWUsDQogIGRhdGEgPSBTbWFya2V0LA0KICBmYW1pbHkgPSBiaW5vbWlhbCwNCiAgc3Vic2V0ID0gdXpvcmFrMQ0KICApDQogIHN1bW1hcnkobWRsMSkNCiANCiAgbWRsMS5wIDwtIHByZWRpY3QobWRsMSwgbmV3ZGF0YSA9IFNtYXJrZXRbIXV6b3JhazEsIF0sIHR5cGUgPSAicmVzcG9uc2UiKQ0KICANCiAgbWRsMS5maXQgPC0gaWZlbHNlKG1kbDEucCA+IDAuNSwgIlVwIiwgIkRvd24iKQ0KICBEaXJlY3Rpb24uMjAwNSA8LSBEaXJlY3Rpb25bIXV6b3JhazFdDQogIA0KICAjIFByb3ZqZXJhdmFtbyBrb2xpa28gamUgbW9kZWwgcG9nb2Rpbw0KICANCiAgdGFibGUobWRsMS5maXQsIERpcmVjdGlvbi4yMDA1KQ0KICBtZWFuKG1kbDEuZml0ID09IERpcmVjdGlvbi4yMDA1KQ0KDQoNCiMgUHJvYmFtbyBtb2RlbCBzYSBtYW5qZSBwcmVkaWt0b3JhDQptZGwyIDwtIGdsbShEaXJlY3Rpb25+TGFnMStMYWcyLA0KICAgICAgICAgICAgZGF0YT1TbWFya2V0LGZhbWlseT1iaW5vbWlhbCwgc3Vic2V0PXV6b3JhazEpDQptZGwyLnAgPC0gcHJlZGljdChtZGwyICxuZXdkYXRhPVNtYXJrZXRbIXV6b3JhazEsXSx0eXBlPSJyZXNwb25zZSIpIA0KbWRsMi5maXQgPC0gaWZlbHNlKG1kbDIucCA+IDAuNSwgIlVwIiwgIkRvd24iKQ0KDQoNCnRhYmxlKG1kbDIuZml0LCBEaXJlY3Rpb24uMjAwNSkNCm1lYW4obWRsMi5maXQgPT0gRGlyZWN0aW9uLjIwMDUpDQoNCg0KYGBgDQoNCg0KDQoNCg0KDQoNCg0KDQoNCg0KDQoNCg0KDQoNCg0K