|
# 1

14-06-2011 05:46 AM
|
|
|
Hi,
I have two csv files which I created; one for train and another for test. The contents are as below:
hasreply , totalsentences , totallines , ratioblanklines ,
totalwords , avgwordlength , totalfnword , ratiofnwords , totalchars ,
ratioletters , ratiodigits , ratioucase , totalspcchars , totalpunc ,
ratiopunc , author
YES ,3,6,0.3423234,34,4.45454,5,0.3432,35,1.23232,0.342342,0.342342,8,5,0.234234, Allen-P
YES ,4,7,1.3423234,35,5.45454,6,1.3432,36,2.23232,1.342342,1.342342,9,6,1.234234, Allen-P
YES ,5,8,2.3423234,36,6.45454,7,2.3432,37,3.23232,2.342342,2.342342,10,7,2.234234, Allen-P
YES ,6,9,3.3423234,37,7.45454,8,3.3432,38,4.23232,3.342342,3.342342,11,8,3.234234, Allen-P
NO ,5,8,2.3423234,36,6.45454,7,2.3432,37,3.23232,2.342342,2.342342,10,7,2.234234, Tanveer
NO ,6,9,3.3423234,37,7.45454,8,3.3432,38,4.23232,3.342342,3.342342,11,8,3.234234, Tanveer
And the test data is as follows:
hasreply , totalsentences , totallines , ratioblanklines ,
totalwords , avgwordlength , totalfnword , ratiofnwords , totalchars ,
ratioletters , ratiodigits , ratioucase , totalspcchars , totalpunc ,
ratiopunc, author
YES ,3,6,0.3423234,34,4.45454,3,0.3432,31,1.23232,0.342342,0.542342,2,7,0.234234,?
When I run J48 on the train data, it shows the statistics and output
fine. But when I supply this test data, it says the train and test
data are not compatible. All I want is for it to suggest the author
based on the train data. The format and columns are the same, and I
put ? in place of the author in the test data.
What am I doing wrong? Thanks.
|
# 2

16-06-2011 03:03 AM
|
|
|
Hi, I converted both the train and test files to ARFF format.
The first step, building the model from the train data, works fine, as it did before. But when I supply the ARFF test file, I can't make any sense of the output.
What are the right steps for this anyway? Maybe I am doing it the wrong way. What I am doing is:
1. Open Explorer. From the Preprocess tab I select the train ARFF file.
2. Then I go to the Classify tab, choose "Use training set" under Test options, and run J48 on it. The output looks fine so far.
3. Next I select "Supplied test set" under Test options and choose the test ARFF file. But now it gives weird output of `NaN`, and the ROC value shows a ? mark.
Is this the right way?
My objective is to first train on the sample train data set that I attached, then supply the one-instance test ARFF file and have it tell me which author it is, with a probability.
Thank you.
________________________________
From: Sebastian Luna Valero <>
To: Weka machine learning workbench list. <>
Sent: Tuesday, June 14, 2011 2:14:45 AM
Subject: Re: [Wekalist] Getting error as train and test dataset are not compatible
Hi,
Convert both the train and test files to ARFF format and try again...
HTH,
Sebastian
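A note on why the conversion suggested above can help: when each CSV file is loaded separately, the attribute types are most likely inferred from that file's own contents, so a test file containing only YES and ? ends up with different nominal values than the train file. Writing both ARFF files with one explicit header avoids that. The following is a minimal sketch in plain Python, not Weka's own converter; the function name `csv_to_arff`, the relation names and the shortened column list are assumptions made for illustration.

```python
import csv
import io


def csv_to_arff(csv_text, relation, nominal_values):
    """Convert CSV text to ARFF text.

    nominal_values maps a column name to the full list of labels it may
    take; every other column is declared numeric. Labels containing
    spaces or commas would need quoting, which this sketch does not do.
    """
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for name in header:
        if name in nominal_values:
            labels = ",".join(nominal_values[name])
            lines.append("@attribute %s {%s}" % (name, labels))
        else:
            lines.append("@attribute %s numeric" % name)
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# The same label lists are used for both files, so the headers match.
NOMINAL = {"hasreply": ["YES", "NO"], "author": ["Allen-P", "Tanveer"]}

# Abbreviated to four of the sixteen columns from the original post.
TRAIN_CSV = """hasreply , totalsentences , totallines , author
YES ,3,6, Allen-P
NO ,5,8, Tanveer
"""
TEST_CSV = """hasreply , totalsentences , totallines , author
YES ,3,6,?
"""

train_arff = csv_to_arff(TRAIN_CSV, "train", NOMINAL)
test_arff = csv_to_arff(TEST_CSV, "test", NOMINAL)
print(test_arff)
```

The point of the design is that the label lists live in one place, so the test header declares Allen-P and Tanveer even though neither appears in the test data.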
_______________________________________________
Wekalist mailing list
Send posts to:
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
|
# 3

16-06-2011 07:49 AM
|
|
|
Hi,
If you get a NaN, it is probably because the number of classes in the
train file is greater than the number in the test file...
By the way, did you specify which attribute is the class in the Classify
tab? (The default is the last one.)
Best,
Christophe
|
# 4

16-06-2011 07:53 AM
|
|
|
After looking at your ARFF file: the author values are missing in the
test file, so the classifier can't know if its predictions were right or
wrong. You need to have a test file without missing classes (at least
for almost all lines...)
By the way, it will be hard to get anything meaningful from such small data files.
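To make the point about missing classes concrete, here is a toy illustration in Python of why an evaluation over unlabelled instances has nothing to report. It is not Weka's evaluation code; the `accuracy` function and the example predictions are hypothetical.

```python
import math


def accuracy(actual, predicted):
    """Fraction of correct predictions, skipping unlabelled ("?") instances."""
    scored = [(a, p) for a, p in zip(actual, predicted) if a != "?"]
    if not scored:
        # 0 correct out of 0 scored instances: there is nothing to evaluate.
        return math.nan
    return sum(a == p for a, p in scored) / len(scored)


# One test instance whose author is "?", as in the test file above.
unlabelled = accuracy(["?"], ["Allen-P"])

# Two labelled instances, one of which is predicted correctly.
labelled = accuracy(["Allen-P", "Tanveer"], ["Allen-P", "Allen-P"])

print(unlabelled, labelled)
```

A prediction can still be made for the unlabelled instance; only the summary statistics are undefined.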
|
# 5

16-06-2011 08:36 AM
|
|
|
Hi,
I have two csv files which I created; one for train and another for test. The contents are as below:
hasreply , totalsentences , totallines , ratioblanklines ,
totalwords , avgwordlength , totalfnword , ratiofnwords , totalchars ,
ratioletters , ratiodigits , ratioucase , totalspcchars , totalpunc ,
ratiopunc , author
YES ,3,6,0.3423234,34,4.45454,5,0.3432,35,1.23232,0.342342,0.342342,8,5,0.234234, Allen-P
YES ,4,7,1.3423234,35,5.45454,6,1.3432,36,2.23232,1.342342,1.342342,9,6,1.234234, Allen-P
YES ,5,8,2.3423234,36,6.45454,7,2.3432,37,3.23232,2.342342,2.342342,10,7,2.234234, Allen-P
YES ,6,9,3.3423234,37,7.45454,8,3.3432,38,4.23232,3.342342,3.342342,11,8,3.234234, Allen-P
NO ,5,8,2.3423234,36,6.45454,7,2.3432,37,3.23232,2.342342,2.342342,10,7,2.234234, Tanveer
NO ,6,9,3.3423234,37,7.45454,8,3.3432,38,4.23232,3.342342,3.342342,11,8,3.234234, Tanveer
And the test data is as follows:
hasreply , totalsentences , totallines , ratioblanklines ,
totalwords , avgwordlength , totalfnword , ratiofnwords , totalchars ,
ratioletters , ratiodigits , ratioucase , totalspcchars , totalpunc ,
ratiopunc, author
YES ,3,6,0.3423234,34,4.45454,3,0.3432,31,1.23232,0.342342,0.542342,2,7,0.234234,?
Now after running the J48 on train data it shows me the
statistics and output fine. Now when I give this test data then it says
the train and test data are not compatible. All I want is , it will
suggest me the author from train data. Also the format
and columns are same and i put ? in place of author part in test data.
What am I doing wrong? THanks.
Hi, I converted both the train and test files to ARFF format.
The first step, building the model from the training data, works as it did before. But when I supply the ARFF test file, I can't make sense of the output.
What are the correct steps for this? Maybe I am doing it the wrong way. Here is what I do:
1. Open the Explorer. In the Preprocess tab, I load the training ARFF file.
2. In the Classify tab, I choose "Use training set" under Test options and run J48. The output looks fine so far.
3. I then choose "Supplied test set" under Test options and select the test ARFF file. Now the output shows `NaN`, and the ROC value is shown as a ? mark.
Is this the right way?
My goal is to train on the sample training set I attached, then supply the one-instance test ARFF file and have Weka tell me the probability of each author.
Thank you.
________________________________
From: Sebastian Luna Valero <>
To: Weka machine learning workbench list. <>
Sent: Tuesday, June 14, 2011 2:14:45 AM
Subject: Re: [Wekalist] Getting error as train and test dataset are not compatible
Hi,
Convert both the train and test files to ARFF format and try again...
HTH,
Sebastian
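
A likely reason the conversion helps, stated here as an assumption rather than something confirmed in this thread: when a CSV file is loaded, the labels of each nominal attribute are inferred from the rows that are present, so a one-row test file whose author column holds only ? can end up with a different header from the training file. In ARFF the header is written out explicitly, so both files can declare the same labels. The sketch below is plain Java with no Weka dependency; the class name `ArffHeaderSketch`, the relation name `authors`, and the shortened attribute list are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ArffHeaderSketch {

    // Builds an ARFF header. A null label list marks a numeric attribute;
    // a non-null list is written as a nominal attribute with exactly those labels.
    static String header(String relation, Map<String, List<String>> attributes) {
        StringBuilder sb = new StringBuilder("@relation " + relation + "\n\n");
        for (Map.Entry<String, List<String>> e : attributes.entrySet()) {
            sb.append("@attribute ").append(e.getKey()).append(' ');
            if (e.getValue() == null) {
                sb.append("numeric");
            } else {
                sb.append('{').append(String.join(",", e.getValue())).append('}');
            }
            sb.append('\n');
        }
        return sb.append("\n@data\n").toString();
    }

    public static void main(String[] args) {
        // Only three of the sixteen columns are shown, to keep the sketch short.
        Map<String, List<String>> attrs = new LinkedHashMap<>();
        attrs.put("hasreply", List.of("YES", "NO"));
        attrs.put("totalsentences", null);
        attrs.put("author", List.of("Allen-P", "Tanveer"));

        // Both files share one header; only the data rows differ.
        String train = header("authors", attrs) + "YES,3,Allen-P\nNO,5,Tanveer\n";
        String test = header("authors", attrs) + "YES,3,?\n";
        System.out.println(train);
        System.out.println(test);
    }
}
```

The point of the design is that the author attribute lists both labels in the test file even though its only row has ? in that column.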
Hi,
If you get a NaN, it is probably because the number of classes in the
train file is greater than the number in the test file...
By the way, did you specify which attribute is the class in the Classify
tab? (The default is the last one.)
Best,
Christophe
I have looked at your ARFF file. The author values are missing in the
test file, so the classifier can't know whether its predictions were right
or wrong. You need a test file without missing class values (at least
for almost all lines...)
By the way, it will be hard to learn anything from such small data files.
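
To illustrate why a test file with no known class values can lead to `NaN` in the summary, here is a minimal sketch in plain Java. It is not Weka's evaluation code; the class `MissingClassSketch` and its `accuracy` method are hypothetical and only show the arithmetic: a score computed over zero labeled rows is 0.0 / 0.0, which Java evaluates to NaN.

```java
import java.util.Arrays;
import java.util.List;

public class MissingClassSketch {

    // Fraction of correct predictions among test rows whose actual label is known.
    // "?" marks a missing label, as in ARFF. With no labeled rows the result
    // is 0.0 / 0.0, i.e. NaN.
    static double accuracy(List<String> actual, List<String> predicted) {
        double correct = 0;
        double labeled = 0;
        for (int i = 0; i < actual.size(); i++) {
            if (actual.get(i).equals("?")) {
                continue; // nothing to compare the prediction against
            }
            labeled++;
            if (actual.get(i).equals(predicted.get(i))) {
                correct++;
            }
        }
        return correct / labeled;
    }

    public static void main(String[] args) {
        // One test row with a missing author: nothing can be scored.
        System.out.println(accuracy(Arrays.asList("?"), Arrays.asList("Allen-P")));
        // Two labeled rows, one predicted correctly.
        System.out.println(accuracy(Arrays.asList("Allen-P", "Tanveer"),
                                    Arrays.asList("Allen-P", "Allen-P")));
    }
}
```

The first call prints NaN and the second prints 0.5. A prediction can still be produced for the unlabeled row; it just cannot be scored.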
In your test file the class value needs to be set. Right now it is
"?", which means a missing value; as I said previously, that is why it
is not working.
Best
2011/6/16 Tanveer Chowdhury <>:
> Hi,
> First of all thank you so much for taking your time to reply.
> As you suggested I now took this sample bank arff file; one for train and
> one for test but still it outputs this `NaN`
> after running test.
> I have attached the result buffer and the arff file for the bank.
> Thank you.
|
# 6

16-06-2011 10:31 PM
|
|
|
Hi,
I have two csv files which I created; one for train and another for test. The contents are as below:
hasreply , totalsentences , totallines , ratioblanklines ,
totalwords , avgwordlength , totalfnword , ratiofnwords , totalchars ,
ratioletters , ratiodigits , ratioucase , totalspcchars , totalpunc ,
ratiopunc , author
YES ,3,6,0.3423234,34,4.45454,5,0.3432,35,1.23232,0.342342,0.342342,8,5,0.234234, Allen-P
YES ,4,7,1.3423234,35,5.45454,6,1.3432,36,2.23232,1.342342,1.342342,9,6,1.234234, Allen-P
YES ,5,8,2.3423234,36,6.45454,7,2.3432,37,3.23232,2.342342,2.342342,10,7,2.234234, Allen-P
YES ,6,9,3.3423234,37,7.45454,8,3.3432,38,4.23232,3.342342,3.342342,11,8,3.234234, Allen-P
NO ,5,8,2.3423234,36,6.45454,7,2.3432,37,3.23232,2.342342,2.342342,10,7,2.234234, Tanveer
NO ,6,9,3.3423234,37,7.45454,8,3.3432,38,4.23232,3.342342,3.342342,11,8,3.234234, Tanveer
And the test data is as follows:
hasreply , totalsentences , totallines , ratioblanklines ,
totalwords , avgwordlength , totalfnword , ratiofnwords , totalchars ,
ratioletters , ratiodigits , ratioucase , totalspcchars , totalpunc ,
ratiopunc, author
YES ,3,6,0.3423234,34,4.45454,3,0.3432,31,1.23232,0.342342,0.542342,2,7,0.234234,?
Now after running the J48 on train data it shows me the
statistics and output fine. Now when I give this test data then it says
the train and test data are not compatible. All I want is , it will
suggest me the author from train data. Also the format
and columns are same and i put ? in place of author part in test data.
What am I doing wrong? THanks.
Hi I created both train and test files to arff format.
Now the first step, creating the model with train data is fine as it was before too. Now when give the arff test file then the output it gives I can't make any sense of it.
What is the step to do this anyway? May be I am doing it wrong way. What I am doing is
1. Open Explorer. From Preprocess Tab I select the train arff file.
2. Then went to Classify tab, and used the "Test Option" Use Training set and ran the J48 on it. It shows the output fine so far.
3. Now I again select the Supplied Test Set under Test Option and select the test arff file. But now it's giving weird output of `NaN`, in ROC it's showing ? mark.
It is the right way?
My objective is first to train with the train data set sample that I attached and then just give the 1 instance test arff file and want it to tell me probability which author it is.
Thank you.
________________________________
From: Sebastian Luna Valero <>
To: Weka machine learning workbench list. <>
Sent: Tuesday, June 14, 2011 2:14:45 AM
Subject: Re: [Wekalist] Getting error as train and test dataset are not compatible
Hi,
Convert both train and test files into arff format and tray again...
HTH,
Sebastian
> Hi,
> I have two csv files which I created; one for train and another for test.
> The contents are as below:
> hasreply , totalsentences , totallines , ratioblanklines ,
> totalwords , avgwordlength , totalfnword , ratiofnwords , totalchars ,
> ratioletters , ratiodigits , ratioucase , totalspcchars , totalpunc ,
> ratiopunc , author
> YES
> ,3,6,0.3423234,34,4.45454,5,0.3432,35,1.23232,0.342342,0.342342,8,5,0.234234,
> Allen-P
> YES
> ,4,7,1.3423234,35,5.45454,6,1.3432,36,2.23232,1.342342,1.342342,9,6,1.234234,
> Allen-P
> YES
> ,5,8,2.3423234,36,6.45454,7,2.3432,37,3.23232,2.342342,2.342342,10,7,2.234234,
> Allen-P
> YES
> ,6,9,3.3423234,37,7.45454,8,3.3432,38,4.23232,3.342342,3.342342,11,8,3.234234,
> Allen-P
> NO
> ,5,8,2.3423234,36,6.45454,7,2.3432,37,3.23232,2.342342,2.342342,10,7,2.234234,
> Tanveer
> NO
> ,6,9,3.3423234,37,7.45454,8,3.3432,38,4.23232,3.342342,3.342342,11,8,3.234234,
> Tanveer
>
> And the test data is as follows:
> hasreply , totalsentences , totallines , ratioblanklines ,
> totalwords , avgwordlength , totalfnword , ratiofnwords , totalchars ,
> ratioletters , ratiodigits , ratioucase , totalspcchars , totalpunc ,
> ratiopunc, author
> YES
> ,3,6,0.3423234,34,4.45454,3,0.3432,31,1.23232,0.342342,0.542342,2,7,0.234234,?
>
> Now after running the J48 on train data it shows me the
> statistics and output fine. Now when I give this test data then it says
> the train and test data are not compatible. All I want is , it will
> suggest me the author from train data. Also the format
> and columns are same and i put ? in place of author part in test data.
> What am I doing wrong?
> THanks._______________________________________________
> Wekalist mailing list
> Send posts to:
> List info and subscription status:
> https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
> List etiquette:
> http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
>
_______________________________________________
Wekalist mailing list
Send posts to:
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Hi,
If you get a NaN, it is probably because the number of classes in the
train file is greater than the number in the test file...
By the way did you specify what is the class attribute in the classify
tab? (default is the last one)
Best,
Christophe
2011/6/16 Tanveer Chowdhury <>:
> Hi I created both train and test files to arff format.
> Now the first step, creating the model with train data is fine as it was
> before too. Now when give the arff test file then the output it gives I
> can't make any sense of it.
> What is the step to do this anyway? May be I am doing it wrong way. What I
> am doing is
> 1. Open Explorer. From Preprocess Tab I select the train arff file.
> 2. Then went to Classify tab, and used the "Test Option" Use Training set
> and ran the J48 on it. It shows the output fine so far.
> 3. Now I again select the Supplied Test Set under Test Option and select the
> test arff file. But now it's giving weird output of `NaN`, in ROC it's
> showing ? mark.
> It is the right way?
> My objective is first to train with the train data set sample that I
> attached and then just give the 1 instance test arff file and want it to
> tell me probability which author it is.
> Thank you.
>
> ________________________________
> From: Sebastian Luna Valero <>
> To: Weka machine learning workbench list. <>
> Sent: Tuesday, June 14, 2011 2:14:45 AM
> Subject: Re: [Wekalist] Getting error as train and test dataset are not
> compatible
>
>
> Hi,
>
> Convert both train and test files into arff format and tray again...
>
> HTH,
> Sebastian
>
>
>
>> Hi,
>> I have two csv files which I created; one for train and another for test.
>> The contents are as below:
>> hasreply , totalsentences , totallines , ratioblanklines ,
>> totalwords , avgwordlength , totalfnword , ratiofnwords , totalchars ,
>> ratioletters , ratiodigits , ratioucase , totalspcchars , totalpunc ,
>> ratiopunc , author
>> YES
>>
>> ,3,6,0.3423234,34,4.45454,5,0.3432,35,1.23232,0.342342,0.342342,8,5,0.234234,
>> Allen-P
>> YES
>>
>> ,4,7,1.3423234,35,5.45454,6,1.3432,36,2.23232,1.342342,1.342342,9,6,1.234234,
>> Allen-P
>> YES
>>
>> ,5,8,2.3423234,36,6.45454,7,2.3432,37,3.23232,2.342342,2.342342,10,7,2.234234,
>> Allen-P
>> YES
>>
>> ,6,9,3.3423234,37,7.45454,8,3.3432,38,4.23232,3.342342,3.342342,11,8,3.234234,
>> Allen-P
>> NO
>>
>> ,5,8,2.3423234,36,6.45454,7,2.3432,37,3.23232,2.342342,2.342342,10,7,2.234234,
>> Tanveer
>> NO
>>
>> ,6,9,3.3423234,37,7.45454,8,3.3432,38,4.23232,3.342342,3.342342,11,8,3.234234,
>> Tanveer
>>
>> And the test data is as follows:
>> hasreply , totalsentences , totallines , ratioblanklines ,
>> totalwords , avgwordlength , totalfnword , ratiofnwords , totalchars ,
>> ratioletters , ratiodigits , ratioucase , totalspcchars , totalpunc ,
>> ratiopunc, author
>> YES
>>
>> ,3,6,0.3423234,34,4.45454,3,0.3432,31,1.23232,0.342342,0.542342,2,7,0.234234,?
>>
>> Now after running the J48 on train data it shows me the
>> statistics and output fine. Now when I give this test data then it says
>> the train and test data are not compatible. All I want is , it will
>> suggest me the author from train data. Also the format
>> and columns are same and i put ? in place of author part in test data.
>> What am I doing wrong?
>> THanks._______________________________________________
>> Wekalist mailing list
>> Send posts to:
>> List info and subscription status:
>> https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
>> List etiquette:
>> http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
>>
>
>
>
>
> _______________________________________________
> Wekalist mailing list
> Send posts to:
> List info and subscription status:
> https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
> List etiquette:
> http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
>
>
>
> _______________________________________________
> Wekalist mailing list
> Send posts to:
> List info and subscription status:
> https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
> List etiquette:
> http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
>
>
_______________________________________________
Wekalist mailing list
Send posts to:
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
)
after looking in your arff file. The author values are missing in the
test file so the classifier can't know if its prediction were right or
wrong. You need to have a test file without missing classes (at least
for almost all lines...)
By the way it will be hard to get something from so small data files.
2011/6/16 Christophe Salperwyck <>:
> Hi,
>
> If you get a NaN, it is probably because the number of classes in the
> train file is greater than the number in the test file...
>
> By the way did you specify what is the class attribute in the classify
> tab? (default is the last one)
>
> Best,
> Christophe
>
> 2011/6/16 Tanveer Chowdhury <>:
>> Hi I created both train and test files to arff format.
>> Now the first step, creating the model with train data is fine as it was
>> before too. Now when give the arff test file then the output it gives I
>> can't make any sense of it.
>> What is the step to do this anyway? May be I am doing it wrong way. What I
>> am doing is
>> 1. Open Explorer. From Preprocess Tab I select the train arff file.
>> 2. Then went to Classify tab, and used the "Test Option" Use Training set
>> and ran the J48 on it. It shows the output fine so far.
>> 3. Now I again select the Supplied Test Set under Test Option and select the
>> test arff file. But now it's giving weird output of `NaN`, in ROC it's
>> showing ? mark.
>> It is the right way?
>> My objective is first to train with the train data set sample that I
>> attached and then just give the 1 instance test arff file and want it to
>> tell me probability which author it is.
>> Thank you.
>>
>> ________________________________
>> From: Sebastian Luna Valero <>
>> To: Weka machine learning workbench list. <>
>> Sent: Tuesday, June 14, 2011 2:14:45 AM
>> Subject: Re: [Wekalist] Getting error as train and test dataset are not
>> compatible
>>
>>
>> Hi,
>>
>> Convert both train and test files into arff format and tray again...
>>
>> HTH,
>> Sebastian
>>
>>
>>
>>> Hi,
>>> I have two csv files which I created; one for train and another for test.
>>> The contents are as below:
>>> hasreply , totalsentences , totallines , ratioblanklines ,
>>> totalwords , avgwordlength , totalfnword , ratiofnwords , totalchars ,
>>> ratioletters , ratiodigits , ratioucase , totalspcchars , totalpunc ,
>>> ratiopunc , author
>>> YES
>>>
>>> ,3,6,0.3423234,34,4.45454,5,0.3432,35,1.23232,0.342342,0.342342,8,5,0.234234,
>>> Allen-P
>>> YES
>>>
>>> ,4,7,1.3423234,35,5.45454,6,1.3432,36,2.23232,1.342342,1.342342,9,6,1.234234,
>>> Allen-P
>>> YES
>>>
>>> ,5,8,2.3423234,36,6.45454,7,2.3432,37,3.23232,2.342342,2.342342,10,7,2.234234,
In your test file the class value needs to be set. Right now you have
"?", which means a missing value; that is why it is not working, as I said
previously.
Best
Hi,
Thanks for your help. Now it's working for both train and test. But I need to understand something here.
Let's say I have 3 users, A1 to A3, and have their training data in arff format. In that arff file the last attribute will be author, with all of their names listed as {A1,A2,A3}, since that's my class for classification.
Now in the test data I don't know who the author is, and that's what I want weka to tell me. Suppose I write the test arff file like this:
.... lines truncated ...
@attribute author {A1, A2, A3}
@data
YES ,3,6,0.3423234,34,4.45454,5,0.3432,35,1.23232,0.342342,0.342342,8,5,0.234234, A2
Here, as I am giving "A2" as the author, what's the point of using weka? It does tell me that the predicted author is `A1` and not `A2`, even though I gave `A2`. Then why do we even put this false value in the test data? Maybe I am getting the whole picture wrong.
Hope I made myself clear. Thank you.
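One way to read the behaviour described above: the label written in the test row is not an input to the prediction, it is only the answer the prediction is compared against when the statistics are computed. The Python sketch below illustrates that idea and is not Weka code. It swaps in a 1-nearest-neighbour classifier for J48, and the two-feature rows, `TRAIN`, `predict` and `evaluate` are all made up for the illustration.

```python
# Made-up training rows: (features, author). Only two features for brevity.
TRAIN = [
    ((3.0, 6.0), "A1"),
    ((4.0, 7.0), "A1"),
    ((9.0, 12.0), "A2"),
    ((10.0, 13.0), "A3"),
]


def predict(features):
    """Return the author of the closest training row (1-nearest-neighbour)."""
    def squared_distance(row):
        return sum((x - y) ** 2 for x, y in zip(row[0], features))
    return min(TRAIN, key=squared_distance)[1]


def evaluate(test_row):
    """Predict from the features alone, then compare with the supplied label."""
    features, supplied_label = test_row
    predicted = predict(features)  # the supplied label is never consulted here
    return predicted, predicted == supplied_label


# Same features, two different supplied labels.
with_a2 = evaluate(((3.0, 6.5), "A2"))
with_a1 = evaluate(((3.0, 6.5), "A1"))
print(with_a2, with_a1)
```

In this sketch the predicted author is the same in both calls; only the right/wrong flag changes with the label that was supplied.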
|
# 7

17-06-2011 09:14 AM
|
|
|
Hi, I converted both train and test files to arff format.
The first step, creating the model with the train data, works fine as it did before. But when I give it the arff test file, I can't make any sense of the output.
What are the steps to do this anyway? Maybe I am doing it the wrong way. What I am doing is:
1. Open Explorer. From the Preprocess tab I select the train arff file.
2. Then I go to the Classify tab, choose the test option "Use training set" and run J48 on it. It shows the output fine so far.
3. Now I select "Supplied test set" under Test options and select the test arff file. But now it's giving weird output of `NaN`, and in ROC it's showing a ? mark.
Is this the right way?
My objective is first to train with the train data set sample that I attached, then give it the 1-instance test arff file and have it tell me the probability of which author it is.
Thank you.
________________________________
From: Sebastian Luna Valero <>
To: Weka machine learning workbench list. <>
Sent: Tuesday, June 14, 2011 2:14:45 AM
Subject: Re: [Wekalist] Getting error as train and test dataset are not compatible
Hi,
Convert both train and test files into arff format and try again...
HTH,
Sebastian
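A likely reason the CSV pair is rejected: when a CSV file is loaded, the set of labels for each nominal column is inferred from the rows that happen to be present, so a one-row test file can end up with different nominal definitions than the training file. The Python sketch below is not Weka code. `nominal_values` and `header_mismatches` are hypothetical helpers that mimic that inference on a three-column excerpt of the data from this thread, assuming "?" marks a missing value.

```python
import csv
import io

TRAIN_CSV = """hasreply,totalsentences,author
YES,3,Allen-P
YES,4,Allen-P
NO,5,Tanveer
"""

TEST_CSV = """hasreply,totalsentences,author
YES,3,?
"""


def is_number(text):
    """True if the cell parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def nominal_values(csv_text):
    """Map each non-numeric column to the sorted labels seen in its rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    result = {}
    for name in rows[0].keys():
        cells = [row[name].strip() for row in rows]
        present = [c for c in cells if c != "?"]  # "?" is treated as missing
        if present and all(is_number(c) for c in present):
            continue  # numeric column, no labels to infer
        result[name] = sorted(set(present))
    return result


def header_mismatches(train_text, test_text):
    """Return the columns whose inferred label sets differ between the files."""
    train, test = nominal_values(train_text), nominal_values(test_text)
    return {
        name: (train.get(name), test.get(name))
        for name in sorted(set(train) | set(test))
        if train.get(name) != test.get(name)
    }


mismatches = header_mismatches(TRAIN_CSV, TEST_CSV)
for name, (train_vals, test_vals) in mismatches.items():
    print(f"{name}: train={train_vals} test={test_vals}")
```

In an ARFF file the nominal values are declared in the header (for example `@attribute author {A1, A2, A3}`), so both files can carry the same declaration regardless of which rows they contain.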
Hi,
If you get a NaN, it is probably because the number of classes in the
train file is greater than the number in the test file...
By the way, did you specify which attribute is the class in the Classify
tab? (The default is the last one.)
Best,
Christophe
After looking at your arff file: the author values are missing in the
test file, so the classifier can't know whether its predictions were right or
wrong. You need a test file without missing classes (at least
for almost all lines...).
By the way, it will be hard to get anything from such small data files.
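The point about missing class values can be illustrated with a small sketch. This is not how Weka computes its statistics internally. `accuracy` is a hypothetical helper, and the assumption that instances with a missing class are skipped when scoring is made here for illustration only.

```python
import math

MISSING = "?"


def accuracy(predicted, actual):
    """Fraction of correct predictions among instances whose true label is known."""
    scored = [(p, a) for p, a in zip(predicted, actual) if a != MISSING]
    if not scored:
        return math.nan  # nothing to compare against: 0 correct out of 0
    return sum(p == a for p, a in scored) / len(scored)


# A single test instance whose author is "?", as in the test file above.
one_unlabeled = accuracy(["Allen-P"], ["?"])
# The same single instance with a label filled in.
one_labeled = accuracy(["Allen-P"], ["Tanveer"])
# A mix: the unlabeled instance is skipped, the other two are scored.
mixed = accuracy(["Allen-P", "Tanveer", "Allen-P"], ["Allen-P", "?", "Tanveer"])
print(one_unlabeled, one_labeled, mixed)
```

With no labeled instance there is nothing to divide by, which is one way a summary statistic ends up as `NaN` even though a prediction was produced for the row.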
Hi,
Thanks for your help. It now works for both train and test, but there is something I need to understand.
Let's say I have three users, A1 to A3, and have their training data in ARFF format. The last attribute in that ARFF file is author, declared with all of their names as {A1,A2,A3}, since that is my class for classification.
In the test data I don't know who the author is; that is what I want Weka to tell me. Suppose I write the test ARFF file like this:
.... lines truncated ...
@attribute author {A1, A2, A3}
@data
YES ,3,6,0.3423234,34,4.45454,5,0.3432,35,1.23232,0.342342,0.342342,8,5,0.234234, A2
Since I am supplying "A2" as the author here, what is the point of using Weka? It does tell me that the predicted author is `A1` and not `A2`, even though I gave `A2`. So why do we put this false value in the test data at all? Maybe I am getting the whole picture wrong.
I hope I made myself clear. Thank you.
________________________________
From: Christophe Salperwyck <>
To: Tanveer Chowdhury <>
Cc: Weka machine learning workbench list. <>
Sent: Thursday, June 16, 2011 3:36:09 AM
Subject: Re: [Wekalist] Getting error as train and test dataset are not compatible
In your test file the class value needs to be set. Right now you have
"?", which means a missing value; that is why it is not working, as I said
previously.
Best
2011/6/16 Tanveer Chowdhury <>:
> Hi,
> First of all, thank you so much for taking the time to reply.
> As you suggested, I tried the sample bank ARFF files, one for train and
> one for test, but it still outputs `NaN` after running the test.
> I have attached the result buffer and the ARFF files for the bank data.
> Thank you.
> ________________________________
> From: Christophe Salperwyck <>
> To: Tanveer Chowdhury <>; Weka machine learning
> workbench list. <>
> Sent: Wednesday, June 15, 2011 11:53:02 PM
> Subject: Re: [Wekalist] Getting error as train and test dataset are not
> compatible
>
> After looking at your ARFF file: the author values are missing in the
> test file, so the classifier can't know whether its predictions were
> right or wrong. You need a test file without missing classes (at least
> for almost all lines...)
>
> By the way, it will be hard to get anything from such small data files.
Hi,
You need to apply the built model to the new data.
Reading this should help you (see figures 30-32):
http://maya.cs.depaul.edu/classes/ect584/weka/classify.html
"
Based on the above command, our classification model has been stored
in the file "bank.model" and placed in the directory we specified. We
can now apply this model to the new instances. The advantage of
building a model and storing it is that it can be applied at any time
to different sets of unclassified instances. The command for doing so
is:
java weka.classifiers.trees.J48 -p 9 -l directory-path\bank.model -T directory-path\bank-new.arff
In the above command, the option -p 9 indicates that we want to
predict a value for attribute number 9 (which is "pep"). The -l
option specifies the directory path and name of the model file (this
is what was created in the previous step). Finally, the -T option
specifies the name (and path) of the test data. In our example, the
test data is our new instances file "bank-new.arff".
"
Best,
Christophe
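For readers following along, here is a minimal sketch of how a file of new, unlabeled instances could be prepared before applying a saved model as in the command quoted above. It is plain Python, not part of Weka; the function name `to_arff`, the relation name `authors_new`, and the shortened attribute list are assumptions made for illustration. The idea is that the class attribute is declared with the same labels as in training, while each new instance carries "?" in the class position.

```python
def to_arff(relation, attributes, rows, class_name, class_labels):
    """Render unlabeled instances as ARFF text.

    attributes: list of (name, spec) for the non-class attributes, where
    spec is "numeric" or a list of nominal labels.
    rows: one list of values per instance, without the class value.
    """
    lines = ["@relation " + relation, ""]
    for name, spec in attributes:
        decl = spec if isinstance(spec, str) else "{" + ",".join(spec) + "}"
        lines.append("@attribute " + name + " " + decl)
    # Declare the class with every label used in training, even though
    # none of the new instances carries a label.
    lines.append("@attribute " + class_name + " {" + ",".join(class_labels) + "}")
    lines += ["", "@data"]
    for row in rows:
        # "?" marks the class as missing: this is the value to be predicted.
        lines.append(",".join(str(v) for v in row) + ",?")
    return "\n".join(lines) + "\n"


# Hypothetical example: three of the thread's attributes, one new instance.
arff_text = to_arff(
    "authors_new",
    [("hasreply", ["YES", "NO"]),
     ("totalsentences", "numeric"),
     ("totallines", "numeric")],
    [["YES", 3, 6]],
    "author",
    ["A1", "A2", "A3"],
)
print(arff_text)
```

With a file like this, no made-up author has to be typed into the data: the label stays missing and the prediction output is what gets read.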
|
# 8

19-06-2011 01:28 AM
|
|
|
Hi,
I have two csv files which I created; one for train and another for test. The contents are as below:
hasreply , totalsentences , totallines , ratioblanklines ,
totalwords , avgwordlength , totalfnword , ratiofnwords , totalchars ,
ratioletters , ratiodigits , ratioucase , totalspcchars , totalpunc ,
ratiopunc , author
YES ,3,6,0.3423234,34,4.45454,5,0.3432,35,1.23232,0.342342,0.342342,8,5,0.234234, Allen-P
YES ,4,7,1.3423234,35,5.45454,6,1.3432,36,2.23232,1.342342,1.342342,9,6,1.234234, Allen-P
YES ,5,8,2.3423234,36,6.45454,7,2.3432,37,3.23232,2.342342,2.342342,10,7,2.234234, Allen-P
YES ,6,9,3.3423234,37,7.45454,8,3.3432,38,4.23232,3.342342,3.342342,11,8,3.234234, Allen-P
NO ,5,8,2.3423234,36,6.45454,7,2.3432,37,3.23232,2.342342,2.342342,10,7,2.234234, Tanveer
NO ,6,9,3.3423234,37,7.45454,8,3.3432,38,4.23232,3.342342,3.342342,11,8,3.234234, Tanveer
And the test data is as follows:
hasreply , totalsentences , totallines , ratioblanklines ,
totalwords , avgwordlength , totalfnword , ratiofnwords , totalchars ,
ratioletters , ratiodigits , ratioucase , totalspcchars , totalpunc ,
ratiopunc, author
YES ,3,6,0.3423234,34,4.45454,3,0.3432,31,1.23232,0.342342,0.542342,2,7,0.234234,?
After running J48 on the train data, it shows me the
statistics and output fine. But when I supply this test data, it says
the train and test data are not compatible. All I want is for it to
suggest the author based on the train data. The format
and columns are the same, and I put ? in place of the author in the test data.
What am I doing wrong? Thanks.
Hi, I converted both the train and test files to ARFF format.
The first step, creating the model with the train data, works fine as it did before. But when I supply the ARFF test file, I can't make any sense of the output.
What are the steps to do this, anyway? Maybe I am doing it the wrong way. What I am doing is:
1. Open Explorer. From the Preprocess tab I select the train ARFF file.
2. Go to the Classify tab, choose the test option "Use training set" and run J48 on it. It shows the output fine so far.
3. Select "Supplied test set" under Test options and select the test ARFF file. But now it gives weird output of `NaN`, and for ROC it shows a ? mark.
Is this the right way?
My objective is first to train on the sample train data set that I attached, and then give it the one-instance test ARFF file and have it tell me the probability of which author it is.
Thank you.
________________________________
From: Sebastian Luna Valero <>
To: Weka machine learning workbench list. <>
Sent: Tuesday, June 14, 2011 2:14:45 AM
Subject: Re: [Wekalist] Getting error as train and test dataset are not compatible
Hi,
Convert both train and test files into ARFF format and try again...
HTH,
Sebastian
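One plausible reason the ARFF conversion helps, sketched below in plain Python: a CSV file has no declared schema, so column types and nominal label sets have to be inferred from whatever values each file happens to contain. The inference rule here (numeric if every value parses as a float, otherwise nominal with the labels seen, "unknown" if the column holds only "?") is an assumption of this sketch and not a description of Weka's actual loader; the function names and the three-column sample data are likewise hypothetical.

```python
import csv
import io

MISSING = "?"

TRAIN = """hasreply , totalsentences , author
YES ,3, Allen-P
NO ,5, Tanveer
"""

TEST = """hasreply , totalsentences , author
YES ,3,?
"""


def infer_header(csv_text):
    """Infer a (name, kind) pair per column from the values in one file."""
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(io.StringIO(csv_text)) if row]
    names, data = rows[0], rows[1:]
    header = []
    for i, name in enumerate(names):
        values = [r[i] for r in data if r[i] != MISSING]
        if not values:
            kind = "unknown"      # nothing to infer from, e.g. all "?"
        else:
            try:
                for v in values:
                    float(v)
                kind = "numeric"
            except ValueError:
                # Nominal: the label set is whatever this file contains.
                kind = tuple(sorted(set(values)))
        header.append((name, kind))
    return header


def mismatches(train_header, test_header):
    """Names of attributes whose inferred kind differs between the files."""
    return [name for (name, kind), (_, other)
            in zip(train_header, test_header) if kind != other]


print(mismatches(infer_header(TRAIN), infer_header(TEST)))
```

In this sketch both `hasreply` and `author` come out different between the two files, even though the column names match. An ARFF header avoids the problem by declaring the labels explicitly in both files.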
Hi,
If you get a NaN, it is probably because the number of classes in the
train file is greater than the number in the test file...
By the way, did you specify which attribute is the class in the Classify
tab? (The default is the last one.)
Best,
Christophe
After looking at your ARFF file: the author values are missing in the
test file, so the classifier can't know whether its predictions were
right or wrong. You need a test file without missing classes (at least
for almost all lines...)
By the way, it will be hard to get anything from such small data files.
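To illustrate the point about missing classes, here is a small sketch in plain Python. It is not Weka's evaluation code; the function `accuracy` and the labels are made up for the example. It only shows that an accuracy-style ratio has nothing to divide by when no test instance has a known class, which is one plausible reading of the `NaN` values in the output.

```python
import math

MISSING = "?"


def accuracy(actual, predicted):
    """Fraction of correct predictions among instances with a known class.

    Instances whose true class is missing cannot be scored and are skipped.
    Returns NaN when there is nothing left to score.
    """
    scored = [(a, p) for a, p in zip(actual, predicted) if a != MISSING]
    if not scored:
        return float("nan")
    return sum(a == p for a, p in scored) / len(scored)


# One test instance with a missing class: a prediction exists, but
# there is no true label to compare it against.
print(math.isnan(accuracy(["?"], ["A1"])))

# With known classes the ratio is defined, whether or not the
# prediction is correct.
print(accuracy(["A2"], ["A1"]))
```

The prediction itself is still produced in the first case; only the evaluation statistics are undefined.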
In your test file the class value needs to be set. Right now you have
"?", which means a missing value; that is why it is not working, as I said
previously.
Best
Hi,
Thanks for your help. It is now working for both train and test, but I need to understand something here.
Let's say I have three users, A1 to A3, and have their training data in ARFF format. The last attribute in that ARFF file is author, declared with their names as {A1,A2,A3}, since that is my class for classification.
In the test data I don't know who the author is, and that is what I want Weka to tell me. Now if I write the test ARFF file like this:
.... lines truncated ...
@attribute author {A1, A2, A3}
@data
YES ,3,6,0.3423234,34,4.45454,5,0.3432,35,1.23232,0.342342,0.342342,8,5,0.234234, A2
Here I am giving "A2" as the author, so what is the point of using Weka? It does tell me that the predicted author is `A1` and not `A2`, even though I gave `A2`. So why do we put this false value in the test data at all? Maybe I am getting the whole picture wrong.
I hope I made myself clear. Thank you.
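A note on the test file: one way to keep its header identical to the training file while leaving the author unknown is to declare the same author values and write `?` in the class column of every row. The Java sketch below only builds such a file as text; it does not call Weka. The attribute names come from the first post, while the relation name `authorship`, the `{YES,NO}` type for hasreply, the numeric types, and the class name `UnlabeledArff` are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: builds the text of a test ARFF file whose class value is unknown.
final class UnlabeledArff {
    // Attribute names from the first post; treating them all as numeric is an assumption.
    static final List<String> NUMERIC_ATTRIBUTES = Arrays.asList(
            "totalsentences", "totallines", "ratioblanklines", "totalwords",
            "avgwordlength", "totalfnword", "ratiofnwords", "totalchars",
            "ratioletters", "ratiodigits", "ratioucase", "totalspcchars",
            "totalpunc", "ratiopunc");

    // authors: the class values, listed exactly as in the training file.
    // featureRows: comma-separated feature values per row, without the author column.
    static String build(List<String> authors, List<String> featureRows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation authorship\n");          // hypothetical relation name
        sb.append("@attribute hasreply {YES,NO}\n");  // assumed nominal type
        for (String name : NUMERIC_ATTRIBUTES) {
            sb.append("@attribute ").append(name).append(" numeric\n");
        }
        sb.append("@attribute author {").append(String.join(",", authors)).append("}\n");
        sb.append("@data\n");
        for (String row : featureRows) {
            sb.append(row).append(",?\n");            // "?" marks the unknown author
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build(
                Arrays.asList("A1", "A2", "A3"),
                Arrays.asList("YES,3,6,0.3423234,34,4.45454,5,0.3432,35,"
                        + "1.23232,0.342342,0.342342,8,5,0.234234")));
    }
}
```

The point of the sketch is that the header, including the list of author names, stays the same as in training; only the class value in each data row is left as `?`.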
________________________________
From: Christophe Salperwyck <>
To: Tanveer Chowdhury <>
Cc: Weka machine learning workbench list. <>
Sent: Thursday, June 16, 2011 3:36:09 AM
Subject: Re: [Wekalist] Getting error as train and test dataset are not compatible
In your test file you need to have the class value set, now you have
"?" (which means missing value, that's why it is not working as I said
previously).
Best
Hi,
You need to apply the built model on the new data.
Reading this should help you (from figure 30-32):
http://maya.cs.depaul.edu/classes/ect584/weka/classify.html
"
Based on the above command, our classification model has been stored
in the file "bank.model" and placed in the directory we specified. We
can now apply this model to the new instances. The advantage of
building a model and storing it is that it can be applied at any time
to different sets of unclassified instances. The command for doing so
is:
java weka.classifiers.trees.J48 -p 9 -l directory-path\bank.model -T directory-path\bank-new.arff
In the above command, the option -p 9 indicates that we want to
predict a value for attribute number 9 (which is "pep"). The -l
options specifies the directory path and name of the model file (this
is what was created in the previous step). Finally, the -T option
specifies the name (and path) of the test data. In our example, the
test data is our new instances file "bank-new.arff").
"
Best,
Christophe
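For reference, the two steps (build and store the model, then apply it to new instances) can be sketched as argument lists in Java. This is only an illustration: it assembles the command lines and does not run Weka. The `-p`, `-l` and `-T` options are the ones in the quoted tutorial text; `-t` (training file) and `-d` (file to save the model to) are the usual Weka options for the first step, which the quoted passage refers to but does not show. The file names and the class name `WekaCommands` are placeholders, and the value to pass after `-p` depends on your data and Weka version.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: assembles the two Weka command lines as argument lists.
// File names used in main are placeholders, not paths from the thread.
final class WekaCommands {
    // Step 1: train J48 on the training file (-t) and save the model (-d).
    static List<String> trainAndSave(String trainArff, String modelFile) {
        return Arrays.asList("java", "weka.classifiers.trees.J48",
                "-t", trainArff, "-d", modelFile);
    }

    // Step 2: load the saved model (-l) and output predictions (-p)
    // for the instances in the test file (-T).
    static List<String> predict(String modelFile, String testArff, int p) {
        return Arrays.asList("java", "weka.classifiers.trees.J48",
                "-p", Integer.toString(p), "-l", modelFile, "-T", testArff);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", trainAndSave("train.arff", "authors.model")));
        System.out.println(String.join(" ", predict("authors.model", "test.arff", 9)));
    }
}
```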
Excellent, Christophe, thanks a lot. That is just what I was after for predicting the author. But why does the question mark (?) style of prediction only work in text mode (the command line) and not in Explorer? In Explorer I have to give a false author name for each instance in my test data file, because it gives NaN if I put ? as the author name.
Thanks.
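On the NaN itself, here is a rough illustration in Java of the point made earlier in the thread, that without true author values the classifier cannot be scored as right or wrong. This is not Weka's evaluation code, only an assumption about the kind of arithmetic involved: an accuracy figure divides the number of correct predictions by the number of instances that have a known class, and with no such instances that is 0.0 / 0. The class name `AccuracySketch` and the use of `null` to stand for `?` are made up for the example.

```java
// Sketch only (not Weka's evaluation code): accuracy over the instances
// whose true class is known. A null actual label stands for "?" in the ARFF file.
final class AccuracySketch {
    static double accuracy(String[] predicted, String[] actual) {
        int labeled = 0;
        int correct = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (actual[i] == null) {
                continue; // missing class: nothing to compare the prediction against
            }
            labeled++;
            if (actual[i].equals(predicted[i])) {
                correct++;
            }
        }
        // With zero labeled instances this is 0.0 / 0, which is NaN in Java.
        return (double) correct / labeled;
    }

    public static void main(String[] args) {
        // One test instance with unknown author: a prediction exists, a score does not.
        System.out.println(accuracy(new String[] {"A1"}, new String[] {null}));
        // Same instance with a placeholder author: a score exists, but it only
        // reflects the placeholder, not the real author.
        System.out.println(accuracy(new String[] {"A1"}, new String[] {"A2"}));
    }
}
```

The prediction is available in both cases; only the summary statistics need a true class to compare against.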
________________________________
From: Christophe Salperwyck <>
To: Tanveer Chowdhury <>
Cc: Weka machine learning workbench list. <>
Sent: Friday, June 17, 2011 4:14:19 AM
Subject: Re: [Wekalist] Getting error as train and test dataset are not compatible
Hi,
You need to apply the built model on the new data.
Reading this should help you (from figure 30-32):
http://maya.cs.depaul.edu/classes/ect584/weka/classify.html
"
Based on the above command, our classification model has been stored
in the file "bank.model" and placed in the directory we specified. We
can now apply this model to the new instances. The advantage of
building a model and storing it is that it can be applied at any time
to different sets of unclassified instances. The command for doing so
is:
java weka.classifiers.trees.J48 -p 9 -l directory-path\bank.model -T directory-path\bank-new.arff
In the above command, the option -p 9 indicates that we want to
predict a value for attribute number 9 (which is "pep"). The -l
options specifies the directory path and name of the model file (this
is what was created in the previous step). Finally, the -T option
specifies the name (and path) of the test data. In our example, the
test data is our new instances file "bank-new.arff").
"
Best,
Christophe