Report
Simple data balancing achieves competitive worst-group-accuracy
Title: | Simple data balancing achieves competitive worst-group-accuracy |
---|---|
Authors: | Idrissi, Badr Youbi, Arjovsky, Martin, Pezeshki, Mohammad, Lopez-Paz, David |
Publication Year: | 2021 |
Collection: | Computer Science |
Subject Terms: | Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Cryptography and Security |
Description: | We study the problem of learning classifiers that perform well across (known or unknown) groups of data. After observing that common worst-group-accuracy datasets suffer from substantial imbalances, we set out to compare state-of-the-art methods to simple balancing of classes and groups by either subsampling or reweighting data. Our results show that these data balancing baselines achieve state-of-the-art accuracy, while being faster to train and requiring no additional hyper-parameters. In addition, we highlight that access to group information is most critical for model selection purposes, and not so much during training. All in all, our findings beg closer examination of benchmarks and methods for research in worst-group-accuracy optimization. |
Comment: | Accepted at CLeaR (Causal Learning and Reasoning) 2022 |
Document Type: | Working Paper |
Access URL: | http://arxiv.org/abs/2110.14503 |
Accession Number: | edsarx.2110.14503 |
Database: | arXiv |
Publication Date: | 27 October 2021 |
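The description names two balancing baselines (subsampling and reweighting, applied to classes or to groups) and one evaluation metric (worst-group accuracy). The following is a minimal Python sketch of those ideas under stated assumptions; it is not the authors' implementation. It assumes each training example carries one hashable group label. For class balancing, the group label is just the class label. For group balancing, it could be a hypothetical `(class, attribute)` pair. Subsampling here keeps, from every group, as many randomly chosen examples as the smallest group has. Reweighting assigns each example a weight inversely proportional to its group's size, normalized to mean 1. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Hashable, List, Sequence


def subsample_balanced(groups: Sequence[Hashable], seed: int = 0) -> List[int]:
    """Return example indices so that every group keeps as many examples
    as the smallest group (hypothetical helper; a sketch, not the paper's code)."""
    rng = random.Random(seed)  # seeded for reproducible subsets
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    smallest = min(len(idxs) for idxs in by_group.values())
    kept: List[int] = []
    for idxs in by_group.values():
        # draw `smallest` indices without replacement from this group
        kept.extend(rng.sample(idxs, smallest))
    return sorted(kept)


def inverse_frequency_weights(groups: Sequence[Hashable]) -> List[float]:
    """Per-example weights n / (k * count[g]): every group receives the same
    total weight n / k, and the weights average to 1 (assumed normalization)."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]


def worst_group_accuracy(predictions: Sequence, labels: Sequence,
                         groups: Sequence[Hashable]) -> float:
    """Minimum, over groups, of the fraction of correctly classified examples."""
    correct: defaultdict = defaultdict(int)
    total: defaultdict = defaultdict(int)
    for p, y, g in zip(predictions, labels, groups):
        total[g] += 1
        correct[g] += int(p == y)
    return min(correct[g] / total[g] for g in total)


if __name__ == "__main__":
    groups = ["a"] * 6 + ["b"] * 2             # group "a" is the majority
    print(subsample_balanced(groups))          # 2 indices from "a", plus 6 and 7
    print(inverse_frequency_weights(groups))   # "a" weights 8/12, "b" weights 2.0
    preds, labels = [1] * 6 + [0, 1], [1] * 8
    print(worst_group_accuracy(preds, labels, groups))  # 0.5 (group "b")
```

The subsampled indices would define a smaller training set, while the weights would multiply a per-example loss. Computing `worst_group_accuracy` on a held-out set that has group labels is one way to use group information only for model selection, which is the use the description singles out as most important.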