{"id":16360,"date":"2023-07-26T20:33:13","date_gmt":"2023-07-26T20:33:13","guid":{"rendered":"https:\/\/www.transcribeme.com\/?p=16360"},"modified":"2024-06-28T01:57:58","modified_gmt":"2024-06-28T01:57:58","slug":"why-annotated-data-is-important-for-machine-learning","status":"publish","type":"post","link":"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/","title":{"rendered":"Why Annotated Data is So Important to Machine Learning"},"content":{"rendered":"[vc_row type=&#8221;full_width_background&#8221; full_screen_row_position=&#8221;middle&#8221; column_margin=&#8221;default&#8221; column_direction=&#8221;default&#8221; column_direction_tablet=&#8221;default&#8221; column_direction_phone=&#8221;default&#8221; scene_position=&#8221;center&#8221; bottom_padding=&#8221;60px&#8221; text_color=&#8221;dark&#8221; text_align=&#8221;left&#8221; row_border_radius=&#8221;none&#8221; row_border_radius_applies=&#8221;bg&#8221; overflow=&#8221;visible&#8221; overlay_strength=&#8221;0.3&#8243; gradient_direction=&#8221;left_to_right&#8221; shape_divider_position=&#8221;bottom&#8221; bg_image_animation=&#8221;none&#8221; gradient_type=&#8221;default&#8221; shape_type=&#8221;&#8221;][vc_column column_padding=&#8221;no-extra-padding&#8221; column_padding_tablet=&#8221;inherit&#8221; column_padding_phone=&#8221;inherit&#8221; column_padding_position=&#8221;all&#8221; column_element_direction_desktop=&#8221;default&#8221; column_element_spacing=&#8221;default&#8221; desktop_text_alignment=&#8221;default&#8221; tablet_text_alignment=&#8221;default&#8221; phone_text_alignment=&#8221;default&#8221; background_color_opacity=&#8221;1&#8243; background_hover_color_opacity=&#8221;1&#8243; column_backdrop_filter=&#8221;none&#8221; column_shadow=&#8221;none&#8221; column_border_radius=&#8221;none&#8221; column_link_target=&#8221;_self&#8221; column_position=&#8221;default&#8221; gradient_direction=&#8221;left_to_right&#8221; 
overlay_strength=&#8221;0.3&#8243; width=&#8221;1\/1&#8243; tablet_width_inherit=&#8221;default&#8221; animation_type=&#8221;default&#8221; bg_image_animation=&#8221;none&#8221; border_type=&#8221;simple&#8221; column_border_width=&#8221;none&#8221; column_border_style=&#8221;solid&#8221;][vc_column_text]\n<blockquote><p>TranscribeMe creates structured data sets that customers use to build or enhance machine learning models.<\/p><\/blockquote>\n<p>Before getting to case studies that illustrate this work, two terms need to be defined or clarified: \u201cstructured data\u201d and \u201cAI.\u201d<\/p>\n<p>I consider AI to be a misnomer. Intelligence is intelligence; setting aside all other flora and fauna, it divides into human and machine. So for me, there\u2019s nothing artificial about an intelligent machine. It\u2019s simply not human.[\/vc_column_text][divider line_type=&#8221;No Line&#8221; custom_height=&#8221;20&#8243;][image_with_animation image_url=&#8221;16362&#8243; image_size=&#8221;full&#8221; animation_type=&#8221;entrance&#8221; animation=&#8221;None&#8221; animation_movement_type=&#8221;transform_y&#8221; hover_animation=&#8221;none&#8221; alignment=&#8221;center&#8221; border_radius=&#8221;5px&#8221; box_shadow=&#8221;small_depth&#8221; image_loading=&#8221;default&#8221; max_width=&#8221;100%&#8221; max_width_mobile=&#8221;default&#8221;][divider line_type=&#8221;No Line&#8221; custom_height=&#8221;20&#8243;][vc_column_text]\n<h3>Learning Through Structured Data<\/h3>\n[\/vc_column_text][vc_column_text]<span style=\"font-weight: 400;\">Consider how humans learn. A newborn is pretty much helpless, but from birth it packs an enormously powerful and complex brain that, from day one, is collecting, integrating, and assimilating environmental data, including speech. 
Before it can talk, the child is in stealth mode, but its brain is hyper-engaged in an activity that data scientists would call unsupervised learning.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As the child grows, <\/span><i><span style=\"font-weight: 400;\">structured data<\/span><\/i><span style=\"font-weight: 400;\"> is introduced in the form of books. Initially, a parent may read to the child and point out elements in the story. For example, while reading \u201cGoodnight Moon,\u201d the parent might say, \u201cMoon,\u201d then point to its picture, tying the word to a visual. That is <\/span><a href=\"https:\/\/www.transcribeme.com\/data-annotation\/\"><span style=\"font-weight: 400;\">data annotation<\/span><\/a><span style=\"font-weight: 400;\">!<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As children continue to learn, the brain\u2019s enormous capacity to log, store, and collate data comes into play, and children become, for the most part, autonomous learners.<\/span><\/p>\n<p><b>A newborn machine<\/b><span style=\"font-weight: 400;\"> has neither a child\u2019s capacity for self-directed learning nor the enormous data capacity of a human brain to begin learning and storing data. It\u2019s been estimated that <\/span><a href=\"https:\/\/www.scientificamerican.com\/article\/what-is-the-memory-capacity\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">a human brain can store 2.5 petabytes of information<\/span><\/a><span style=\"font-weight: 400;\">. 
That would be equivalent to a DVR recording continuously for more than 300 years!<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A newborn machine begins its quest for intelligence at the \u201cGoodnight Moon\u201d stage, where a pairing takes place: an audio recording of the word &#8220;moon&#8221; with the written word, or an image of the moon with an audio recording of the word.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As with the child learner, this is data annotation.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">An example of structured data could be, let&#8217;s say, a complex set of data defining all North American songbirds to the exclusion of everything else. This would produce an intelligent machine that could identify every single songbird on the continent. But it couldn&#8217;t tell us a thing about butterflies! And there would be nothing in its database or algorithmic logic to take it from songbird to butterfly.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A new set of structured data must be created and assimilated for every new thing we want our machine to learn. It\u2019s always been this way from the beginning of time, machine learning time, that is.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Here\u2019s a quote from the Wikipedia article <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Expert_system\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Expert System<\/span><\/a><span style=\"font-weight: 400;\">: \u201cIn the late 1950s&#8230;<\/span><span style=\"font-weight: 400;\"> biomedical researchers started creating computer-aided systems for diagnostic applications in medicine and biology. 
These early diagnostic systems used patients\u2019 symptoms and laboratory test results as inputs to generate a diagnostic outcome.\u201d Even for the first machines, data annotation was required.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">From the 1950s until now, machine learning has relied on data annotation to create the structured datasets that build or enhance machine learning models. There have been many claims of fully unsupervised learning, but in the cases we\u2019ve seen, those claims have not held up. Machines have become more sophisticated at collecting data, but overall, a machine still needs to be trained for a specific use.<\/span>[\/vc_column_text][\/vc_column][\/vc_row][vc_row type=&#8221;full_width_background&#8221; full_screen_row_position=&#8221;middle&#8221; column_margin=&#8221;default&#8221; column_direction=&#8221;default&#8221; column_direction_tablet=&#8221;default&#8221; column_direction_phone=&#8221;default&#8221; bg_color=&#8221;#f7f7f7&#8243; scene_position=&#8221;center&#8221; top_padding=&#8221;70px&#8221; constrain_group_1=&#8221;yes&#8221; bottom_padding=&#8221;70px&#8221; text_color=&#8221;dark&#8221; text_align=&#8221;left&#8221; row_border_radius=&#8221;none&#8221; row_border_radius_applies=&#8221;bg&#8221; overflow=&#8221;visible&#8221; overlay_strength=&#8221;0.3&#8243; gradient_direction=&#8221;left_to_right&#8221; shape_divider_position=&#8221;bottom&#8221; bg_image_animation=&#8221;none&#8221; gradient_type=&#8221;default&#8221; shape_type=&#8221;&#8221;][vc_column column_padding=&#8221;no-extra-padding&#8221; column_padding_tablet=&#8221;inherit&#8221; column_padding_phone=&#8221;inherit&#8221; column_padding_position=&#8221;all&#8221; column_element_direction_desktop=&#8221;default&#8221; column_element_spacing=&#8221;default&#8221; desktop_text_alignment=&#8221;default&#8221; tablet_text_alignment=&#8221;default&#8221; phone_text_alignment=&#8221;default&#8221; 
background_color_opacity=&#8221;1&#8243; background_hover_color_opacity=&#8221;1&#8243; column_backdrop_filter=&#8221;none&#8221; column_shadow=&#8221;none&#8221; column_border_radius=&#8221;none&#8221; column_link_target=&#8221;_self&#8221; column_position=&#8221;default&#8221; gradient_direction=&#8221;left_to_right&#8221; overlay_strength=&#8221;0.3&#8243; width=&#8221;1\/1&#8243; tablet_width_inherit=&#8221;default&#8221; animation_type=&#8221;default&#8221; bg_image_animation=&#8221;none&#8221; border_type=&#8221;simple&#8221; column_border_width=&#8221;none&#8221; column_border_style=&#8221;solid&#8221;][image_with_animation image_url=&#8221;16363&#8243; image_size=&#8221;full&#8221; animation_type=&#8221;entrance&#8221; animation=&#8221;None&#8221; animation_movement_type=&#8221;transform_y&#8221; hover_animation=&#8221;none&#8221; alignment=&#8221;center&#8221; border_radius=&#8221;5px&#8221; box_shadow=&#8221;small_depth&#8221; image_loading=&#8221;default&#8221; max_width=&#8221;100%&#8221; max_width_mobile=&#8221;default&#8221;][divider line_type=&#8221;No Line&#8221; custom_height=&#8221;20&#8243;][vc_column_text]\n<h2>Use Cases for Annotated Data<\/h2>\n[\/vc_column_text][vc_column_text]<span style=\"font-weight: 400;\">Every day, AI and machine learning technologies are delivering astounding accomplishments that benefit a broad spectrum of fields and people around the world, in areas such as software development, cybersecurity, medicine, engineering, customer service, finance, manufacturing, and more. <\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span><span style=\"font-weight: 400;\">But scientists, technologists, and huge industries are not the only ones reaping the benefits of machine learning. 
Small businesses and individuals alike are beginning to understand that data collection and analysis are now the norm, so it is no wonder that <\/span><a href=\"https:\/\/www.transcribeme.com\/ai-machine-learning\/\"><span style=\"font-weight: 400;\">AI and machine learning<\/span><\/a><span style=\"font-weight: 400;\"> are among the fastest-growing technologies globally.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">These technologies work with many kinds of data, including audio, images, videos, podcasts, and more. Simply put, data is labeled to make it comprehensible to AI systems. The accuracy of the data sets is key, but quantity matters too: more data means more variety in vocabulary and context.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">This is where TranscribeMe comes in. We have been asked to provide annotated data for a variety of use cases, and we have teams specially trained to label and process data appropriately for any given project. Here are just a few examples:<\/span>[\/vc_column_text][nectar_icon_list color=&#8221;Accent-Color&#8221; direction=&#8221;vertical&#8221; icon_size=&#8221;medium&#8221; icon_style=&#8221;border&#8221;][nectar_icon_list_item icon_type=&#8221;icon&#8221; text_full_html=&#8221;html&#8221; title=&#8221;List Item&#8221; id=&#8221;1690402556397-7&#8243; tab_id=&#8221;1690402556397-0&#8243; header=&#8221;Medical Services&#8221; icon_fontawesome=&#8221;fa fa-user-md&#8221;]<span style=\"font-weight: 400;\"><strong>Topic:<\/strong> Medical Emergency Screening<br \/>\n<\/span><span style=\"font-weight: 400;\"><strong>Form of Data Annotation:<\/strong> Audio<br \/>\n<\/span><span style=\"font-weight: 400;\"><strong>Process:<\/strong> Annotators listen to recordings of agonal breathing and mark where each breathing segment begins and ends.<br \/>\n<\/span><span style=\"font-weight: 400;\"><strong>Purpose:<\/strong> To teach the provider&#8217;s automated system to screen patient calls for agonal 
breathing in order to identify callers who may be experiencing cardiac arrest or stroke.<\/span>[\/nectar_icon_list_item][nectar_icon_list_item icon_type=&#8221;icon&#8221; text_full_html=&#8221;html&#8221; title=&#8221;List Item&#8221; id=&#8221;1690402874466-5&#8243; tab_id=&#8221;1690402874466-6&#8243; header=&#8221;Fast Food Industry&#8221; icon_fontawesome=&#8221;fa fa-cutlery&#8221;]<span style=\"font-weight: 400;\"><strong>Topic:<\/strong> Accuracy of Automated Orders<br \/>\n<\/span><span style=\"font-weight: 400;\"><strong>Form of Data Annotation:<\/strong> Audio\/text<br \/>\n<\/span><span style=\"font-weight: 400;\"><strong>Process:<\/strong> Customers&#8217; drive-thru orders are transcribed.<br \/>\n<\/span><span style=\"font-weight: 400;\"><strong>Purpose:<\/strong> To train the restaurant&#8217;s automated system to recognize menu items in drive-thru orders regardless of customers&#8217; accents and despite high levels of background noise.<\/span>[\/nectar_icon_list_item][nectar_icon_list_item icon_type=&#8221;icon&#8221; text_full_html=&#8221;html&#8221; title=&#8221;List Item&#8221; id=&#8221;1690402900585-8&#8243; tab_id=&#8221;1690402900586-5&#8243; header=&#8221;Telephony Company&#8221; icon_fontawesome=&#8221;fa fa-phone&#8221;]<span style=\"font-weight: 400;\"><strong>Topic:<\/strong> Customer Service Analysis<br \/>\n<\/span><span style=\"font-weight: 400;\"><strong>Form of Data Annotation:<\/strong> Text<br \/>\n<\/span><span style=\"font-weight: 400;\"><strong>Process:<\/strong> Specific labels are used to tag words or phrases in pre-transcribed customer service conversations.<br \/>\n<\/span><span style=\"font-weight: 400;\"><strong>Purpose:<\/strong> To build custom speech models for call center use cases that identify customer sentiment, log why customers call and how the calls end, and evaluate the agents&#8217; responses.<\/span>[\/nectar_icon_list_item][nectar_icon_list_item 
icon_type=&#8221;icon&#8221; text_full_html=&#8221;html&#8221; title=&#8221;List Item&#8221; id=&#8221;1690402958954-10&#8243; tab_id=&#8221;1690402958955-2&#8243; header=&#8221;Court Stenography Company&#8221; icon_fontawesome=&#8221;fa fa-balance-scale&#8221;]<span style=\"font-weight: 400;\"><strong>Topic:<\/strong> Annotation via Keywords<br \/>\n<\/span><span style=\"font-weight: 400;\"><strong>Form of Data Annotation:<\/strong> Keyword spotting<br \/>\n<\/span><span style=\"font-weight: 400;\"><strong>Process:<\/strong> Words and phrases in deposition notices are tagged with keywords according to the clients&#8217; instructions.<br \/>\n<\/span><span style=\"font-weight: 400;\"><strong>Purpose:<\/strong> To compile data sets from deposition notices using keywords that identify plaintiffs, defendants, witnesses, attorneys, deposition location, date, time, and other similar information.<\/span>[\/nectar_icon_list_item][nectar_icon_list_item icon_type=&#8221;icon&#8221; text_full_html=&#8221;html&#8221; title=&#8221;List Item&#8221; id=&#8221;1690403022692-9&#8243; tab_id=&#8221;1690403022693-1&#8243; header=&#8221;Self-Driving Vehicle Manufacturer&#8221; icon_fontawesome=&#8221;fa fa-car&#8221;]<span style=\"font-weight: 400;\"><strong>Topic:<\/strong> Passenger Safety<br \/>\n<\/span><span style=\"font-weight: 400;\"><strong>Form of Data Annotation:<\/strong> Image tagging<br \/>\n<\/span><span style=\"font-weight: 400;\"><strong>Process:<\/strong> Annotators use specialized software to draw shapes around specific objects in photos and videos.<br \/>\n<\/span><span style=\"font-weight: 400;\"><strong>Purpose:<\/strong> Tagged images are used to teach self-driving vehicles to avoid road hazards such as potholes, cracks, and water.<\/span>[\/nectar_icon_list_item][\/nectar_icon_list][\/vc_column][\/vc_row][vc_row type=&#8221;full_width_background&#8221; full_screen_row_position=&#8221;middle&#8221; column_margin=&#8221;default&#8221; 
column_direction=&#8221;default&#8221; column_direction_tablet=&#8221;default&#8221; column_direction_phone=&#8221;default&#8221; bg_color=&#8221;#ffffff&#8221; scene_position=&#8221;center&#8221; top_padding=&#8221;70px&#8221; constrain_group_1=&#8221;yes&#8221; bottom_padding=&#8221;70px&#8221; text_color=&#8221;dark&#8221; text_align=&#8221;left&#8221; row_border_radius=&#8221;none&#8221; row_border_radius_applies=&#8221;bg&#8221; overflow=&#8221;visible&#8221; overlay_strength=&#8221;0.3&#8243; gradient_direction=&#8221;left_to_right&#8221; shape_divider_position=&#8221;bottom&#8221; bg_image_animation=&#8221;none&#8221; gradient_type=&#8221;default&#8221; shape_type=&#8221;&#8221;][vc_column column_padding=&#8221;no-extra-padding&#8221; column_padding_tablet=&#8221;inherit&#8221; column_padding_phone=&#8221;inherit&#8221; column_padding_position=&#8221;all&#8221; column_element_direction_desktop=&#8221;default&#8221; column_element_spacing=&#8221;default&#8221; desktop_text_alignment=&#8221;default&#8221; tablet_text_alignment=&#8221;default&#8221; phone_text_alignment=&#8221;default&#8221; background_color_opacity=&#8221;1&#8243; background_hover_color_opacity=&#8221;1&#8243; column_backdrop_filter=&#8221;none&#8221; column_shadow=&#8221;none&#8221; column_border_radius=&#8221;none&#8221; column_link_target=&#8221;_self&#8221; column_position=&#8221;default&#8221; gradient_direction=&#8221;left_to_right&#8221; overlay_strength=&#8221;0.3&#8243; width=&#8221;1\/1&#8243; tablet_width_inherit=&#8221;default&#8221; animation_type=&#8221;default&#8221; bg_image_animation=&#8221;none&#8221; border_type=&#8221;simple&#8221; column_border_width=&#8221;none&#8221; column_border_style=&#8221;solid&#8221;][vc_column_text]\n<h2><b>We Train ASRs<\/b><\/h2>\n[\/vc_column_text][vc_column_text]<span style=\"font-weight: 400;\">As technology advances and more transcribed audio becomes available online, automatic speech recognition (ASR) systems can scrape this data and self-train to a 
degree. We\u2019re currently working with a company that is doing exactly this and has produced very good results, but not great ones. Consequently, they have come to us for what is widely considered the gold standard in training data: human-transcribed and annotated audio-to-text data. That human factor is what it takes to turn a good ASR into a much better one.<\/span>[\/vc_column_text][divider line_type=&#8221;No Line&#8221; custom_height=&#8221;20&#8243;][\/vc_column][\/vc_row][vc_row type=&#8221;full_width_background&#8221; full_screen_row_position=&#8221;middle&#8221; column_margin=&#8221;default&#8221; column_direction=&#8221;default&#8221; column_direction_tablet=&#8221;default&#8221; column_direction_phone=&#8221;default&#8221; bg_color=&#8221;#f7f7f7&#8243; scene_position=&#8221;center&#8221; top_padding=&#8221;5%&#8221; constrain_group_1=&#8221;yes&#8221; bottom_padding=&#8221;5%&#8221; text_color=&#8221;dark&#8221; text_align=&#8221;left&#8221; row_border_radius=&#8221;none&#8221; row_border_radius_applies=&#8221;bg&#8221; overflow=&#8221;visible&#8221; overlay_strength=&#8221;0.3&#8243; gradient_direction=&#8221;left_to_right&#8221; shape_divider_position=&#8221;bottom&#8221; bg_image_animation=&#8221;none&#8221; gradient_type=&#8221;default&#8221; shape_type=&#8221;&#8221;][vc_column column_padding=&#8221;no-extra-padding&#8221; column_padding_tablet=&#8221;inherit&#8221; column_padding_phone=&#8221;inherit&#8221; column_padding_position=&#8221;all&#8221; column_element_direction_desktop=&#8221;default&#8221; column_element_spacing=&#8221;default&#8221; desktop_text_alignment=&#8221;default&#8221; tablet_text_alignment=&#8221;default&#8221; phone_text_alignment=&#8221;default&#8221; background_color_opacity=&#8221;1&#8243; background_hover_color_opacity=&#8221;1&#8243; column_backdrop_filter=&#8221;none&#8221; column_shadow=&#8221;none&#8221; column_border_radius=&#8221;none&#8221; column_link_target=&#8221;_self&#8221; 
column_position=&#8221;default&#8221; gradient_direction=&#8221;left_to_right&#8221; overlay_strength=&#8221;0.3&#8243; width=&#8221;1\/1&#8243; tablet_width_inherit=&#8221;default&#8221; animation_type=&#8221;default&#8221; bg_image_animation=&#8221;none&#8221; border_type=&#8221;simple&#8221; column_border_width=&#8221;none&#8221; column_border_style=&#8221;solid&#8221;][vc_column_text]<span style=\"font-weight: 400;\">**<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Ledley RS, Lusted LB (1959). &#8220;Reasoning foundations of medical diagnosis&#8221;. <\/span><i><span style=\"font-weight: 400;\">Science<\/span><\/i><span style=\"font-weight: 400;\">. <\/span><b>130<\/b><span style=\"font-weight: 400;\"> (3366): 9\u201321. <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Bibcode_(identifier)\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">Bibcode<\/span><\/a><span style=\"font-weight: 400;\">:<\/span><a href=\"https:\/\/ui.adsabs.harvard.edu\/abs\/1959Sci...130....9L\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">1959Sci...130....9L<\/span><\/a><span style=\"font-weight: 400;\">. <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\"><span style=\"font-weight: 400;\">doi<\/span><\/a><span style=\"font-weight: 400;\">:<\/span><a href=\"https:\/\/doi.org\/10.1126%2Fscience.130.3366.9\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">10.1126\/science.130.3366.9<\/span><\/a><span style=\"font-weight: 400;\">. <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/PMID_(identifier)\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">PMID<\/span><\/a> <a href=\"https:\/\/pubmed.ncbi.nlm.nih.gov\/13668531\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">13668531<\/span><\/a><\/p>\n<p><span style=\"font-weight: 400;\">Weiss SM, Kulikowski CA, Amarel S, Safir A (1978). 
&#8220;A model-based method for computer-aided medical decision-making&#8221;. <\/span><i><span style=\"font-weight: 400;\">Artificial Intelligence<\/span><\/i><span style=\"font-weight: 400;\">. <\/span><b>11<\/b><span style=\"font-weight: 400;\"> (1\u20132): 145\u2013172. <\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Doi_(identifier)\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">doi<\/span><\/a><span style=\"font-weight: 400;\">:<\/span><a href=\"https:\/\/doi.org\/10.1016%2F0004-3702%2878%2990015-2\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">10.1016\/0004-3702(78)90015-2<\/span><\/a>[\/vc_column_text][\/vc_column][\/vc_row]\n","protected":false},"excerpt":{"rendered":"<p>[vc_row type=&#8221;full_width_background&#8221; full_screen_row_position=&#8221;middle&#8221; column_margin=&#8221;default&#8221; column_direction=&#8221;default&#8221; column_direction_tablet=&#8221;default&#8221; column_direction_phone=&#8221;default&#8221; scene_position=&#8221;center&#8221; bottom_padding=&#8221;60px&#8221; text_color=&#8221;dark&#8221; text_align=&#8221;left&#8221; row_border_radius=&#8221;none&#8221; row_border_radius_applies=&#8221;bg&#8221; overflow=&#8221;visible&#8221; overlay_strength=&#8221;0.3&#8243; gradient_direction=&#8221;left_to_right&#8221; shape_divider_position=&#8221;bottom&#8221; bg_image_animation=&#8221;none&#8221; gradient_type=&#8221;default&#8221; shape_type=&#8221;&#8221;][vc_column column_padding=&#8221;no-extra-padding&#8221; column_padding_tablet=&#8221;inherit&#8221; column_padding_phone=&#8221;inherit&#8221; column_padding_position=&#8221;all&#8221; column_element_direction_desktop=&#8221;default&#8221; column_element_spacing=&#8221;default&#8221; desktop_text_alignment=&#8221;default&#8221; tablet_text_alignment=&#8221;default&#8221; phone_text_alignment=&#8221;default&#8221; 
background_color_opacity=&#8221;1&#8243;&#8230;<\/p>\n","protected":false},"author":7,"featured_media":16361,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"footnotes":""},"categories":[188,4],"tags":[1031,1033,1032,20],"class_list":{"0":"post-16360","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-ai-technology-transcription","8":"category-blog","9":"tag-court","10":"tag-court-reporting","11":"tag-legal","12":"tag-transcription"},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v23.3 (Yoast SEO v24.7) - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Why Annotated Data is So Important to Machine Learning - TranscribeMe<\/title>\n<meta name=\"description\" content=\"Structured annotated datasets help to feed &amp; build machine learning models. Even ASR (automatic speech recognition) systems need transcribed &amp; annotated data.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Why Annotated Data is So Important to Machine Learning\" \/>\n<meta property=\"og:description\" content=\"Structured annotated datasets help to feed &amp; build machine learning models. 
Even ASR (automatic speech recognition) systems need transcribed &amp; annotated data.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/\" \/>\n<meta property=\"og:site_name\" content=\"TranscribeMe\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/TranscribeMe\/\" \/>\n<meta property=\"article:published_time\" content=\"2023-07-26T20:33:13+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-06-28T01:57:58+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.transcribeme.com\/wp-content\/uploads\/2023\/07\/tme-data.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"949\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"author\" content=\"Transcribe Me\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@transcribeme\" \/>\n<meta name=\"twitter:site\" content=\"@transcribeme\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Transcribe Me\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/\"},\"author\":{\"name\":\"Transcribe Me\",\"@id\":\"https:\/\/www.transcribeme.com\/#\/schema\/person\/632cda4e18ad799c64ebcfa85ca09c22\"},\"headline\":\"Why Annotated Data is So Important to Machine Learning\",\"datePublished\":\"2023-07-26T20:33:13+00:00\",\"dateModified\":\"2024-06-28T01:57:58+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/\"},\"wordCount\":2148,\"publisher\":{\"@id\":\"https:\/\/www.transcribeme.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.transcribeme.com\/wp-content\/uploads\/2023\/07\/tme-data.webp\",\"keywords\":[\"Court\",\"Court Reporting\",\"Legal\",\"transcription\"],\"articleSection\":[\"AI Technology &amp; Transcription\",\"Blog\"],\"inLanguage\":\"en\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/\",\"url\":\"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/\",\"name\":\"Why Annotated Data is So Important to Machine Learning - 
TranscribeMe\",\"isPartOf\":{\"@id\":\"https:\/\/www.transcribeme.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.transcribeme.com\/wp-content\/uploads\/2023\/07\/tme-data.webp\",\"datePublished\":\"2023-07-26T20:33:13+00:00\",\"dateModified\":\"2024-06-28T01:57:58+00:00\",\"description\":\"Structured annotated datasets help to feed & build machine learning models. Even ASR (automatic speech recognition) systems need transcribed & annotated data.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/#breadcrumb\"},\"inLanguage\":\"en\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en\",\"@id\":\"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/#primaryimage\",\"url\":\"https:\/\/www.transcribeme.com\/wp-content\/uploads\/2023\/07\/tme-data.webp\",\"contentUrl\":\"https:\/\/www.transcribeme.com\/wp-content\/uploads\/2023\/07\/tme-data.webp\",\"width\":1920,\"height\":949,\"caption\":\"Why Structured & Annotated Data is So Important to Machine Learning\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.transcribeme.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Why Annotated Data is So Important to Machine 
Learning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.transcribeme.com\/#website\",\"url\":\"https:\/\/www.transcribeme.com\/\",\"name\":\"TranscribeMe\",\"description\":\"The most accurate transcription starting at $0.79 per minute\",\"publisher\":{\"@id\":\"https:\/\/www.transcribeme.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.transcribeme.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.transcribeme.com\/#organization\",\"name\":\"TranscribeMe.com\",\"url\":\"https:\/\/www.transcribeme.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en\",\"@id\":\"https:\/\/www.transcribeme.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.transcribeme.com\/wp-content\/uploads\/2021\/09\/featured-image-thumb.jpg\",\"contentUrl\":\"https:\/\/www.transcribeme.com\/wp-content\/uploads\/2021\/09\/featured-image-thumb.jpg\",\"width\":512,\"height\":512,\"caption\":\"TranscribeMe.com\"},\"image\":{\"@id\":\"https:\/\/www.transcribeme.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/TranscribeMe\/\",\"https:\/\/x.com\/transcribeme\",\"https:\/\/www.linkedin.com\/company\/transcribeme\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.transcribeme.com\/#\/schema\/person\/632cda4e18ad799c64ebcfa85ca09c22\",\"name\":\"Transcribe Me\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en\",\"@id\":\"https:\/\/www.transcribeme.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/edb71dcbf6cd2a48f0eb4e9030185de7d39db37c0c53f317d6aadf73b387973b?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/edb71dcbf6cd2a48f0eb4e9030185de7d39db37c0c53f317d6aadf73b387973b?s=96&d=mm&r=g\",\"caption\":\"Transcribe 
Me\"}}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Why Annotated Data is So Important to Machine Learning - TranscribeMe","description":"Structured annotated datasets help to feed & build machine learning models. Even ASR (automatic speech recognition) systems need transcribed & annotated data.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/","og_locale":"en_US","og_type":"article","og_title":"Why Annotated Data is So Important to Machine Learning","og_description":"Structured annotated datasets help to feed & build machine learning models. Even ASR (automatic speech recognition) systems need transcribed & annotated data.","og_url":"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/","og_site_name":"TranscribeMe","article_publisher":"https:\/\/www.facebook.com\/TranscribeMe\/","article_published_time":"2023-07-26T20:33:13+00:00","article_modified_time":"2024-06-28T01:57:58+00:00","og_image":[{"width":1920,"height":949,"url":"https:\/\/www.transcribeme.com\/wp-content\/uploads\/2023\/07\/tme-data.webp","type":"image\/webp"}],"author":"Transcribe Me","twitter_card":"summary_large_image","twitter_creator":"@transcribeme","twitter_site":"@transcribeme","twitter_misc":{"Written by":"Transcribe Me","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/#article","isPartOf":{"@id":"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/"},"author":{"name":"Transcribe Me","@id":"https:\/\/www.transcribeme.com\/#\/schema\/person\/632cda4e18ad799c64ebcfa85ca09c22"},"headline":"Why Annotated Data is So Important to Machine Learning","datePublished":"2023-07-26T20:33:13+00:00","dateModified":"2024-06-28T01:57:58+00:00","mainEntityOfPage":{"@id":"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/"},"wordCount":2148,"publisher":{"@id":"https:\/\/www.transcribeme.com\/#organization"},"image":{"@id":"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/www.transcribeme.com\/wp-content\/uploads\/2023\/07\/tme-data.webp","keywords":["Court","Court Reporting","Legal","transcription"],"articleSection":["AI Technology &amp; Transcription","Blog"],"inLanguage":"en"},{"@type":"WebPage","@id":"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/","url":"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/","name":"Why Annotated Data is So Important to Machine Learning - 
TranscribeMe","isPartOf":{"@id":"https:\/\/www.transcribeme.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/#primaryimage"},"image":{"@id":"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/#primaryimage"},"thumbnailUrl":"https:\/\/www.transcribeme.com\/wp-content\/uploads\/2023\/07\/tme-data.webp","datePublished":"2023-07-26T20:33:13+00:00","dateModified":"2024-06-28T01:57:58+00:00","description":"Structured annotated datasets help to feed & build machine learning models. Even ASR (automatic speech recognition) systems need transcribed & annotated data.","breadcrumb":{"@id":"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/#breadcrumb"},"inLanguage":"en","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/"]}]},{"@type":"ImageObject","inLanguage":"en","@id":"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/#primaryimage","url":"https:\/\/www.transcribeme.com\/wp-content\/uploads\/2023\/07\/tme-data.webp","contentUrl":"https:\/\/www.transcribeme.com\/wp-content\/uploads\/2023\/07\/tme-data.webp","width":1920,"height":949,"caption":"Why Structured & Annotated Data is So Important to Machine Learning"},{"@type":"BreadcrumbList","@id":"https:\/\/www.transcribeme.com\/blog\/why-annotated-data-is-important-for-machine-learning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.transcribeme.com\/"},{"@type":"ListItem","position":2,"name":"Why Annotated Data is So Important to Machine Learning"}]},{"@type":"WebSite","@id":"https:\/\/www.transcribeme.com\/#website","url":"https:\/\/www.transcribeme.com\/","name":"TranscribeMe","description":"The most accurate transcription starting at $0.79 per 
minute","publisher":{"@id":"https:\/\/www.transcribeme.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.transcribeme.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en"},{"@type":"Organization","@id":"https:\/\/www.transcribeme.com\/#organization","name":"TranscribeMe.com","url":"https:\/\/www.transcribeme.com\/","logo":{"@type":"ImageObject","inLanguage":"en","@id":"https:\/\/www.transcribeme.com\/#\/schema\/logo\/image\/","url":"https:\/\/www.transcribeme.com\/wp-content\/uploads\/2021\/09\/featured-image-thumb.jpg","contentUrl":"https:\/\/www.transcribeme.com\/wp-content\/uploads\/2021\/09\/featured-image-thumb.jpg","width":512,"height":512,"caption":"TranscribeMe.com"},"image":{"@id":"https:\/\/www.transcribeme.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/TranscribeMe\/","https:\/\/x.com\/transcribeme","https:\/\/www.linkedin.com\/company\/transcribeme"]},{"@type":"Person","@id":"https:\/\/www.transcribeme.com\/#\/schema\/person\/632cda4e18ad799c64ebcfa85ca09c22","name":"Transcribe Me","image":{"@type":"ImageObject","inLanguage":"en","@id":"https:\/\/www.transcribeme.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/edb71dcbf6cd2a48f0eb4e9030185de7d39db37c0c53f317d6aadf73b387973b?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/edb71dcbf6cd2a48f0eb4e9030185de7d39db37c0c53f317d6aadf73b387973b?s=96&d=mm&r=g","caption":"Transcribe 
Me"}}]}},"_links":{"self":[{"href":"https:\/\/www.transcribeme.com\/wp-json\/wp\/v2\/posts\/16360","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.transcribeme.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.transcribeme.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.transcribeme.com\/wp-json\/wp\/v2\/users\/7"}],"replies":[{"embeddable":true,"href":"https:\/\/www.transcribeme.com\/wp-json\/wp\/v2\/comments?post=16360"}],"version-history":[{"count":0,"href":"https:\/\/www.transcribeme.com\/wp-json\/wp\/v2\/posts\/16360\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.transcribeme.com\/wp-json\/wp\/v2\/media\/16361"}],"wp:attachment":[{"href":"https:\/\/www.transcribeme.com\/wp-json\/wp\/v2\/media?parent=16360"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.transcribeme.com\/wp-json\/wp\/v2\/categories?post=16360"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.transcribeme.com\/wp-json\/wp\/v2\/tags?post=16360"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}