Hi, and welcome to this data science news special! As a data science professional, or at least an enthusiast, you probably hold pandas close to your heart: it is Python's primary library for data analysis and manipulation. What you may not have heard yet is that pandas 1.0.0 has officially been released!

At first sight, this version does not look much different to the user than the last release starting with a 0, version 0.25.3. Under the hood, however, there are plenty of enhancements that boost performance and lay a better foundation for the long run. The developers present 1.0.0 as a stable version of pandas with a strengthened API, which has also been cleaned of many deprecations from earlier versions. Here are the most notable improvements that come with 1.0.0.

One. The dedicated string and Boolean data types.
These features are still "experimental", which means further changes are expected in the near future. For now, pandas will not automatically assign "string" or "boolean" to your data. This only happens if you explicitly specify dtype="string" or dtype="boolean" when creating a new structure. (Note that dtype="bool" still gives you the old NumPy Boolean type, which cannot hold missing values.) In the future, this may become the default way pandas treats data of this type. We'll just have to wait and see.

Also, consider the benefit of the new "string" data type. Until now, pandas stored text in the generic "object" type, which can hold any Python object, so a column of strings looked the same as a column of, say, date objects or mixed values. Using "string" lets you tell text apart from everything else, so you can select and manipulate string data much more easily. Which leads us to the second point worth mentioning.

Two. The .select_dtypes() method is much quicker now!
It relies on vectorization instead of iterating over a loop. So you can run .select_dtypes("string") to pull all string columns, or .select_dtypes("boolean") to retrieve the Boolean columns from a DataFrame, provided that you have set those types beforehand.

Three.
We now have the pandas.NA scalar, which denotes missing values. pandas.NA is a new concept in Python's scientific ecosystem. Its goal is to provide a missing-value indicator that can be used consistently across data types, instead of np.nan, None or pd.NaT depending on the type. That said, this feature is currently "experimental", too, because it remains to be seen how well it will work together with other packages such as NumPy.

Four. The new .convert_dtypes() method.
It converts columns to the best possible data types that support pandas.NA. For example, a float column holding only whole numbers and missing values becomes the nullable integer type "Int64".

Five. An improved .info().
The well-known .info() output is now much more readable, which helps you explore your data more quickly and efficiently.

Six. The new .to_markdown() method.
It lets you display a Series or DataFrame as a Markdown table.

So overall, a lot has been done, but mainly on the back end. For everyday users like us, the development of clearly defined data types, consistent with other libraries, is surely the most prominent improvement. In any case, it is worth checking the official release notes before you start using 1.0.0. There you can find out more about changes to methods such as .sort_index() and .sort_values(), and much more. Finally, note that you need at least Python 3.6.1 to use this new version. If you are just starting to learn pandas, don't forget to check the link in the description. If not, run 'pip install --upgrade pandas' and have fun!
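The features described above can be sketched in a few lines of Python. This is a minimal, illustrative example that assumes pandas 1.0 or later; the column names and values (name, active, score, count, label, flag) are made up for the demonstration. Because .to_markdown() depends on the optional tabulate package, that call is wrapped in a try/except.

```python
import pandas as pd

# Made-up example data. The new dtypes must be requested explicitly:
# pandas does not infer "string" or "boolean" on its own.
df = pd.DataFrame(
    {
        "name": pd.Series(["Ann", "Bob", None], dtype="string"),
        "active": pd.Series([True, None, False], dtype="boolean"),
        "score": [1.5, 2.0, 3.5],
    }
)

# Missing entries in both new dtypes are represented by the pd.NA scalar.
print(df.loc[2, "name"] is pd.NA)
print(df.loc[1, "active"] is pd.NA)

# select_dtypes() can pick out text columns without grabbing
# every other object-dtype column as well.
text_cols = df.select_dtypes(include="string")
bool_cols = df.select_dtypes(include="boolean")
print(list(text_cols.columns), list(bool_cols.columns))

# pd.NA propagates through arithmetic and follows three-valued logic.
print(pd.NA + 1)       # still missing
print(pd.NA | True)    # True: the result is known regardless of the NA

# convert_dtypes() turns "old-style" columns into dtypes that support pd.NA.
raw = pd.DataFrame(
    {
        "count": [1, 2, None],        # becomes float64 with NaN
        "label": ["x", None, "z"],    # text with a missing value
        "flag": [True, False, None],  # becomes object dtype
    }
)
converted = raw.convert_dtypes()
print(converted.dtypes)

# The reworked info() summary and the new Markdown export.
df.info()
try:
    print(df.to_markdown())
except ImportError:
    # to_markdown() relies on the optional "tabulate" package.
    print("Install 'tabulate' to use to_markdown().")
```

Running it should report pd.NA for the missing entries, ['name'] and ['active'] from the two select_dtypes() calls, and the nullable types Int64, string and boolean after convert_dtypes().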
Pandas 1.0.0 – 6 key features in the new version (posted by 林宜悉 on 2020/03/09)