Re: Baby steps to create a dataframe structure
From: Ralph Mellor
July 29, 2020 14:28
Message ID: CAPLR5ScnSNAprXjzeg3p=enf2G+e44a_-pM-BPNcOTKK=2MkOg@mail.gmail.com
The first half of this email is some StackOverflow links and commentary.
The second half is some specific responses to your questions.
There's a search field at the top of those pages; make sure you keep
the [raku] *tag* (tags go in square brackets) then add whatever else
you feel like searching for. Your search covers all Qs and
all As but not their comments. (IIRC there is a way to search comments
but it's obscure.) To search for a phrase, use quotes, e.g. "foo bar".
One of many important things JJ has done imo was to encourage use of SO.
I have personally responded to that call, and the increased rate and quality
of my own SO answers in the last couple years was a direct response to
something JJ wrote (a tweet IIRC). I'm about ready to start asking Qs too.
Yes, SO is worryingly proprietary, and a single corporate decision could
end its usefulness as a good place to store content at any moment.
And some folk refuse to use it for this and other good reasons. But for
now it's a large store of Q+A format info on a growing range of Raku topics.
I'd say at this point it's of roughly equal importance and value to the
docs, ignoring the valuable security we have with the docs, i.e. knowing
that JJ++ and others' doc contributions are under a classic FOSS license.
Anyhow, my point is that it's wise to at least search SO's [raku] tag even
if you don't wish to contribute Qs or As, and if you have *appropriate* Qs
it's a good place to ask questions and get decent answers most of the time.
> How I define a data structure: an array of arrays?
(Note that this would not be a "good" Q for SO without careful
elaboration because it's too vague and generic. That said, feel
free to post Qs regardless of their quality, provided you're willing
to be patient with folk editing your Qs, or altering their status, or
commenting on them to guide folk toward asking "good" Qs.)
The following focuses on conventional programming language
terminology. I think R has some unusual terminology but I will
ignore it for the following and focus on the conventional use of
the terms you've asked about.
A "data structure" is just data that isn't a scalar. A scalar is just
a guaranteed single atomic thing, at the level one is viewing the
data. So if you have a pair of things, that's a data structure, and if
you have a list of things, where the number of things can be
zero, one, or more, then even if there's just one thing in the
list, that's still a data structure. Same is true even if there's
*nothing* in the list; an empty list is still a data structure.
(It's possible that data that's a scalar from one perspective is
a data structure from another. Consider a pizza box. Viewed
from the outside it's a single atomic thing -- a pizza box. But
it may be possible to open it up, and there's a good chance
that inside it contains 6 pieces of pizza. The same sort of
thing can be true of a scalar piece of data.)
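The pizza box idea can be sketched in Raku itself, as a rough illustration rather than a formal definition. The variable names are made up. An array held in a `$` variable is treated as one item when iterated directly, but can be "opened" with `.list`:

```raku
# Hypothetical names, for illustration only.
my $box = ['slice' xx 6];    # one box holding six slices

my $outside = 0;
$outside++ for $box;         # viewed from outside: a single thing

my $inside = 0;
$inside++ for $box.list;     # opened up: six things

say "$outside box, $inside slices";   # 1 box, 6 slices
```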
If a data structure is a 2D row/col thing then an array of arrays
would be one way to model that if by "array" you just mean an
integer indexed list of data. (And again, this could be a data
structure even if there were zero rows and zero columns.)
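A minimal sketch of that array-of-arrays model in Raku follows. The row contents are invented sample data, and this is just one possible layout, not a recommendation over other designs:

```raku
# Invented sample data: three rows, two columns.
my @rows =
    ['alice', 30],
    ['bob',   25],
    ['carol', 41];

say @rows.elems;        # number of rows
say @rows[1][0];        # row 1, column 0
say @rows.map(*[1]);    # every value in column 1

# Zero rows and zero columns is still a data structure.
my @empty;
say @empty.elems;       # 0
```

Rows are indexed first and columns second, so pulling out a whole column means visiting every row.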
In Raku I could well imagine that a multidimensional native
string array would be a good choice for data imported from
a CSV file and maintained in raw form.
> Given the raku itself (and maybe some already existing
> packages) what the structures and functions I may use.
I think others have given you some good starting thoughts.
Be ready to jump on SO and ask questions when you have
very specific examples of things to do.
> df.column1 ... return a list of values on this column
That thought should be a topic in its own right. So either a
fresh email or an SO Q (or both, linking in both directions)
on nothing more than that focused thought, with at least
one highly specific and well chosen example. There are a
huge variety of sub-topics from just that one thought, though
a specific example should help somewhat narrow things.
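As one possible starting point only, here is a sketch of column access by name. The class name `DataFrame`, the `%.columns` attribute, and the sample columns are all assumptions made up for this example. It relies on Raku's `FALLBACK` method, which is called when no ordinary method matches the name used:

```raku
# Hypothetical sketch: class, attribute and column names are made up.
class DataFrame {
    has %.columns;    # column name => array of values

    # Raku calls FALLBACK when no ordinary method has the requested name.
    method FALLBACK($name, |args) {
        die "No such column: $name" unless %!columns{$name}:exists;
        %!columns{$name}.list;
    }
}

my $df = DataFrame.new(
    columns => %( column1 => [1, 2, 3], column2 => ['a', 'b', 'c'] ),
);

say $df.column1;
say $df.column2;
```

One caveat with this design: a column whose name matches an existing method (for example `columns`, or anything inherited such as `keys`) would never reach `FALLBACK`, so a real package would likely also want an explicit lookup method.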
> when it read the delim file it should check each column type
Again, that's a huge topic that should be asked as a
completely distinct Q with a good example.
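To give a flavour of what such a Q might start from, here is a rough sketch of guessing a column's type from raw strings. The helper name `guess-column-type` and the sample cells are hypothetical. It uses the built-in `val` routine, which turns numeric-looking strings into allomorphs such as `IntStr`, and it ignores real-world issues such as missing values and dates:

```raku
# Hypothetical helper: guess one type for a whole column of raw strings.
sub guess-column-type(@cells) {
    return Str unless @cells;                  # nothing to go on
    my @vals = @cells.map({ val($_) });        # '42' becomes an IntStr, 'x' stays a Str
    return Int     if @vals.grep(Int).elems     == @vals.elems;
    return Numeric if @vals.grep(Numeric).elems == @vals.elems;
    return Str;
}

say guess-column-type(['1', '2', '3']);        # (Int)
say guess-column-type(['4', '5.25', '6e0']);   # (Numeric)
say guess-column-type(['1.5', '2', 'x']);      # (Str)
```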
On Wed, Jul 22, 2020 at 12:42 AM Aureliano Guedes
> Hi all,
> I'd like to learn Raku deep enough to build a data structure. I have experience with Perl5, Python, R, and even C/C++, then I get boring feelings to learn something new from the beginning. Also, I prefer learning a new language by applying f to something.
> Since I work with data analysis and data science, I'd like to try to develop a data structure to dataframe in pure Raku. And if I do a basic but useful thing capable to load a field delimited file (as CSV or TSV) into a dataframe, I'll transform in a package and upload it to GitHub to comparatively enhance the package.
> What I need is suggestions for how do I start it.
> - How I define a data structure: an array of arrays?
> - Given the raku itself (and maybe some already existing packages) what the structures and functions I may use.
> I got these ideas to start:
> The dataframe should support columns name to be called as:
> and it should return a list of values on this column.
> Also, when it read the delim file it should check each column type.
> All suggestions are welcome.
> Aureliano Guedes